Instead of plain snprintf(). To fix the MSVC 2013 build:
Compiling src\util\u_queue.c ...
u_queue.c
src\util\u_queue.c(325) : warning C4013: 'snprintf' undefined; assuming extern returning int
...
mesautil.lib(u_queue.obj) : error LNK2001: unresolved external symbol _snprintf
scons: building terminated because of errors.
Fixes: b238e33bc9 ("kutil/queue: add a process name into a thread name")
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of plain snprintf(). To fix the MSVC 2013 build:
Compiling src\gallium\auxiliary\util\u_tests.c ...
u_tests.c
src\gallium\auxiliary\util\u_tests.c(624) : warning C4013: 'snprintf' undefined; assuming extern returning int
...
gallium.lib(u_tests.obj) : error LNK2019: unresolved external symbol _snprintf referenced in function _test_texture_barrier
build\windows-x86-debug\gallium\targets\graw-gdi\graw.dll : fatal error LNK1120: 1 unresolved externals
scons: *** [build\windows-x86-debug\gallium\targets\graw-gdi\graw.dll] Error 1120
scons: building terminated because of errors.
Fixes: 56342c97ee ("gallium/u_tests: test FBFETCH and shader-based blending with MSAA")
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
During code review, Jason pointed out that:
2b3064c073 "i965, anv: Use INTEL_DEBUG for disk_cache driver flags"
Didn't account for INTEL_SCALER_* environment variables.
To fix this, let the compiler return the disk_cache driver flags.
Another possible fix would be to pull the INTEL_SCALER_* into
INTEL_DEBUG bits, but as we are currently using 41 of 64 bits, I
didn't think it was a good use of 4 more of these bits. (5 since
INTEL_PRECISE_TRIG needs to be accounted for as well.)
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shader time hard codes an index of the shader time buffer within the
gen program.
In order to support shader time in the disk shader cache, we'd need to
add the shader time index into the program key. This should work, but
probably is not worth it for this particular debug feature.
Therefore, let's just disable the disk shader cache if the shader time
debug feature is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106382
Fixes: 96fe36f7ac "i965: Enable disk shader cache by default"
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes new piglit test:
tests/spec/glsl-1.20/execution/qualifiers/vs-out-conversion-int-to-float-vec4-index.shader_test
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass. This lets us potentially
throw away chunks of the fragment shader. In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The back-end compiler emits the number of color writes specified by
wm_prog_key::nr_color_regions regardless of what nir_store_outputs we
have. Once we've gone through and figured out which render targets
actually exist and are written by the shader, we should restrict the key
to avoid extra RT write messages.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With the new deref instructions, we have to keep the modes consistent
between the derefs and the variables they reference. Since we remove
outputs by changing them to local variables, we need to run the fixup
pass to fix the modes.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables. This can help link-time optimization by allowing us to
remove varyings at a more granular level.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
instructions in affected programs: 79857 -> 70706 (-11.46%)
helped: 392
HURT: 0
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, only the first vec4 of a matrix or other complex type will
get marked as flat and we'll interpolate the others. This was caught by
a dEQP test which started failing because it did a SSO vs. non-SSO
comparison. Previously, we did the interpolation wrong consistently in
both versions. However, with one of Tim Arceri's NIR linkingpatches, we
started splitting the matrix input into vectors at link time in the
non-SSO version and it started getting correctly interpolated which
didn't match the broken SSO version. As of this commit, they both get
correctly interpolated.
Fixes: e61cc87c75 "i965/fs: Add a flat_inputs field to prog_data"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
By including the proper headers for getpid and for mkdir.
Fixes: 6ff0c6f4eb
("gallium: move ddebug, noop, rbug, trace to auxiliary to improve build times")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On windows process.h is a system provided header, and it's required in
include/c11/threads_win32.h. This header interferes with searching for
that header, and results in windows build warnings with scons, but
errors in meson which doesn't allow implicit function declarations. Just
rename process to u_process, which follows the style of utils anyway.
Fixes: 2e1e6511f7
("util: extract get_process_name from xmlconfig.c")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
finish sooner on the older CPUs. (otherwise it gets killed and we fail
the test)
Acked-by: Dave Airlie <airlied@gmail.com>
I missed an important part when porting the change over, fixing my
compiler warning but breaking -Werror=format-security.
Fixes: e6ff5ac446 ("v3d: use snprintf(..., "%s", ...) instead of strncpy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107443
CXXLD gallium_dri.la
../../../../src/gallium/drivers/vc4/.libs/libvc4.a(vc4_cl_dump.o): In function `vc4_dump_cl':
src/gallium/drivers/vc4/vc4_cl_dump.c:45: undefined reference to `clif_dump_init'
src/gallium/drivers/vc4/vc4_cl_dump.c:82: undefined reference to `clif_dump_destroy'
../../../../src/broadcom/cle/.libs/libbroadcom_cle.a(cle_libbroadcom_cle_la-v3d_decoder.o): In function `v3d_field_iterator_next':
src/broadcom/cle/v3d_decoder.c:902: undefined reference to `clif_lookup_bo'
Fixes: e92959c4e0 ("v3d: Pass the whole clif_dump structure to v3d_print_group().")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107423
CC: Eric Anholt <eric@anholt.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Andres Gomez <agomez@igalia.com>
The ubuntu version provided by Travis is a bit old, and does not detect
correctly some C functions.
Use a more modern version through scons.
Reviewed-by: Andres Gomez <agomez@igalia.com>
Gen10+ has an additional bit in MI_BATCH_BUFFER_END to signal the end
of the context image.
We select the largest size for the context image regardless of the
generation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Python 2 had string_escape and unicode_escape codecs. Python 3 only has
the latter. These work the same as far as we're concerned, so let's use
the future-proof one.
However, the reste of the code expects unicode strings, so we need to
decode them again.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Python 2 had two integer types: int and long. Python 3 dropped the
latter, as it made the int type automatically support bigger numbers.
As a result, Python 3 lost the 'L' suffix on integer litterals.
This probably doesn't make much difference when compiling the generated
C code, but adding it explicitly means that both Python 2 and 3 generate
the exact same C code anyway, which makes it easier to compare and check
for discrepencies when moving to Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In both Python 2 and 3, zlib.Compress.compress() takes a byte string,
and returns a byte string as well.
In Python 2, the script was working because:
1. string literalls were byte strings;
2. opening a file in unicode mode, reading from it, then passing the
unicode string to compress() would automatically encode to a byte
string;
On Python 3, the above two points are not valid any more, so:
1. zlib.Compress.compress() refuses the passed unicode string;
2. compressed_data, defined as an empty unicode string literal, can't be
concatenated with the byte string returned by compress();
This commit fixes this by explicitly using byte strings where
appropriate, so that the script works on both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The latter is a constructor for file objects, but when actually opening
a file, using the former is more idiomatic.
In addition, file() is not a builtin any more in Python 3, so this makes
the script compatible with both Python 2 and Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The XML parser wants byte strings, not unicode strings.
In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').
On Python 2, the read() method of the file object will return byte
strings, while on Python 3 it will return unicode strings.
Explicitly specifying the binary mode ('rb') makes the behaviour
identical in both Python 2 and 3, returning what the XML parser
expects.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The hex() builtin returns a string containing the hexa-decimal
representation of an integer.
When the argument is not an integer, then the function calls that
object's __hex__() method, if one is defined. That method is supposed to
return a string.
While that's not explicitly documented, that string is supposed to be a
valid hexa-decimal representation for a number. Python 2 doesn't enforce
this though, which is why we got away with returning things like
'NIR_TRUE' which are not numbers.
In Python 3, the hex() builtin instead calls an object's __index__()
method, which itself must return an integer. That integer is then
automatically converted to a string with its hexa-decimal representation
by the rest of the hex() function.
As a result, we really can't make this compatible with Python 3 as it
is.
The solution is to stop using the hex() builtin, and instead use a hex()
object method, which can return whatever we want, in Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In Python 2, iterating over a byte-string yields single-byte strings,
and we can pass them to ord() to get the corresponding integer.
In Python 3, iterating over a byte-string directly yields those
integers.
Transforming the byte string into a bytearray gives us a list of the
integers corresponding to each byte in the string, removing the need to
call ord().
This makes the script compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Detect if the display (X-Server) gpu and Prime renderoffload gpu prefer
different channel ordering for color depth 30 formats ([X/A]BGR2101010
vs. [X/A]RGB2101010) and perform format conversion during the blitImage()
detiling op from tiled backbuffer -> linear buffer.
For this we need to find the visual (= red channel mask) for the
X-Drawable used to display on the server gpu. We use the same proven
logic for finding that visual as in commit "egl/x11: Handle both depth
30 formats for eglCreateImage()".
This is mostly to allow "NVidia Optimus" at depth 30, as Intel/AMD
gpu's prefer xRGB2101010 ordering, whereas NVidia gpu's prefer
xBGR2101010 ordering, so we can offload to nouveau without getting
funky colors.
Tested on Intel single gpu, NVidia single gpu, Intel + NVidia prime
offload with DRI3/Present.
Note: An unintended but pleasant surprise of this patch is that it also
seems to make the modesetting-ddx of server 1.20.0 work at depth 30
on nouveau, at least with unredirected "classic" X rendering, and
with redirected desktop compositing under XRender accel, and with OpenGL
compositing under GLX. Only X11 compositing via OpenGL + EGL still gives
funky colors. modesetting-ddx + glamor are not yet ready to deal with
nouveau's ABGR2101010 format, and treat it as ARGB2101010, also exposing
X-visuals with ARGB2101010 style channel masks. Seems somehow this triggers
the logic in this patch on modesetting-ddx + depth 30 + DRI3 buffer sharing
and does the "wrong" channel swizzling that then cancels out the "wrong"
swizzling of glamor and we end up with the proper pixel formatting in
the scanout buffer :). This so far tested on a NVA5 Tesla card under KDE5
Plasma as shipping with Ubuntu 16.04.4 LTS.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We need to distinguish if the backing storage of a pixmap
is XRGB2101010 or XBGR2101010, as different gpu hw supports
different formats. NVidia hw prefers XBGR, whereas AMD and
Intel are happy with XRGB.
Use the red channel mask of the first depth 30 visual of
the x-screen to distinguish which hw format to choose.
This fixes desktop composition of color depth 30 windows
when the X11 compositor uses EGL.
v2: Switch from using the visual of the root window to simply
using the first depth 30 visual for the x-screen, as testing
shows that each driver only exports either xrgb ordering or
xbgr ordering for the channel masks of its depth 30 visuals,
so this should be unambiguous and avoid trouble if X ever
supports depth 30 pixmaps on screens with a non-depth 30 root
window visual. This per Michels suggestion.
v3: No change to v2, but spent some time testing this more on
AMD hw, with my software hacked up to intentionally choose
pixel formats/visual with the non-preferred xBGR2101010
ordering on the ati-ddx, also with a standard non-OpenGL
X-Window with depth 30 visual, to make sure that things show
up properly with the right colors on the screen when going
through EGL+OpenGL based compositing on KDE-5. Iow. to confirm
that my explanation to the v2 patch on the mailing list of why
it should work and the actual practice agree (or possibly that
i am good at fooling myself during testing ;).
v4: Drop the local `red_mask` and just `return visual->red_mask`/
`return 0`, as suggested by Eric Engestrom.
Rebased onto current master, to take the cleanup via the new
function dri2_format_for_depth() into account.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes the following dEQP-GLES31 cases from NotSupported to
Pass for me:
- dEQP-GLES31.functional.blend_equation_advanced.state_query.*
- dEQP-GLES31.functional.blend_equation_advanced.basic.*
- dEQP-GLES31.functional.blend_equation_advanced.srgb.*
- dEQP-GLES31.functional.blend_equation_advanced.msaa.*
- dEQP-GLES31.functional.blend_equation_advanced.barrier.*
- dEQP-GLES31.functional.draw_buffers_indexed.overwrite_*advanced_blend_eq*
- dEQP-GLES31.functional.state_query.indexed.blend_equation_advanced_*
- dEQP-GLES31.functional.debug.negative_coverage.*.advanced_blend.*
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In gallium, supporting FBFETCH means supporting non-coherent fetches, but
in virglrenderer, due to technical reasons this is backed by coherent
fetches instead. This means we don't need to do anything for the barriers.
However, if we don't have a texture_barrier implementation, we get crashes
because the non-coherent extensions is exposed.
So, let's leave this as a NOP for now.
[airlied: I've got a more complete impl of this somewhere, once we
land the host side].
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This instruction is used to ensure that TMU stores have been processed
before moving on. In particular, you need any TMU ops to be done by the
time the shader ends.
When texture buffers are used as images in compute shaders, the guest
never sees the modified data since the TBO is always marked as clean.
Fixes most dEQP-GLES31.functional.image_load_store.buffer.* tests.
Example test cases:
dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
dEQP-GLES31.functional.image_load_store.buffer.qualifiers.coherent_r32f
dEQP-GLES31.functional.image_load_store.buffer.format_reinterpret.rgba8_rgba8ui
Note: virglrenderer side patch also needed to bind TBOs correctly
Reviewed-by: Dave Airlie <airlied@redhat.com>
Make glXChooseFBConfig properly handle the case where the only matching
configs have the sRGB flag set, but no sRGB attribute is specified.
Since 6e06e281, the sRGBcapable flag is now actually compared, using
MATCH_DONT_CARE.
7b0f912e added defaulting of sRGBcapable to GL_FALSE in
__glXInitializeVisualConfigFromTags(), to handle servers which don't report
it, but this function is also used by glXChooseFBConfig(), so sRGBcapable is
implicitly false when not explicitly specified.
(This can cause e.g. glxinfo to fail to find anything matching the simple
config it looks for if all the candidates have the sRGB flag set to true.
I'm assuming this doesn't happen 'normally' as candidate configs with and
without sRGB true are available)
Move this defaulting to createConfigsFromProperties(), and set the default
for glXChooseFBConfig() in init_fbconfig_for_chooser() to GLX_DONT_CARE.
Reviewed-by: Eric Anholt <eric@anholt.net>
get_supported_modifiers() and pixmap_from_buffers() requests both
expect a window as drawable, passing a pixmap will fail as the Xserver
will fail to match the given drawable to a window.
That leads to dri3_alloc_render_buffer() to return NULL and breaks
rendering when using GLX_DOUBLEBUFFER on pixmaps.
Query the root window of the pixmap on first init, and use the root
window instead of the pixmap drawable for get_supported_modifiers()
and pixmap_from_buffers().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107117
Fixes: 069fdd5 ("egl/x11: Support DRI3 v1.1")
Signed-off-by: Olivier Fourdan <ofourdan@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: ignore names on purpose, for consistency with other places where
we are doing the same (Alejandro)
v3: changes proposed by Timothy Arceri, implemented by Alejandro Piñeiro:
* Remove redundant 'struct active_xfb_varying'
* Update several comments, including spec quotes if needed
* Rename struct 'active_xfb_varying_array' to 'active_xfb_varyings'
* Rename variable 'array' to 'active_varyings'
* Replace one if condition for an assert (<MAX_FEEDBACK_BUFFERS)
* Remove BufferMode initialization (was already done)
v4: simplify output pointer handling (Timothy)
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For now we are just adding nir lowerings that are needed/mandatory to
get things working. After everything is settled, we would start to add
good-to-have lowerings.
This patch adds the following calls:
* nir_split_var_copits and nir_split_per_member_structs: as vulkan
drivers are doing now. See commit
b0c643d8f5 ("spirv: Use NIR
per-member splitting") for more info.
Without this commit, piglit tests like this crashes:
spec/arb_gl_spirv/execution/varying/block
And in general most of the shaders that includes any kind of
struct.
* nir_copy_prop: after nir_deref_instr introduction, function calls
need this. See commit "nir,spirv: Rework function calls"
(c11833ab24) for more info.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Whenever a non-zero stream is written to it now sets uses_streams to
true. This reflects the code in validate_geometry_shader_emissions for
GLSL.
v2: set uses_streams at gather_info instead that at spirv to nir
(Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It looks like it was previously taking the SPIR-V instruction number
directly instead of looking up the constant value.
v2: use vtn_constant_value helper (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From SPIR-V 1.0 spec, section 3.20, "Decoration":
"Stream
Apply to an object or a member of a structure type. Indicates the
stream number to put an output on."
Note the "or", so that means that it is allowed for both a full struct
or a membef or a struct (although the wording is not really ideal, and
somewhat error-prone, imho).
We found this with some Geometry Streams tests for ARB_gl_spirv, where
the full gl_PerVertex is assigned Stream 0 (default value on OpenGL
for gl_PerVertex).
So this commit allows structs to have this Decoration, and sets the
stream at the nir variable if needed.
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
v2: squash two Decoration Stream patches (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These set the new explicit XFB members on nir_variable.
This is needed to support ARB_gl_spirv, as Vulkan doesn't support
transform feedback.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These are copied from the from the corresponding values in
ir_variable. The intention is to eventually use them in a pure-NIR
linker.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Allow platform_surfaceless to use swrast even if DRM is not available.
To be used to allow a fuzzer for virgl to be run on a jailed VM without
hardware GL or DRM support.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: David Riley <davidriley@chromium.org>
A V3D_DEBUG=clif file from a non-texturing .shader_test can now be
successfully run through the CLIF runner in the simulator. Now I need to
build an open source CLIF runner against the v3d DRM module.
We need to dump each buffer's contents in order for a CLIF file, so we
need to collect all of the relocs into a buffer (such as the indirect CL
full of both uniforms and GL shader states) before we start dumping.
These will match the names that the CLIF parser expects to see. I may in
the future decide to change more of the other names so that I match the
names the HW/closed SW team uses for their packets, rather than the names
in the spec (which only they and I can read anyway).
This matches what CLIF parsing expects, and makes
TILE_BINNING_MODE_CONFIGURATION_COMMON_CONFIGURATION into a much more
legible TILE_BINNING_MODE_CFG_COMMON.
The CLIF format expects american english spelling, and the rest of Mesa is
too. I was previously adhering to the spec's spelling, which is
counterproductive.
A few of the upcoming changes would make the V3D_DEBUG=cl output less
readable, so let's make proper CLIF file production be under a separate
V3D_DEBUG=clif flag.
V3D only has one of these (the top 16 bits of a float32) left in its CLs,
but VC4 had many more. This gets us proper pretty-printing of the values
instead of a large uint.
I copied this value from radeonsi, but it was wrong, 1024
seems to be correct answer from looking at gpuinfo.
This should fix a few compute shader related hangs. (at least in CTS)
Cc: <mesa-stable@lists.freedesktop.org>
(airlied: pushed because it avoids hangs)
To avoid serializing, this has the user constant buffer always be 65536
bytes and enabled unless it's required that something else is used for
constant buffer 0.
Fixes artifacts with at least XCOM: Enemy Within, 0 A.D. and Unigine
Valley, Heaven and Superposition.
v2: changed uniform_buffer_bound to be bool instead of a uint32_t
v3: remove magic constants
v3: remove pointless code in nvc0_validate_driverconst
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100177
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The kernel by default serializes the BCL on previous BCLs submitted on
this FD, but not RCLs. For now this fix is conservative and blocks on
last RCL if any vertex texturing is done, which fails to get bin/render
overlap if there was an intermediate job that doesn't draw to the BCL's
buffer. I've dropped a perf_debug() in here to note that as a potential
future improvement.
Fixes intermittent failures in
KHR-GLES3.copy_tex_image_conversions.required.*
We weren't returning at the end of the nir_isntr_type_deref case in
nir_instrs_equal and it was falling through to the default of false.
While we're at it, make the default unreachable because all statements
in the switch now have their own returns. Had we done that before, we
would have caught this bug a long time ago.
Fixes: 19a4662a54 "nir: Add a deref instruction type"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
We no longer have semi-custom clear pipe that uses 3d state. Normal
clears happen via hw blitter, and everything else uses u_blitter these
days. So we don't need this hack.
TODO a3xx+a4xx could get same treatment.
Signed-off-by: Rob Clark <robdclark@gmail.com>
v2: use nir_metadata_preserve
preserve metadata in case of !progress
Fixes: 074f5ba0b5
"nir: Add a simple int64 lowering pass"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
radv implements the Android Vulkan HAL interface, this patch adds
Android.mk building rules by porting of radv automake rules.
vendor HAL module is installed as /vendor/lib/hw/vulkan.radv.so
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Patch changes radv entrypoints generator to not skip this extension even
though it is set as disabled in the vk.xml
Reference: 63525ba730 ("android: enable VK_ANDROID_native_buffer")
Fixes: 69f447553c ("vulkan: Drop vk_android_native_buffer.xml")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Android build system will try to compile vk_format_table.c
as a shipped source, but at compile time it will be missing,
we move it to generated source, where it belongs
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The xlib/swrast driver only supports GL 2.1. This patch fixes a
crash if the app calls glGetString(GL_SHADING_LANGUAGE_VERSION).
Reviewed-by: Neha Bhende <bhenden@vmware.com>
By default after saying you are emitting a buffer, it'll expect a buffer
size. Once you set a format, it'll keep parsing that format until you
announce something else.
Previously, we emitted in XML order, which I happen to type in the
decreasing offset order of the specifications. However, the CLIF parser
wants increasing offsets.
With CLIFs, the parser will choose an address for the buffer being
created, so we need to use effectively relocations to buffers instead of
the addresses that the driver uses. This is also a whole lot more
intelligible for console output than raw addresses!
To generate CLIF files that the v3dv3 simulator can parse, we're going to
need to decode addresses, and for that we'll need the vaddr lookup
function from the clif structure from within v3d_decoder.
Looks like a mistake from when the deref stuff landed.
Fixes: 506a07e4e3 ("ac/nir: Add deref support to image intrinsics.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will be used by indirect multidraws.
v2: clean up the function further, change return types to unsigned
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
If we have tracing enabled we could do all the tracing emits
and overflow the precalculated cdw_max.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The code sizes return here get passed to the cache shader insert function,
which then memcpy from the code ptr, and causes all sorts of valgrind
errors like:
==6755== Invalid read of size 8
==6755== at 0x4C32FEE: memcpy@GLIBC_2.2.5 (vg_replace_strmem.c:1021)
==6755== by 0x2305D4C7: radv_pipeline_cache_insert_shaders (radv_pipeline_cache.c:416)
==6755== by 0x2305791D: radv_create_shaders (radv_pipeline.c:2158)
==6755== by 0x2305C523: radv_pipeline_init (radv_pipeline.c:3404)
==6755== by 0x2305C890: radv_graphics_pipeline_create (radv_pipeline.c:3515)
==6755== by 0x230188AB: radv_device_init_meta_blit_color (radv_meta_blit.c:871)
==6755== by 0x2301D50E: radv_device_init_meta_blit_state (radv_meta_blit.c:1278)
==6755== by 0x23011893: radv_device_init_meta (radv_meta.c:352)
==6755== by 0x2300744B: radv_CreateDevice (radv_device.c:1576)
==6755== by 0x5187D0F: ??? (in /usr/lib64/libvulkan.so.1.1.77)
==6755== by 0x518F6A3: ??? (in /usr/lib64/libvulkan.so.1.1.77)
==6755== by 0x5192A42: vkCreateDevice (in /usr/lib64/libvulkan.so.1.1.77)
==6755== Address 0x22a58548 is 4 bytes after a block of size 116 alloc'd
==6755== at 0x4C2EBAB: malloc (vg_replace_malloc.c:299)
==6755== by 0x23089DC4: ac_elf_read (ac_binary.c:144)
==6755== by 0x23090A60: ac_compile_module_to_binary (ac_llvm_helper.cpp:162)
==6755== by 0x23053F06: compile_to_memory_buffer (radv_llvm_helper.cpp:58)
==6755== by 0x23053F06: radv_compile_to_binary (radv_llvm_helper.cpp:98)
==6755== by 0x23052769: ac_llvm_compile (radv_nir_to_llvm.c:3394)
==6755== by 0x23052823: ac_compile_llvm_module (radv_nir_to_llvm.c:3418)
==6755== by 0x23053C05: radv_compile_nir_shader (radv_nir_to_llvm.c:3542)
==6755== by 0x23061B4E: shader_variant_create (radv_shader.c:580)
==6755== by 0x23061CFD: radv_shader_variant_create (radv_shader.c:634)
==6755== by 0x23057765: radv_create_shaders (radv_pipeline.c:2123)
==6755== by 0x2305C523: radv_pipeline_init (radv_pipeline.c:3404)
==6755== by 0x2305C890: radv_graphics_pipeline_create (radv_pipeline.c:3515)
Since we are just inserting the code into the cache, we can avoid these
bad reads and data in the cache by just using the binary code size here.
Fixes: 939e5a382 (radv: add padding for the UMR disassembler)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The kernel's scheduler doesn't rely on our emitting them, and in fact we'd
get in trouble if the kernel decided to schedule too many bins in a row
before getting around to scheduling the corresponding render.
This reflects a change on the HW/closed SW side to drop this unused HW.
With it dropped on their side, the CLIF parser no longer expects to find
VG fields.
Instead of using _mesa_is_winsys_fbo or
_mesa_is_user_fbo to infer if an fbo is
flipped use the FlipY flag.
v2:
* additional window-system framebuffer checks [for jason]
v3:
* s/inverted_y/flip_y/g [for chadv]
* s/InvertedY/FlipY/g [for chadv]
Reviewed-by: Chad Versace <chadversary@chromium.org>
Adds an extension to glFramebufferParameteri
that will specify if the framebuffer is vertically
flipped. Historically system framebuffers are
vertically flipped and user framebuffers are not.
Checking to see the state was done by looking at
the name field. This adds an explicit field.
v2:
* updated spec language [for chadv]
* correctly specifying ES 3.1 [for chadv]
* refactor access to rb->Name [for jason]
* handle GetFramebufferParameteriv [for chadv]
v3:
* correct _mesa_GetMultisamplefv [for kusmabite]
v4:
* update spec language [for chadv]
* s/GLboolean/bool/g [for chadv]
* s/InvertedY/FlipY/g [for chadv]
* s/inverted_y/flip_y/g [for chadv]
* assert changes [for chadv]
Reviewed-by: Chad Versace <chadversary@chromium.org>
Problem 1: u_debug_stack_android.cpp transitively included
"pipe/p_compiler.h", but src/gallium/include was missing from the C++
include path.
Problem 2: Add -std=c++11 to AM_CXXFLAGS. Android's libbacktrace headers
require C++11, but the Android toolchain (at least in the Chrome OS SDK)
does not enable C++11 by default.
v2: Add -std=c++11.
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
disable prefetching of binding tables for ICLLP A0 and B0
steppings. It fixes multiple gpu hangs in
ext_framebuffer_multisample* tests on ICLLP B0 h/w.
Anuj: Add comments and commit message.
Add gen 11 checks in the code.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Now that the elements version handles both cases, remove the
non-elements version.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Keep information in acp_entry whether the entry is full or not, and
use the ACP in more nodes when visiting the instructions:
- add_copy: write whole variables to the ACP state (regardless the
type).
- visit(ir_dereference_variable *): perform the propagation here if we have a
full candidate. Element-wise here doesn't apply because the mask
isn't available at this point.
- visit_leave(ir_assignment *): process beyond scalar and vector, as
the full variables might have other types.
Also import an improvement from opt_copy_propagation.cpp: if ir_call
is an intrinsic, we know the variables affected, so keep going.
v2: (all from Eric Anholt)
Describe how acp_entry attributes are used.
Don't do book-keeping to avoid adding repeated element to
the dsts in write_elements().
v3: Use _mesa_set_remove_key. (Thomas Helland)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
The pass can create a temporary result for the instruction and then
moves from it to the original destination, however, if the original
instruction was predicated, the mov has to be predicated as well.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
This wasn't wrong but it looks better to me like this. It's
only used for debugging purposes (ie. RADV_TRACE_FILE).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases. Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward. This simplified the
passes, but made life painful for the callers.
This patch combines both into a single pass. If you give it a non-zero
static count, it uses that. If you give it Mesa state slots, it turns
it back into a built-in uniform. Otherwise, it does nothing.
This also moves the i965 uniform lowering out to shared code.
v2: Make token arrays const.
Reviewed-by: Eric Anholt <eric@anholt.net>
The extension requires at least OpenGL 3.0 and
OpenGL ES 3.0.
Fixes two ext_base_instance tests:
arb_base_instance-baseinstance-doesnt-affect-gl-instance-id_gles3
arb_base_instance-drawarrays_gles3
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It's OK to pass them in memory, which is what kernel invocation needs.
Fixes regressions since llvm r337535 ("Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"):
scalar-arithmetic-char
scalar-arithmetic-uchar
scalar-arithemtic-short
scalar-arithmetic-ushort
scalar-comparison-char
scalar-comparison-uchar
scalar-comparison-short
scalar-comparison-ushort
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These are lowered by brw_nir_lower_vs_inputs(). If they weren't, we
would have already hit the unreachable() in emit_system_values_block().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This seems like a sensible precaution to avoid extra draws. It doesn't
deal with the case of a Z24S8 buffer created by the window system for an
application that happens to never use S.
First, figure out if we can just sneak the clear into the TLB clear, even
if drawing has already happened (since we have job->load and job->clear to
tell us), taking into account GFXH-1461. For any pieces we can't TLB
clear, fall back to drawing a quad without flushing the scene.
Fixes extra scene flushes in glmark2 due to GFXH-1461.
We were computing this at RCL generation time, but that means you can't
unflag the store for an invalidate_resource, or not flag the store if
writmasking is disabled.
These describe what the fields mean in RCL generation. "resolve" is left
over from VC4, and sounds like MSAA resolves (which may or may not be
involved in the store we generate).
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Seems something went wrong somehow when it was pushed.
v2: combine into one list
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek OIšák <marek.olsak@amd.com>
When this bit was added, it seems the some initialization code
was omitted by mistake.
Since stack-variables have kinda random contents, and we don't
zero initialize the whole struct in these code-paths, we end up
getting random-ish values for this bit.
Spotted by Coverity in the following CIDs:
- 1438115
- 1438123
- 1438130
Fixes: 70425bcfe6 ("gallium: plumb
invariant output attrib thru TGSI")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Move the integer conversion after the fixup.
This fixes some regressions with
dEQP-VK.pipeline.vertex_input.single_attribute.mat4.as_a2r10g10b10*
Fixes: b722b29f10 ("radv: add support for 16bit input/output")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This breaks printing input/output variables with more than
4 components like mat4.
Fixes: 1beef89ad8 ("nir: prepare for bumping up max components to 16")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The various base addresses are simply addresses. There may or may not
be a buffer located at those addresses. So, it doesn't make much sense
to request one. Just save the raw address so we can add it later, when
asking about BOs at the final <base + offset> address.
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all
state for a given base in a single buffer.
I'm working on a prototype which emits STATE_BASE_ADDRESS only once at
startup, where each base address is a fixed 4GB region of the PPGTT.
State may live in many buffers in that 4GB region, even if there isn't
a buffer located at the actual base address itself.
To handle this, we need to save the STATE_BASE_ADDRESS values across
multiple batches, rather than assuming we'll see the command each time.
Then, each time we see a pointer, we need to ask the driver for the BO
map for that data. (We can't just use the map for the base address, as
state may be in multiple buffers, and there may not even be a buffer
at the base address to map.)
v2: Fix things caught in review by Lionel:
- Drop bogus bind_bo.size check.
- Drop "get the BOs again" code - we just get the BOs as needed
- Add a message about interface descriptor data being unavailable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CovID: 1438132
Fixes: a99c9e63a0 "anv: finish the binding_table_pool on
destroyDevice when use_softpin"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
CovID: 1438113, 1438118, 1438119, 1438121
Fixes: dc1d10b396 "anv,radv: Add support for VK_KHR_get_display_properties2"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This wasn't updated for the new scalar ISA parameter. It worked anyway
because all the function's callers live in the same file, so it found
the correct function. Tim made this external for the new st prog_to_nir
translator, which got reverted, but which I'd like to land eventually.
So, fix the prototype.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It seems the hardware always expects floating point border color values
[0,1] for unsigned, and [-1,1] for signed texture component, regardless
of pixel type, but the border colors are passed according to texture
component type. Hence, before submitting the border color, convert and
scale it these ranges accordingly.
This doesn't seem to work for textures with 32 bit integer components
though, here, it seems that the border color is always set to zero,
regardless of the BORDER_COLOR_TYPE state set in Q_TEX_SAMPLER_WORD0_0.
v2: Simplyfy logic as suggested by Roland Schneidegger
Fixes:
dEQP-GLES31.functional.texture.border_clamp.formats.compressed*
dEQP-GLES31.functional.texture.border_clamp.formats.r* (non 32 bit integer)
dEQP-GLES31.functional.texture.border_clamp.per_axis_wrap_mode.texture_2d*
and a number of piglits out of
piglit run gpu -t texture -t gather -t formats
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Since various options within INTEL_DEBUG could impact code generation,
we need to set the disk cache driver_flags parameter based on the
INTEL_DEBUG flags in use.
An example that will affect the program generated by i965 is the
INTEL_DEBUG=nocompact option.
The DEBUG_DISK_CACHE_MASK value is added to mask the settings of
INTEL_DEBUG that can affect program generation.
v2:
* Use driver_flags (Tim)
* Also update Anvil (Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This extra character should not be used by snprintf, but we make it
available to verify that we printed the exact number we wanted, and
didn't overflow.
v2:
* Also update Anvil
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
we need rounding modes on other conversions involving floats and it is easier
to rename f2f16_undef than renaming all the other ones.
v2: rebased on master
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
also move some of the GLSL builtins over we will need for implementing
some OpenCL builtins
v2: replace NIR_IMM_FP by nir_imm_floatN_t in ported code
fix up changes caused by swizzle rework
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Lightly edited to be valid 'C' code.
Is there a bug open to fix this upstream?
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.
Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().
As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In Python 2, iterators had a .next() method.
In Python 3, instead they have a .__next__() method, which is
automatically called by the next() builtin.
In addition, it is better to use the iter() builtin to create an
iterator, rather than calling its __iter__() method.
These were also introduced in Python 2.6, so using it makes the script
compatible with Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In Python 2, dict.keys() and dict.values() both return a list, which can
be sorted in two ways:
* l.sort() modifies the list in-place;
* sorted(l) returns a new, sorted list;
In Python 3, dict.keys() and dict.values() do not return lists any more,
but iterators. Iterators do not have a .sort() method.
This commit moves the build scripts to using sorted() on dict keys and
values, which makes them compatible with both Python 2 and Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In Python 2, dictionaries have 2 sets of methods to iterate over their
keys and values: keys()/values()/items() and iterkeys()/itervalues()/iteritems().
The former return lists while the latter return iterators.
Python 3 dropped the method which return lists, and renamed the methods
returning iterators to keys()/values()/items().
Using those names makes the scripts compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Most functions in the builtin string module also exist as methods of
string objects.
Since the functions were removed from the string module in Python 3,
using the instance methods directly makes the code compatible with both
Python 2 and Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Until now, the needed bits were wrongly included in linux/memfd.h
Since Travis' sys/syscall.h doesn't provide the SYS_memfd_create, we
generate that header manually, including the needed bits to avoid
compilation problems, as the ones observed after:
3228335b55 ("intel: aubinator: handle GGTT mappings")
v2: replace fixes commit with the first direct user of
syscall.h (Emil).
Fixes: 3228335b55 ("intel: aubinator: handle GGTT mappings")
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
New versions of virglrenderer supports the precise-flag, so let's
forward it from TGSI if that's the case.
This fixes a few dEQP-GLES31 tests:
- dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.
This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.
R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
SelectionDAG for instruction selection.
v2: mareko: move the helper to src/amd/common
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant. One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader. This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.
Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
There might be a nicer way to do this, but this is at least correct.
This fixes:
KHR-GL44.tessellation_shader.single.max_patch_vertices
KHR-GL44.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
The pt emit path can only handle 65535 - the number of vertices is
truncated to a ushort, resulting in a too small buffer allocation, which
will crash.
Forcing the pipeline path looks suboptimal, then again this bug is
probably there ever since GS is supported, so it seems it's not
happening often. (Note that the vertex_id in the vertex header is 16
bit too, however this is only used by the draw pipeline, and it denotes
the emit vertex nr, and that uses vbuf code, which will only emit smaller
chunks, so should be fine I think.)
Other solutions would be to simply allow 32bit counts for vertex
allocation, however 65535 is already larger than this was intended for
(the idea being it should be more cache friendly). Or could try to teach
the pt emit path to split the emit in smaller chunks (only the non-index
path can be affected, since gs output is always linear), but it's a bit
tricky (we don't know the primitive boundaries up-front).
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds the guest side support for ARB_shader_storage_buffer_object.
Co-authors: Gurchetan Singh <gurchetansingh@chromium.org>
v2: move to using separate maximums
(fixup macros)
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Delegating constructors is a C++11 feature, so this was breaking when
compiling with C++98. Change the copy_propagation_state() calls that
used the convenience constructor to use a static member function
instead.
Since copy_propagation_state is expected to be heap allocated, this
change is a good fit.
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107305
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
Sometimes when iterating over sources, we might want to check if it's the
implicit one. We wouldn't want to match on a non-implicit src using this
function.
These instructions let us write directly to the phys regfile, instead of
just R4. That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.
There is still an extra instruction of latency, which is not represented
in the scheduler at the moment. If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value. That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.
total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs: 82590 -> 78196 (-5.32%)
Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier. I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.
shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs: 77254 -> 76092 (-1.50%)
This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.
shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs: 48520 -> 47234 (-2.65%)
It's actually also a bit safer, since now the compiler will warn if
the string is larger than the `.name` array.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Currently, they were doing nothing.
v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
modules[i] can be NULL for merged shaders but we have to
free the NIR code. radv_can_dump_shader_stats() already handles
if modules[i] is NULL, no need to check it twice.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The first fix attempt contained a nasty typo which somehow didn't get
caught in review. It also didn't work as intended because the sRGB
conversion was happening but then throwing away all but the red channel
because it dind't know it was RGB. Really, it's my fault for trying to
fix a bug without first writing tests. I've now written tests and they
pass with this change. :)
Fixes: 11712b9ca1 "intel/blorp: Fix blits to R8G8B8_UNORM_SRGB"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We've had several broadwell hangs that have come down to this bit just
not working correctly. Most recently, we've had a pile of hangs
reported with apps running under DXVK:
https://github.com/doitsujin/dxvk/issues/469
Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We support mipmapped and arrayed linear images so we need to support
vkGetImageSubresourceLayout on them. Fortunately, it's just a trivial
call into ISL.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example some loops in Civilization Beyond Earth
shaders are unrolled.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver.
Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted?
Not sure, so I left it as-is.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch applies the necessary changes in Android.common.mk
as per automake rules, to avoid following building error:
external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8:
error: implicit declaration of function 'disk_cache_get_function_timestamp'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (disk_cache_get_function_timestamp(nouveau_disk_cache_create,
^
1 error generated.
(v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled
Fixes: cc10b34 ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The commit 76dfed8ae2 changed nir_intrinsics.h to be a generated
header, but the corresponding dependency was not updated for Android.
It causes the error:
[ 0% 19/4336] target C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
^~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 76dfed8ae2 ("nir: mako all the intrinsics")
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
After optimization passes and many trasfromations most of memory
NIR holds is a garbage which was being freed only after shader deletion.
Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.
The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing in EGL_KHR_gl_image.txt seems to let us deny creation based on
formats, and doing so causes many failures in
dEQP-EGL.functional.image.api.*
The NONE value we were protecting from only gets looked at in the
__DRI_IMAGE_ATTRIB_FORMAT and __DRI_IMAGE_ATTRIB_FOURCC queries, which are
used from wayland and gbm (which throw an error cleanly on unknown format)
and DMABUF export.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The EGL CTS expects that you can make images from all sorts of things,
including things like z16 and s8, which we don't have DRM fourccs for.
Just return an error when trying to export one of those.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Recreating our context's syncobj with ALREADY_SIGNALED meant that if you
created two fences in a row, then waiting on the second would succeed
immediately. Instead, export a sync file in the gallium fence (since we
don't have a syncobj clone ioctl), and just create a new syncobj to wait
on whenever we need to.
Noticed while debugging
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
It tends to return >0 in the success case (I think the value is something
like "how much of the timeout remained"). Fixes
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish
When requesting a texture of the internal format GL_RGB32F Gallium will
try to allocate a renderable texture and returns RGBA32F or RGBX32F, but
when one requests GL_RGB32I or GL_RGB32UI the according 3-component
texture will be returned. This leads to problems later, when one wants
to use glCopyImageSubData to copy data between these textures that should
be compatible, but given the way virgl and Gallium handle this the latter
fails with an assertion, because the per-texel bit size is different.
By allowing the GL_RGB32* only for texture buffers these problems are avoided
without losing the ARB_tbo_rgb32 extension (thanks Ilia Mirkin).
v2: Correct spelling (Gurchetan Singh)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
When running gdb, make sure to pass the LD_PRELOAD variable only to
the executed program, not the debugger. Otherwise the debugger will
run the preloaded constructor/destructor too and bad things will
happen.
Suggested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The problem with passing the configuration of the dump lib through a
file descriptor is that it can be read only once. But under gdb you
might want to rerun your program multiple times.
This change hands the configuration through a temporary file that is
deleted once the command line passes to intel_dump_gpu has exited.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The array index needs to be corrected and it must be insured that it is
rounded and its value is non-negative before it is combined with the
face id.
v5: Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
v6: Fix type (Roland Scheidegger)
Fixes 182 from android/cts/master/gles31-master.txt:
dEQP-GLES31.functional.texture.filtering.cube_array.formats.*
dEQP-GLES31.functional.texture.filtering.cube_array.sizes.*
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.nearest_mipmap_*
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.linear_mipmap_*
dEQP-GLES31.functional.texture.filtering.cube_array.no_edges_visible.*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Correct the array index for TEXTURE_*1D_ARRAY, and TEXTURE_*2D_ARRAY
The standard says the array index is evaluated according to
floor(z + 0.5)
but RNDNE is sufficient also for the test cases were z is close to 1.5
and it is likely to hit 1.5, the corner case were RNDNE gives a result
different from above formula.
v5: - Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
- update commit message
Fixes 325 tests from android/cts/master/gles3-master.txt:
dEQP-GLES3.functional.shaders.texture_functions.texture.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturelod.sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*sampler2darray*
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.*sampler2darray*
dEQP-GLES3.functional.texture.filtering.2d_array.formats.*
dEQP-GLES3.functional.texture.filtering.2d_array.sizes.*
dEQP-GLES3.functional.texture.filtering.2d_array.combinations.*
dEQP-GLES3.functional.texture.shadow.2d_array.*
dEQP-GLES3.functional.texture.vertex.2d_array.*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gradients used in texture lookups and the offsets must reside in the
same fetch clause (the first is imposed by the hardware and the second
is expected by sb). In order to ensure that no ALU clause is inserted
between emission and use of these, delay the emission of these
instructions until the texture instruction using them is also emitted.
This is needed in preparation for the correction of the texture array
indices.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
radv always needs it, so just check the header instead. Also
do not declare the function if the variable is not set, so we
get a nice compile error instead of failing to open a device
at runtime.
Fixes: b87ef9e606 "util: fix MSVC build issue in disk_cache.h"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reinserting code directly before a jump means the block gets split
and merged, removing the original block and replacing it in the
process.
Hence keeping a pointer to the continue block over a reinsert
causes issues.
This code changes nir_opt_if to simply look for the new continue
block.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107275
CC: 18.1 <mesa-stable@lists.freedesktop.org>
The goal is to use radv_barrier()/radv_subpass_barrier() as
much as possible for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
total gprs used in shared programs : 647530 -> 647532 (0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58551648 -> 58459352 (-0.16%)
local shared gpr inst bytes
helped 0 0 73 2609 2609
hurt 0 0 71 34 34
An alternative solution to the problem fixed in
0bd83d0 ("nv50/ir: move LateAlgebraicOpt to the very end").
total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
total gprs used in shared programs : 647535 -> 647530 (-0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58555784 -> 58551648 (-0.01%)
local shared gpr inst bytes
helped 0 0 2 34 34
hurt 0 0 0 0 0
This instruction seems to be faster than S2R and requires no barrier,
though the range of special registers it can read from is limited.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
LLVM 7 returns incorrect results when count is 0, something
has been broken since LLVM 6. Of course, the best solution is
to fix LLVM but this workaround works as expected for now.
Original workaround by Philippe Rebohle.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107276
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When handling 'if' in copy propagation elements, if a certain variable
was killed when processing the first branch of the 'if', then the
second would get any propagation from previous nodes.
x = y;
if (...) {
z = x; // This would turn into z = y.
x = 22; // x gets killed.
} else {
w = x; // This would NOT turn into w = y.
}
With the change, we let copy propagation happen independently in the
two branches and only then apply the killed values for the subsequent
code.
One example in shader-db part of shaders/unity/8.shader_test:
(assign (xyz) (var_ref col_1) (var_ref tmpvar_8) )
(if (expression bool < (swiz y (var_ref xlv_TEXCOORD0) )(constant float (0.000000)) ) (
(assign (xyz) (var_ref col_1) (expression vec3 + (var_ref tmpvar_8) ... ) ... )
)
(
(assign (xyz) (var_ref col_1) (expression vec3 lrp (var_ref col_1) ... ) ... )
))
The variable col_1 was replaced by tmpvar_8 in the then-part but not
in the else-part.
NIR deals well with copy propagation, so it already covered for the
missing ones that this patch fixes.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of keeping multiple acp_entries in lists, have a single
acp_entry per variable. With this, the implementation of clone is more
convenient and now fully implemented. In the previous code, clone was
only partial.
Before this patch, each acp_entry struct represented a write to a
variable including LHS, RHS and a mask of what channels were written
to. There were two main hash tables, the first (lhs_ht) stored a list
of acp_entries per LHS variable, with the values available to copy for
that variable; the second (rhs_ht) was a "reverse index" for the first
hash table, so stored acp_entries per RHS variable.
After the patch, there's a single acp_entry struct per LHS variable,
it contains an array with references to the RHS variables per
channel. There now is a single hash table, from LHS variable to the
corresponding entry. The "reverse index" is stored in the ACP entry,
in the form of a set of variables that copy from the LHS. To make the
clone operation cheaper, the ACP entries are created on demand.
This should not change the result of copy propagation, a later patch
will take advantage of the clone operation.
v2: Add note clarifying how the hashtable is destroyed.
v3: (all from Eric Anholt)
Add remove_unused_var_from_dsts() function for reuse.
Remove from dsts as we go instead of clearing at the end.
Add clarifying comment to erase().
Reviewed-by: Eric Anholt <eric@anholt.net>
Separate higher level logic of visiting instructions and chosing when
to store and use new copy data from the datastructure holding the copy
propagation information. This will also make easier later patches that
change the structure.
v2: Remove empty destructor and clarify how hash tables are destroyed.
Reviewed-by: Eric Anholt <eric@anholt.net>
In commit 86cb05a6d3 ("intel: aubinator: remove standard input
processing option") we removed the ability to process aub as an input
stream because we're now rely on mmapping the aub file to back the
buffers aubinator is parsing.
intel_aubdump was the provider of the standard input data and since
we've copied/reworked intel_aubdump into intel_dump_gpu within Mesa,
we don't need that code anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
r600_gpu_load.c: In function ‘r600_gpu_load_thread’:
../../../../src/util/os_time.h:82:7: warning: assuming signed overflow does not occur when assuming that (X + c) >= X is always true [-Wstrict-overflow]
if (start <= end)
Assignment and usage of this variable both happen inside an
if(rad_image_has_dcc()) {} blocks. It seems gcc plays it safe and
assumes that both function calls could have different return values.
But in this case we should be safe.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We should only use a #define locally once it's been upstreamed, and at
that point you should just update our drm_fourcc.h.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
color_interp_vgpr_index was declared as a generic char value.
Because signed values are used in this variable, the result
was not safe across architectures and crashed on ppc64[el]
and arm.
Declare color_interp_vgpr_index as a signed type.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This lets us move the glBlitFramebuffer nonsense into the GL driver and
make the usage of BLORP mutch more explicit and obvious as to what it's
doing.
Reviewed-by: Chad Versace <chadversary@chromium.org>
At the moment, this is entirely internal but we'll expose it to clients
of the BLORP API in the next commit.
Reviewed-by: Chad Versace <chadversary@chromium.org>
After optimization passes and many trasfromations most of memory
NIR holds is a garbage which was being freed only after shader deletion.
Freeing it at the end of linking will save memory which would be useful
in case there are a lot of complex shaders being compiled.
The common case for this issue is 32bit game running under Wine.
The cost of the optimization is around ~3-5% of compilation speed
with complex shaders.
V2: by Jason Ekstrand
- Move nir_sweep up, right after the last change of NIR
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103274
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Dependencies between rings are inserted correctly if a buffer is
represented by only one unique amdgpu_winsys_bo instance.
Use a hash table keyed by amdgpu_bo_handle to have exactly one
amdgpu_winsys_bo per amdgpu_bo_handle.
v2: return offset and stride properly
Tested-by: Leo Liu <leo.liu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Instead of having quite so many singletons, we use a struct aub_file to
organize the bits we need for writing an aub file.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For large buffers which span an entire l1 page table, we got the range
calculations wrong. In this case, we end up with an l1_start which is
the first byte represented by the given l1 table and an l1_end which is
the first byte after the range represented by the l1 table. Then
l2_start_index == L2_index(l2_end) due to roll-over. Instead, compute
lN_end using (1Ull << shift) - 1 so that lN_end is the last byte in the
range represented by the Nth level page table. When we do this, we
don't need the conditional expression anymore.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Compiler doesn't pick up that level and start_layer will be defined,
so do as was done for num_layers in 4d8b476fa9 "intel/blorp: Fix
compiler warning about num_layers." and always set it.
Fixes warning
../../src/mesa/drivers/dri/i965/brw_blorp.c: In function ‘brw_blorp_clear_depth_stencil’:
../../src/mesa/drivers/dri/i965/brw_blorp.c:1439:4: warning: ‘start_layer’ may be used uninitialized in this function [-Wmaybe-uninitialized]
blorp_clear_depth_stencil(&batch, &depth_surf, &stencil_surf,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
level, start_layer, num_layers,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
x0, y0, x1, y1,
~~~~~~~~~~~~~~~
(mask & BUFFER_BIT_DEPTH), ctx->Depth.Clear,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stencil_mask, ctx->Stencil.Clear);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/mesa/drivers/dri/i965/brw_blorp.c:1439:4: warning: ‘level’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
And also specify the maximum size when writing to static buffers. The
warning below refers to the case where "str5" could be larger than
"str5 - str4", then the strcat would have overlapping dst and src.
Compiler doesn't pick up the bound from the snprintf above, so we make
clear the bounds of str5 by using strncat() instead of strcat().
../../src/util/tests/string_buffer/string_buffer_test.cpp: In member function ‘virtual void string_buffer_string_buffer_tests_Test::TestBody()’:
../../src/util/tests/string_buffer/string_buffer_test.cpp:106:10: warning: ‘char* strcat(char*, const char*)’ accessing 81 or more bytes at offsets 48 and 128 may overlap 1 byte at offset 128 [-Wrestrict]
strcat(str4, str5);
~~~~~~^~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
GCC 8.1.1 is having a hard time identifying that the values are
properly initialized when used. In the 'memset_value' case, we pass
the uninitialized value to another function (that will use only if the
conditions match the initialization).
Just give enough hint to the compiler to figure things out. Fixes the
warnings
../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function ‘intel_miptree_alloc_aux’:
../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1839:18: warning: ‘memset_value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
mt->aux_buf = intel_alloc_aux_buffer(brw, &aux_surf, needs_memset,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
memset_value);
~~~~~~~~~~~~~
../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1698:10: warning: ‘initial_state’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (wants_memset)
^
../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1772:23: note: ‘initial_state’ was declared here
enum isl_aux_state initial_state;
^~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Code assumes that all the necessary fields will exist, but compiler
doesn't know about this. Provide zero as default values, like in other
decoding functions.
Fixes warnings
../../src/intel/common/gen_batch_decoder.c: In function ‘handle_media_interface_descriptor_load’:
../../src/intel/common/gen_batch_decoder.c:347:7: warning: ‘binding_entry_count’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dump_binding_table(ctx, binding_table_offset, binding_entry_count);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c:347:7: warning: ‘binding_table_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../../src/intel/common/gen_batch_decoder.c:346:7: warning: ‘sampler_count’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dump_samplers(ctx, sampler_offset, sampler_count);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c:346:7: warning: ‘sampler_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
../../src/intel/common/gen_batch_decoder.c:343:7: warning: ‘ksp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ctx_disassemble_program(ctx, ksp, "compute shader");
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c: In function ‘decode_dynamic_state_pointers’:
../../src/intel/common/gen_batch_decoder.c:663:54: warning: ‘state_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
const uint32_t *state_map = ctx->dynamic_base.map + state_offset;
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
../../src/intel/common/gen_batch_decoder.c: In function ‘gen_print_batch’:
../../src/intel/common/gen_batch_decoder.c:856:13: warning: ‘next_batch.map’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (next_batch.map == NULL) {
^
../../src/intel/common/gen_batch_decoder.c:860:13: warning: ‘next_batch.addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
gen_print_batch(ctx, next_batch.map, next_batch.size,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next_batch.addr);
~~~~~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
strncpy() doesn't guarantee the terminator NUL, so we would need to
set ourselves. Just use snprintf() instead.
Fixes the warnings
../../src/intel/common/gen_decoder.c: In function ‘iter_decode_field’:
../../src/intel/common/gen_decoder.c:897:7: warning: ‘strncpy’ specified bound 128 equals destination size [-Wstringop-truncation]
strncpy(iter->name, iter->field->name, sizeof(iter->name));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘iter_advance_field’,
inlined from ‘gen_field_iterator_next’ at ../../src/intel/common/gen_decoder.c:1015:9:
../../src/intel/common/gen_decoder.c:844:7: warning: ‘strncpy’ specified bound 128 equals destination size [-Wstringop-truncation]
strncpy(iter->name, iter->field->name, sizeof(iter->name));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The error buffer is limited to 256, but the report contains the
filename and possibly other data. So give it more space.
Avoids the warnings
../../src/intel/vulkan/anv_util.c: In function ‘__anv_perf_warn’:
../../src/intel/vulkan/anv_util.c:66:42: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 254 [-Wformat-truncation=]
snprintf(report, sizeof(report), "%s: %s", file, buffer);
^~ ~~~~~~
../../src/intel/vulkan/anv_util.c:66:4: note: ‘snprintf’ output 3 or more bytes (assuming 258) into a destination of size 256
snprintf(report, sizeof(report), "%s: %s", file, buffer);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../src/intel/vulkan/anv_util.c: In function ‘__vk_errorf’:
../../src/intel/vulkan/anv_util.c:96:48: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 252 [-Wformat-truncation=]
snprintf(report, sizeof(report), "%s:%d: %s (%s)", file, line, buffer,
^~ ~~~~~~
../../src/intel/vulkan/anv_util.c:96:7: note: ‘snprintf’ output 8 or more bytes (assuming 263) into a destination of size 256
snprintf(report, sizeof(report), "%s:%d: %s (%s)", file, line, buffer,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error_str);
~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When one of the cases is not part of the enum, the compilar complains:
../../src/intel/vulkan/anv_formats.c: In function ‘anv_GetPhysicalDeviceFormatProperties2’:
../../src/intel/vulkan/anv_formats.c:728:7: warning: case value ‘1000001004’ not in enumerated type ‘VkStructureType’ {aka ‘enum VkStructureType’} [-Wswitch]
case VK_STRUCTURE_TYPE_WSI_FORMAT_MODIFIER_PROPERTIES_LIST_MESA:
^~~~
Given the switch has an "default:" case, we don't lose anything by
switching on the unsigned value to avoid the warning.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The "__inst" will contain the name used for the variable of type
"__type *". Parenthesis is not necessary as the name itself shouldn't
be an expression.
Fixes warning:
In file included from ../../src/mesa/main/mtypes.h:49,
from ../../src/intel/compiler/brw_compiler.h:30,
from ../../src/intel/compiler/brw_shader.h:29,
from ../../src/intel/compiler/brw_fs.h:31,
from ../../src/intel/compiler/brw_fs_cse.cpp:24:
../../src/intel/compiler/brw_fs_cse.cpp: In member function ‘bool fs_visitor::opt_cse_local(bblock_t*)’:
../../src/compiler/glsl/list.h:675:12: warning: unnecessary parentheses in declaration of ‘entry’ [-Wparentheses]
__type *(__inst); \
^
../../src/intel/compiler/brw_fs_cse.cpp:257:10: note: in expansion of macro ‘foreach_in_list_use_after’
foreach_in_list_use_after(aeb_entry, entry, &aeb) {
^~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Explicitly convert to signed integer. Conversion is valid since is the
same (implicitly) used to initialize the loop. Avoids the warning:
../../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::lower_simd_width()’:
../../src/intel/compiler/brw_fs.cpp:5761:45: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
split_inst.eot = inst->eot && i == n - 1;
~~^~~~~~~~
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes warning:
../../src/compiler/spirv/vtn_variables.c: In function ‘var_decoration_cb’:
../../src/compiler/spirv/vtn_variables.c:1400:12: warning: ‘is_vertex_input’ may be used uninitialized in this function [-Wmaybe-uninitialized]
bool is_vertex_input;
^~~~~~~~~~~~~~~
The code used to set is_vertex_input in all possible codepaths, but
after 23edc5b1ef "spirv: translate default-block uniforms" the
compiler isn't sure all codepaths will initialize the variable.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Core infrastructure for performance counters, using gallium's batch
query interface (to support AMD_performance_monitor).
Signed-off-by: Rob Clark <robdclark@gmail.com>
For batch queries we have N different query_type's for one query, so
mapping a single query_type to a sample_provider doesn't really work
out. Instead add a new constructor to construct a query directly
from a sample_provider.
Also, the sample buffer size needs to be determined at runtime, as
it depends on the number of query_types.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Take the query object, rather than the ctx. The ctx ptr isn't hugely
useful but for back queries we will need the query object to properly
get the results.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For now it still goes to stdout, this will make it easier to support
output on stderr like what frameretrace expects.
(If we eventually have a proper GL extension for this, implementation
probably looks like dumping shader disasm to a tmp file and then dumping
that out over whatever mechanism is used.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
v2: reword comment about lower_helper_invocations to be more clear
that it might not work on all hardware
v3: add special variant of load_sample_id which does not imply per-
sample shading
Signed-off-by: Rob Clark <robdclark@gmail.com>
Frameretrace ends up w/ excess calls to SelectPerfMonitorCountersAMD()
which ends up re-enabling already enabled counters. Which causes
ActiveCounters[group] to be double incremented for the same counter.
This causes BeginPerfMonitorAMD() to fail.
The AMD_performance_monitor spec doesn't say that an error should be
generated in this case. So I think the safe thing to do is just safe-
guard against excess increments/decrements.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Virgl could save a lot of work converting buffers in the host side
between formats if Mesa supported a bunch of other formats when reading
pixels.
This commit adds cases to handle specific formats so that the values
reported by the two calls match more closely the underlying native
formats.
In GLES is important that IMPLEMENTATION_COLOR_READ_* return the native
format and data type because the spec only allows reading with those,
besides GL_RGBA or GL_RGBA_INTEGER.
Additionally, because virgl currently doesn't implement such
conversions, this commit fixes several tests in
dEQP-GLES3.functional.fbo.color.clear.*, when using virgl in the guest
side.
The logic is based on knowledge that is shared with
_mesa_format_matches_format_and_type() but we cannot assert that the
results match as we don't have all the starting information at both
points. So leave the assert out and hope CI comes soon to save us all.
v2: * Let R10G10B10A2_UINT fall back to GL_RGBA_INTEGER (Eric Anholt)
* Assert with _mesa_format_matches_format_and_type (Eric Anholt)
v3: * Remove the assert, as it won't be reliable (Eric Anholt)
v4: * Use _mesa_is_format_integer in the fallback (Eric Anholt)
v5: * Remove superfluous call to
_mesa_uncompressed_format_to_type_and_comps (Eric Anholt)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
By default, our internal rendering commands are discarded
only if the predicate is non-zero (ie. DRAW_VISIBLE). But
VK_EXT_conditional_rendering also allows to discard commands
when the predicate is zero, which means we have to use a
different flag.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VK_EXT_conditional_rendering allows to discard draw commands
(not only normal draws).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Addresses in the command streams should be in canonical form (i.e
bit[63:48] == bit[47]). If the [bo->gtt_offset, bo->gtt_offset +
target_offset] range contains the address 0x800000000000, the current
code will fail that criteria.
v2: Fix missing include (Lionel)
Fixes: 1c9053d076 ("i965: Prepare batchbuffer module for softpin support.")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't need to emit PS_PARTIAL_FLUSH for the pre fragment shader
stages (ie. geometry/tessellation). Emitting VS_PARTIAL_FLUSH
is enough for these stages. Note that PS_PARTIAL_FLUSH also
synchronizes all vertex stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The assert is checking that we are not binding more descriptor sets
than the supported by the driver. When binding the descriptor set
number MAX_SETS-1, it was breaking the assert because
descriptorSetCount = 1.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Just a nice hint for both peoples and compilers.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Just a nice hint for both peoples and compilers.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Just a nice hint for both peoples and compilers.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Ported from radeonsi. Improves windowed glxgears ran as
vblank_mode=0 glxgears -info -geometry 0+0+512+512
from ≈2270 FPS to ≈2360 FPS. Tested with AMD TURKS.
v2: turned out glxgears ignores the option above, the correct way would
be "512x512+0+0". Now it can be seen 512x512 actually loses 30 FPS.
300×300 however wins around a hundred FPS, and to leave some room in
case results may differ for other cards I want not to nitpick in search
of an optimum but to simply leave 300×300 in the code.
v3: remove redundant braces, and try harder for the mail to stick to
the rest of the series.
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Annoyingly we still have to briefly drop the lock to unref resources..
but push the lock down into __fd_batch_destroy() so we can invalidate
the batch and reset resources before dropping the lock.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Re-allocate rather than re-use. Originally we had an unnecessarily
complex design to avoid re-allocating cmdstream buffers. But now that
support for "growable" cmdstream buffers has been in place for a couple
years, I guess we can care a bit less about the extra overhead on older
kernels.
But making the batches one-shot removes a class of potential race
conditions vs the flush_queue.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of the reading batch setting a dependency on the writing batch,
simply flush the writing batch immediately. This avoids situations
where we have to flush the context's current batch later.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This was basically to avoid a zero-dword IB (indirect-branch), but
instead just don't emit the IB packet in that case.
Signed-off-by: Rob Clark <robdclark@gmail.com>
pipe_framebuffer_state can have samples=0 in various cases, which is
actually the same thing as samples=1. So use the _get_num_samples()
helper to populate the key, to avoid this looking like two distinct
fb states to the cache.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It is possible for a batch to be freed under our feet when flushing, so
it is best to hold a reference to all of them up-front.
Signed-off-by: Rob Clark <robdclark@gmail.com>
OpenCL knows vector of size 8 and 16.
v2: rebased on master (nir_swizzle rework)
rework more declarations with nir_component_mask_t
adjust print_var_decl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
If we know that the given image doesn't have any metadata,
we don't need to flush.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The Vulkan 1.1.80 spec says:
"viewMask has the same effect for the described subpass as
VkRenderPassMultiviewCreateInfo::pViewMasks has on each
corresponding subpass."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The current code is buggy: if there are only 12 dwords left in cbuf,
we emit a zero data length command which will be rejected by virglrenderer.
Fix it by calling flush in this case.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
This allows us to implement glMinSampleShading correctly, which up
until now just got ignored.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When handling loops in constant propagation, implement the "FINISHME"
comment like copy propagation: perform a first pass to find values
that can't be propagated, then perform a second pass with the ACP
containing still valid values.
Certain values are killed because the loop may run more than one
iteration, so we can't copy propagate them as they would be invalid in
the later iterations.
Reviewed-by: Eric Anholt <eric@anholt.net>
When handling 'if' in constant propagation, if a certain variable was
killed when processing the first branch of the 'if', then the second
would get any propagation from previous nodes. This is similar to the
change done for copy propagation code.
x = 1;
if (...) {
z = x; // This would turn into z = 1.
x = 22; // x gets killed.
} else {
w = x; // This would NOT turn into w = 1.
}
With the change, we let constant propagation happen independently in
the two branches and only then apply the killed values for the
subsequent code.
The new code use a single hash table for keeping the kills of both
branches (the branches only write to it), and it gets deleted after we
use -- instead of waiting for mem_ctx to collect it.
NIR deals well with constant propagation, so it already covered for
the missing ones that this patch fixes.
Reviewed-by: Eric Anholt <eric@anholt.net>
total instructions in shared programs: 98578 -> 98119 (-0.47%)
instructions in affected programs: 27571 -> 27112 (-1.66%)
and it also eliminates most spills/fills on the CTS's randomized uniform
usage testcases.
SNB doesn't have a definition of 3DSTATE_CONSTANT_BODY, thats
why we got segmentation fault when used INTEL_DEBUG=bat.
Fixed by adding of 3DSTATE_CONSTANT_BODY into 3DSTATE_CONSTANT
of VS, GS and PS structures.
v2: added definition of 3DSTATE_CONSTANT_BODY to the gen6.xml
Fixes: 169d8e011a (intel: Fix 3DSTATE_CONSTANT buffer decoding.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107190
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the arguments match the (thing, container) pattern used in
other nir_foreach macros and also renames it to make that a bit more
clear.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
For one thing, the NIR opcodes for image load/store always take and
return a vec4 value regardless of the image type. We need to fix up
both the source and destination to handle it. For another thing, we
weren't actually setting up a destination in the OpAtomicLoad case.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
Converting from a switch statement that would not allow intermediate sample counts
to use an if-else chain went a bit wrong, so that in some cases the range that
should be inclusive was exclusive and the line for 16 samples was copies wrongly.
v2: elaborate commit message.
Fixes: 91f48cdfe5
virgl: Add support for glGetMultisample
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v1)
fixes a couple of packed_pixel CTS tests. No regressions inside a CTS run.
v2: simplify the changes a bit
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This fixes a nasty hang in Batman: Arkham City which apparently calls
vkCmdClearColorImage on a linear RGB image.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In this case, the surface faking will give us a R8_UNORM surface and we
need to do an sRGB conversion in the shader. Found by inspection.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
A while ago, we added a bunch of format conversion helpers; we should
use them instead of hand-rolling sRGB conversions.
Reviewed-by: Eric Anholt <eric@anholt.net>
If you load S and clear Z or vice versa, the clear may get lost. Just
fall back to drawing a quad.
Fixes KHR-GLES3.packed_depth_stencil.verify_read_pixels.depth24_stencil8
These are not unit tests, as they rely on the host's XVMC and some user
configuration. Switch them over to being general installed tools, to fix
unit testing.
Fixes: 22a817af8a ("meson: build gallium xvmc state tracker")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The current spill code checks in each instruction of an instruction group whether
spilling is needed and if so, it adds spilling for each component as a seperate
instruction and it allocates a new temporary for each component and since it takes
the write mask from the TGSI representation, all components might be written
each time and as a result already written components might be overwritten with
garbage like:
...
y: MOV R9.y, [0x42140000 37].x
t: MOV R8.x, [0x42040000 33].y
...
MEM_SCRATCH WRITE_IND_ACK 0 R9.xy__, @R4.x ES:3
MEM_SCRATCH WRITE_IND_ACK 0 R8.xy__, @R4.x ES:3
...
To resolve this isse accumulate spills to the same memory location so that only one
memory write instruction is emitted for an instruction group that writes up to all
four components.
This fixes updated piglits (see https://patchwork.freedesktop.org/series/46064/):
spec/glsl-1.30/execution
fs-large-local-array-vec2.shader_test
fs-large-local-array-vec3.shader_test
fs-large-local-array-vec4.shader_test
v2: fix some typos and add comment about piglits (Roland Scheidegger)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Note that the separate stencil miptree now has the same alloc_flag as
the depth component. Only stencil renderbuffers (as opposed to textures)
have BO_ALLOC_BUSY.
v2: Add note about BO_ALLOC_BUSY in message (Topi).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The current behavior masked two bugs where the flag was not set to true
after modifying the stencil texture. One case was a regression
introduced with commit bdbb527a65 and
another was a bug in the depthstencil mapping code. These have since
been fixed.
To prevent such bugs from being masked in the future, initialize
r8stencil_needs_update to false.
v2: Keep the delayed allocation.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Enable a future patch to create the r8stencil_mt in this function.
v2: Explicitly set etc_format to MESA_FORMAT_NONE (Topi).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Note that this maintains BO_ALLOC_BUSY for depth renderbuffers, but not
depth textures.
v2: Add note about BO_ALLOC_BUSY in message (Topi).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This enum constant was introduced to enable blit maps with
intel_miptree_create da2880bea0. Now that
such maps use the more direct make_surface function which allows you to
specify the tiling directly, the constant is no longer being used.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Do this so that we don't have to special case linearly-tiled depth
buffers in miptree_create.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fix the case where stencil writes are enabled on a depth stencil
texture. Found by inspection.
v2: Fix message to allow for depth stencil writes (Topi).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes the regresion introduced with commit
bdbb527a65
"i965: Use ISL for emitting depth/stencil/hiz state on gen6+"
Found by inspection.
Prevents regressing the piglit test, fbo-depth-array stencil-draw, later
on in this series.
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Found by initializing the r8stencil_needs_update to false in
make_separate_stencil_surface.
Prevents regressing the piglit test arb_stencil_texturing-draw, later on
in the series.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There are no fixed sized array arguments in C, those are simply pointers
to unsized arrays and as the size is passed in anyway, just rely on that.
where possible calls are replaced by nir_channel and nir_channels.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Drop an if statement whose predicate never evaluates to true. row_pitch
belongs to a surface with non-linear tiling. According to
isl_calc_tiled_min_row_pitch, the pitch is a multiple of the tile width.
By looking at isl_tiling_get_info, we see that non-linear tilings have
widths greater than or equal to 128B.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It wasn't causing problems since there's nothing to delete, but better
be consistent with the rest of existing codebase.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The only value in kill_entry is the writemask, which can be stored in
the data pointer of the hash table entry.
Suggested by Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Since 4654439fdd "glsl: Use hash tables for
opt_constant_propagation() kill sets." uses a hash_table for storing
kill_entries, so the structs can be simplified.
Remove the exec_node from kill_entry since it is not used in an
exec_list anymore.
Remove the 'var' from kill_entry since it is now redundant with the
key of the hash table.
Suggested by Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
For V3D, the HW will interpolate slightly differently along the shared
edge of the trifan. The conformance tests manage to catch this in the
nearest_consistency_* group. To get interpolation to match, we need the
last vertex of the triangle to be shared.
I first tried implementing draw_rectangle to do triangles instead, but
that was quite a bit (147 lines) of code duplication from u_blitter, and
this seems much simpler and less likely to break as u_blitter changes.
Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* on V3D.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These helpers have been unused, and were definitely not useful since
330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
made it so that they never had an index buffer passed in.
For an upcoming u_blitter change to use these helpers, I have just 6 bytes
of index data, so pass it as user data until a more interesting caller
comes along.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I had mistakenly used the COHERENT flag, which can only be set when
PERSISTENT is mapped, but isn't always.
Fixes: a2014c2eb9 ("vc4: Simplify the DISCARD_RANGE handling")
All of our other texture arrays will be tiled, but 1D is an array of
raster mappings and we had the wrong value plugged in here. Fixes piglit
getteximage-targets 1D_ARRAY
We were only emitting the RT blend state for RT 0 and only enabling it for
RT 0, when the gallium API for !independent_blend is for rt0's state to
apply to all of them.
Fixes piglit fbo-drawbuffers-blend-add.
We just never set the value that was returned for MSAA mappings (directly
reading back an MSAA framebuffer). Since we're handing back ss_map, it
should be ss_map's stride from our nested transfer.
Fixes piglit /home/anholt/src/piglit/bin/fbo-depthstencil -samples=4
cases.
Reviewed-by: Rob Clark <robdclark@gmail.com>
We created a temporary with box->{width,height} and then tried to map
width,height from a nonzero offset when we meant to just map the whole
temporary.
Fixes segfaults in V3D in dEQP-GLES3.functional.prerequisite.read_pixels
with --deqp-egl-config-name=rgba8888d24s8ms4 and also piglit's read-front
clear-front-first -samples=4
Reviewed-by: Rob Clark <robdclark@gmail.com>
At 232ed89802 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+ although
the backend still uses MRFs internally for sends they are finally
assigned to a GRFs.
In the case of unspills the backend assigns directly as source its
destination because it is suppose to be available. So we always have a
source-destination overlap. If the reg_allocator assigns registers that
include the grf127 we fail the validation rule that affects Gen8+
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."
So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.
We also need to avoid that opt_bank_conflicts() optimization, that runs
after the register allocation, doesn't move things around, causing the
grf127 to be used in the condition we were avoiding.
Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashed because of the grf127 validation rule..
v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)
Found by Caio Marcelo de Oliveira Filho
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It's optional, only implemented by the etnaviv driver so far.
Fixes: 501d0edeca "st/mesa: call resource_changed when binding a
EGLImage to a texture"
Fixes: a37cf630b4 "gallium: add pipe_screen::resource_changed callback
wrappers"
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
While we're at it, add some extensions we missed along the way like the
VK_KHR_maintenanceN extensions.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
we already have this code duplicated and we will need it for the global
group size as well
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
also reorder to match the gl_system_value enum.
It is weird that the STATIC_ASSERT doesn't trigger though.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Now that 'set' can't be NULL because the meta operations no
longer bind a NULL descriptor, the logic can be simplified
a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We mostly use the same priority for all buffer objects, so
I don't think that matter much. This should reduce CPU
overhead a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When a EGLImage is newly bound to a texture, we need to make sure the
driver is informed that the resource might have changed. Fixes stale
texture content on Etnaviv when binding an existing EGLImage to an
existing texture object.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
VkCreateRenderPass2KHR() is quite similar to VkCreateRenderPass()
but refactoring the code is a bit painful.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the depth function is EQUAL, then we'll only write the depth value
when it already matches what's in the buffer, which is pointless.
Skipping these writes can save bandwidth.
The state tracker can easily take care of this, so all drivers benefit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In a single call to vk_errorf() in the Android code, the arguments were
swapped. The bug has existed since day one. Chrome OS used to forgive
the warning, but it is now a compilation error.
CC: <mesa-stable@lists.freedesktop.org>
Fixes: 053d4c32 "anv: Implement VK_ANDROID_native_buffer (v9)"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Changes to vk.xml and anv_entrypoints_gen.py broke the Autotools build
on Android. The changes undef'd the VK_ANDROID_native_buffer entrypoints
in anv_entrypoints.h.
Fix it with CPPFLAGS += -DVK_USE_PLATFORM_ANDROID_KHR.
CC: <mesa-stable@lists.freedesktop.org>
See-Also: 63525ba7 "android: enable VK_ANDROID_native_buffer"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If the glsl is something like this:
in vec4 some_input;
interpolateAtCentroid(some_input[idx])
then it now gets generated as if it were:
interpolateAtCentroid(some_input)[idx]
This is necessary because the index will get generated as a series of
nir_bcsel instructions so it would no longer be an input variable. It
is similar to what is done for GLSL in ca63a5ed3e.
Although I can’t find anything explicit in the Vulkan specs to say
this should be allowed, the SPIR-V spec just says “the operand
interpolant must be a pointer to the Input Storage Class”, which I
guess doesn’t rule out any type of pointer to an input.
This was found using the spec/glsl-4.40/execution/fs-interpolateAt*
Piglit tests with the ARB_gl_spirv branch.
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
v2: update after nir_deref_instr land on master. Implemented by
Alejandro Piñeiro. Special thanks to Jason Ekstrand for guidance
at the new nir_deref_instr world.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are a number of opcode_desc table entries for many of these
unused opcodes. A symbolic opcode enum will be required in a future
commit in order to keep them in the opcode description tables. The
alternative would be to remove the unused opcodes from the opcode
description tables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the message length available at the IR level, which should
save some guesswork in a future commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Constructing a descriptor in-place as part of the immediate of an ALU
instruction is no longer supported.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The return value is not used anymore. This allows simplifying the
code slightly, and in addition it should frustrate anybody's attempts
to continue using the obsolete piecemeal approach to construct a
message descriptor in combination with brw_send_indirect_message().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All users of brw_send_indirect_surface_message() should be providing a
full descriptor immediate up front by now, this isn't necessary
anymore.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The current approach of returning a setup instruction where additional
descriptor fields can be specified is still supported in order to keep
things working, but it will be removed later in this series.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces brw_set_message_descriptor() with the composition of
brw_set_desc() and a new inline helper function that packs the common
message descriptor controls into an integer. The goal is to represent
all message descriptors as a 32-bit integer which is written at once
into the instruction, which is more flexible (SENDS anyone?), robust
(see d2eecf0b0b fixing an issue
ultimately caused by some bits of the extended message descriptor
being left undefined) and future-proof than the current approach of
specifying the individual descriptor fields directly into the
instruction.
This approach also seems more self-documenting, since it will allow
removing calls to functions with way too many arguments like
brw_set_*_message() and brw_send_indirect_message(), and instead
provide a single descriptor argument constructed from an appropriate
combination of brw_*_desc() helpers.
Note that because brw_set_message_descriptor() was (conditionally?)
overriding fields of the instruction which strictly speaking weren't
part of the message descriptor, this involves calling
brw_inst_set_sfid() and brw_inst_set_eot() in some cases in addition
to brw_set_desc().
v2: Use SET_BITS macro instead of left shift (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Allows to specify a bitfield based on its upper and lower bounds
instead of a symbolic field definition, kind of what the current
GET_BITS macro is to GET_FIELD.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This introduces helpers that can be used to specify or extract the
whole descriptor of a SEND message instruction at once. Because the
the instruction encoding of these is rather awkward on some
generations using the generic brw_inst.h macros doesn't seem like an
option.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows brw_search_cache to be used to find programs without
causing extra state to be emitted in the case where the program isn't
being made active. (For example, to find the program to save out with
the ARB_get_program_binary interface.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This might be required because some stages might generate different
programs depending on the other stages in the program. For example,
the i965 driver's tessellation control stage depends on the
tessellation evaluation shader.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We will need to populate the default key for ARB_get_program_binary to
allow us to retrieve the default gen program to store in the program
binary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Trying to make sure the setup of the default program key is not
dependent on the GL state.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For saving programs (shader cache; get program binary) it is useful to
set the id to 0, with the stage being a parameter.
For restoring programs it is useful to set the id to the id allocated
to the program at creation time.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We will want to use these for both the disk shader cache, and for the
ARB_get_program_binary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
brw_program_deserialize_driver_blob will be a more generic form of
brw_program_deserialize_nir. In addition to nir, it will also be able
to extract gen binaries and upload them to the program cache.
In this commit, it continues to only support nir.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will allow get_program_binary to add the gen program into its
serialization in addition to just the nir program.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The driver may prefer to have a different blob for
ARB_get_program_binary compared to the version saved out for the disk
shader cache.
Since they both use the driver_cache_blob field, we need to always
give the driver the opportunity to fill in the driver_cache_blob when
saving the program binary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This function is called just before the gl_program::driver_cache_blob
is saved out as part of the gl_program serialization.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously the mesa core code would not call to serialize the
driver_cache_blob if it existed. We will update it to always call to
serialize the driver_cache_blob meaning we should avoid re-serializing
it under mesa/state_tracker.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code, however, recent CTS tests
have showed that we actually need them to produce correct behavior.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Looks like I broke intel CI compiles.
Fixes: 6f3aee40f9 (radv: using tls to store llvm related info and speed up compiles (v10))
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Error states coming from actual Vulkan applications tend to have fairly
long command buffers and lots of chained batches. 30 total BOs isn't
nearly enough. This commit bumps it to 256, makes some things use the
actual number of sections instead of the #define, and adds asserts if we
ever go over 256 sections.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Our attempt to restart the loop with the second level batch worked at
one point but got broken at some point. It was too fragile anyway and
we're not likely to have enough secondaries to actually overflow the
stack so we may as well recurse in both cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The vtest protocol is pretty simple but also pretty dumb, and
the v1 caps query was fixed size, with no nice way to expand it,
however the server also ignores any command it doesn't understand.
So we can query v2 caps by sending a v2 followed by a v1, if the
v2 is ignored we know it's an old vtest server, and the we get
a v2 answer then we can just read the v1 answer and discard it.
Acked-by: Jakob Bornecrantz <jakob@collabora.com> (sounds good)
CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose if the extension is supported or not.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When the destination is a BYTE type allow raw movs
even if the stride is not exact multiple of destination
type and exec type, execution type is Word and its size is 2.
This restriction was only allowing stride==2 destinations
for 8-bit types.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."
This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.
For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.
If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.
This fixes CTS tests that raised this issue as they were executed as SIMD8:
dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
Shader-db results on Skylake:
total instructions in shared programs: 7686798 -> 7686797 (<.01%)
instructions in affected programs: 301 -> 300 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs: 337092322 -> 337091919 (<.01%)
cycles in affected programs: 22420415 -> 22420012 (<.01%)
helped: 712
HURT: 588
Shader-db results on Broadwell:
total instructions in shared programs: 7658574 -> 7658625 (<.01%)
instructions in affected programs: 19610 -> 19661 (0.26%)
helped: 3
HURT: 4
total cycles in shared programs: 340694553 -> 340676378 (<.01%)
cycles in affected programs: 24724915 -> 24706740 (-0.07%)
helped: 998
HURT: 916
total spills in shared programs: 4300 -> 4311 (0.26%)
spills in affected programs: 333 -> 344 (3.30%)
helped: 1
HURT: 3
total fills in shared programs: 5370 -> 5378 (0.15%)
fills in affected programs: 274 -> 282 (2.92%)
helped: 1
HURT: 3
v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignation to grf127 and create interferences to send
message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
(Jose Maria Casanova)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):
"r127 must not be used for return address when there is a src and
dest overlap in send instruction."
v2: Style fixes (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.
This should remove all the fixed overheads setup costs of creating
the pass manager each time.
This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)
v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We were not properly writing page tables when the virtual address
range spans multiple subtrees of the tables.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The logic was duplicated in a pretty gross way, when what we really need
is just a helper function for stuffing the values in the packet. This
will make implementing noperspective easier.
We weren't using the field yet, so it didn't affect anything.
Fixes: c0476d964a ("v3d: Express dithering mode in the same way that the CLIF parser does.")
Java2d opengl pipeline passes NULL piAttribList to
wglCreatePbufferARB(). So skip parsing the attribute list
if it is NULL.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
The implementation of CreateRenderPass2 uses the helpers we broke out in
previous commits. The implementations of the new vkCmd functions just
call the old versions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The OpenGL 3.1 specification, table Table 20.41 ("Implementation
Dependent Values"), defines the minimum-maximum value for
MAX_VERTEX_ATTRIB_STRIDE to be 2048.
So we shouldn't enable OpenGL ES 3.1 on implementations where this
isn't the case. Let's add a check for this
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The OpenGL 4.4 specification, table Table 23.55 ("Implementation
Dependent Values"), defines the minimum-maximum value for
MAX_VERTEX_ATTRIB_STRIDE to be 2048.
So we shouldn't enable OpenGL 4.4 on implementations where this isn't
the case. Let's add a check for this.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
OpenGL 4.4 requires a max vertex attrib of 2048 or higher, but
r600 only supports 2047. Technically, this makes it an GL4.3 GPU,
but it's currently exposing GL4.4.
To avoid regressing the GL version supported in the following
patches, let's just lie and pretend like we support 2048. Any
applications using 2048 are already broken anyway.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This helps us to compact original instruction:
mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q };
So now we emit:
mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted };
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The number of enabled descriptors for a given pipeline stage
can be computed at compile time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
fold_assoc() called from fold_alu_op3() can lower the number of src to 2,
which then leads to an invalid access to n.src[2]->gvalue().
This didn't seem to have caused much harm in the past, but on Fedora 28
it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although
with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was
needed to show the issue).
An alternative fix would be to instead call fold_alu_op2() from within
fold_assoc() when the number of src is reduced and return always TRUE
from fold_assoc() in this case, with the only actual difference being
the return value from fold_alu_op3() then. I'm not sure what the return
value actually should be in this case (or whether it even can make a
difference).
https://bugs.freedesktop.org/show_bug.cgi?id=106928
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
pthread types are used in some files without explicitely including pthread.h.
This leads to compile errors on Android 7.x nougat-x86
e.g. in src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c:31:
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.h:32:
external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h:52:2: error: unknown type name 'pthread_mutex_t'
pthread_mutex_t global_bo_list_lock;
^
1 error generated.
Including pthread.h explicitely solves the building error
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
phi instructions don't have the same results by simply having the same sources.
They need to be inside the same BasicBlock or share an equal condition
resulting into a path through the shader selecting equal sources as well.
short example:
cond = ...;
const0 = 0;
const1 = 1;
if (cond) {
ssa_1 = const0;
} else {
ssa_2 = const1;
}
ssa_3 = phi ssa_1 ssa_2;
if (!cond) {
ssa_4 = const0;
} else {
ssa_5 = const1;
}
ssa_6 = phi ssa_4 ssa_5;
allthough both phis actually have sources with equal results, merging them
would be wrong due to having a different condition selecting which source to
take.
For now we also stick an assert into GlobalCSE, because it should never end up
having to merge phi instructions.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
total instructions in shared programs : 5804448 -> 5804690 (0.00%)
total gprs used in shared programs : 670065 -> 670065 (0.00%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21068 (0.00%)
local shared gpr inst bytes
helped 0 0 0 5 5
hurt 0 0 0 191 191
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Commit 5fb69daa6076e56b deleted support from nir_print for printing the
texture and sampler indices on texture instructions. This commit just
brings it back as best as we can.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
At the time of commit 7bc6e455e2 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible. We
were only thinking about type converting moves from D to F, for
example. However, type converting moves w/saturate from F to DF are
definitely possible. This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.
Fixes new piglit tests:
- arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
- arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is achived by copying the sign(abs(x)) optimization from the FS
backend.
On Gen7 an earlier platforms, this fixes new piglit tests:
- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We could have made this compatible with Python 3 by using:
except Exception as e:
But since none of this code actually uses the exception objects, let's
just drop them entirely.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In Python 2, `print` was a statement, but it became a function in
Python 3.
Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
per POSIX, limits.h may define PAGE_SIZE when the value is not indeterminate
v2: just change the variable name, since there's no intended correlation
here between this value and the machine's actual page size.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI. Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.
An alternate solution is to delete the default case so that the compiler
will issue a warning. That may require more work since there are other
(impossible) cases that exist.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some of the lowering passes, nir_lower_locals_to_regs for example, can
cause some previously live code to be dead. This pass in particular
leaves a bunch of nir_instr_type_deref instructions floating around.
This causes shader-db runs on Gen5 through Haswell to spew tons of
messages like:
VS instruction not yet implemented by NIR->vec4
UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that
generates these messages. Cleaning up the dead code fixes that.
To verify, I did a shader-db before and after. Even though all the
messages are gone, the results make my brain hurt. :(
Haswell
total cycles in shared programs: 411890163 -> 411891145 (<.01%)
cycles in affected programs: 57016 -> 57998 (1.72%)
helped: 3
HURT: 11
helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
HURT stats (abs) min: 18 max: 686 x̄: 115.64 x̃: 20
HURT stats (rel) min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
95% mean confidence interval for cycles value: -51.39 191.67
95% mean confidence interval for cycles %-change: -0.14% 2.46%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total cycles in shared programs: 259114802 -> 259115032 (<.01%)
cycles in affected programs: 24034 -> 24264 (0.96%)
helped: 1
HURT: 9
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
HURT stats (abs) min: 18 max: 48 x̄: 25.78 x̃: 20
HURT stats (rel) min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
95% mean confidence interval for cycles value: 12.42 33.58
95% mean confidence interval for cycles %-change: 0.54% 1.38%
Cycles are HURT.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 5a02ffb733 nir: Rework lower_locals_to_regs to use deref instructions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From the ARB_color_buffer_float spec:
35. Should the clamping of fragment shader output gl_FragData[n]
be controlled by the fragment color clamp.
RESOLVED: Since the destination of the FragData is a color
buffer, the fragment color clamp control should apply.
Fixes arb_color_buffer_float-mrt mixed on v3d.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cleans up the CL of fbo-drawbuffers2-blend a bit. We could do better on
more complicated cases by noticing if multiple RTs have the same blend
state and emitting them in a single packet.
The varying packing would result in st_nir_assign_var_locations() picking
new driver_locations, despite the pipe_stream_output already being set up
for the old driver location. This left the gallium driver with no way to
work back to what varying was referenced by pipe_stream_output.
Fixes these tests on V3D:
dEQP-GLES3.functional.transform_feedback.random.separate.points.3
dEQP-GLES3.functional.transform_feedback.random.separate.points.7
dEQP-GLES3.functional.transform_feedback.random.separate.points.9
dEQP-GLES3.functional.transform_feedback.random.separate.triangles.3
dEQP-GLES3.functional.transform_feedback.random.separate.triangles.8
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Set with_dri from with_gallium when DRI GLX is explicitly configured, as
well as when DRI GLX is chosen automatically.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
If the given image doesn't enable CMASK, FMASK or DCC that's
useless to flush CB metadata.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
GCC 4.8 fails to compile with "static const", while GCC 8.1
fails to compile with only "static".
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
mesa/src/util/u_queue.c:242:15: error: address of array 'queue->name'
will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
Fixes: b238e33bc9 "kutil/queue: add a process name into a thread name"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
On Python 2, the default JSON separators are ', ' for items and ': ' for
dicts.
On Python 3, the default is the same when no indent is specified, but if
one is (and we do specify one) then the default items separator becomes
',' (the dict separator remains unchanged).
This change explicitly specifies the Python 3 default, which helps
ensuring that the output is identical, whether it was generated by
Python 2 or 3.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In Python, dictionaries and sets are unordered, and as a result their
is no guarantee that running this script twice will produce the same
output.
Using ordered dicts and explicitly sorting items makes the build more
reproducible, and will make it possible to verify that we're not
breaking anything when we move the build scripts to Python 3.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
For gen8+, write out PPGTT tables in aub files so that full 48-bit
addresses can be serialized.
v2: Fix handling of `end` index in map_ppgtt
v3: Correctly mark GGTT entry as present (Rafael)
Signed-off-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
With PPGTT mappings, our aubinator implementation can be quite slow if
we request a buffer that doesn't exist. Instead of doing a PPGTT walk
for invalid addresses (0 lengths), wait until we're sure we want to
decode the data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
v2: by Lionel
Fix memfd_create compilation issue
Fix pml4 address stored on 32 instead of 64bits
Return no buffer if first ppgtt page is not mapped
v3: Drop additional memfd_create() (Rafael)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We use memfd to store physical pages as they get read/written to and
the GGTT entries translating virtual address to physical pages.
Based on a commit by Scott Phillips.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This is a simple, invasive, liberally licensed red-black tree
implementation. It's an invasive data structure similar to the
Linux kernel linked-list where the intention is that you embed a
rb_node struct the data structure you intend to put into the
tree.
The implementation is mostly based on the one in "Introduction to
Algorithms", third edition, by Cormen, Leiserson, Rivest, and
Stein. There were a few other key design points:
* It's an invasive data structure similar to the [Linux kernel
linked list].
* It uses NULL for leaves instead of a sentinel. This means a few
algorithms differ a small bit from the ones in "Introduction to
Algorithms".
* All search operations are inlined so that the compiler can
optimize away the function pointer call.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that we're softpinning the address of our BOs in anv & i965, the
addresses selected start at the top of the addressing space. This is a
problem for the current implementation of aubinator which uses only a
40bit mmapped address space.
This change keeps track of all the memory writes from the aub file and
fetch them on request by the batch decoder. As a result we can get rid
of the 1<<40 mmapped address space and only rely on the mmap aub file
\o/
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
On a follow up commit in this series, we stop copying the data from
the mmap'ed file into our big gtt mmap, and start referencing data in
it directly. So reallocating the read buffer and adding more data from
stdin wouldn't work. For that reason, let's stop supporting stdin
process.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Commit f69bc797e1 did the following:
- if format.layout in ('bptc', 'astc'):
+ if format.layout in ('astc'):
The intention was to go from matching either 'bptc' or 'astc' to
matching only 'astc'.
But the new code doesn't respect this intention any more, because in
Python `('astc')` is not a tuple containing a string, it is just the
string. (the parentheses are simply ignored)
That means we now match any substring of 'astc', for example 'a'.
This commit fixes the test to respect the original intention.
Fixes: f69bc797e1 "gallium/auxiliary: Add helper support for
bptc format compress/decompress"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Always emitting a bottom-of-pipe event is quite dumb. Instead,
start to optimize these functions by syncing PFP for the
top-of-pipe and syncing ME for the post-index-fetch event.
This can still be improved by emitting EOS events for
syncing PS and CS stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This introduces radv_barrier() (same as the draw/dispatch codepath).
This helper is used for merging the code from CmdWaitEvents() and
CmdPipelineBarrier because it's quite similar.
We do ignore the source stage mask for CmdWaitEvents because
it's irrelevant when event objects are used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Structures might be padded by the compiler and these padding bytes remain
un-initialized which in turn makes memcmp return a difference where from
the logical point of view there is none.
Fixes valgrind:
Conditional jump or move depends on uninitialised value(s)
at 0x4C32CBA: __memcmp_sse4_1 (vg_replace_strmem.c:1099)
by 0xB8D2537: r600_set_vertex_buffers (r600_state_common.c:573)
by 0xB71D44A: u_vbuf_set_driver_vertex_buffers (u_vbuf.c:1129)
by 0xB71F7BB: u_vbuf_draw_vbo (u_vbuf.c:1153)
by 0xB3B92CB: st_draw_vbo (st_draw.c:235)
by 0xB36B1AE: vbo_draw_arrays (vbo_exec_array.c:391)
by 0xB36BB0D: vbo_exec_DrawArrays (vbo_exec_array.c:550)
by 0x10A989: piglit_display (textureSize.c:157)
by 0x4F8F174: run_test (piglit_fbo_framework.c:52)
by 0x4F7BA12: piglit_gl_test_run (piglit-framework-gl.c:229)
by 0x10A60A: main (textureSize.c:71)
Uninitialised value was created by a stack allocation
at 0xB3948FD: st_update_array (st_atom_array.c:388)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Below tests would fail with an error message
"Vertex format (R4G4B4A4|R5G5B5A1) not supported."
Add the formate to the translation routine to enable these formats.
Fixes:
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgba4_2d
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgba4_cube
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb5_a1_2d
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb5_a1_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgba4_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgba4_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb5_a1_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb5_a1_cube
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgba4_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgba4_3d
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb5_a1_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb5_a1_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgba4_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgba4_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb5_a1_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb5_a1_3d
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For a texture that has only one LOD defined, but for which
GL_TEXTURE_MAX_LEVEL is the default (1000) and
GL_TEXTURE_MIN_LOD != GL_TEXTURE_MAX_LOD the reading from the texture does
not properly resolve the LOD level and texture lookup might fail. Hence,
when no mipmap filter is given (indicating that no mip-mapping takes place),
force the LOD range to contain only value.
Fixes:
dEQP-GLES3.functional.shaders.texture_functions.texture*.(i|u)sampler2d*
dEQP-GLES3.functional.texture.format.sized.cube.rgb*
out of VK_GL_CTS/android/cts/master/gles3-master.txt
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
restart_index is later always used in a comparison, so it should be
initialized properly.
Fixes valgrind warning:
Conditional jump or move depends on uninitialised value(s)
at 0xB8D682F: r600_draw_vbo (r600_state_common.c:2153)
by 0xB71F743: u_vbuf_draw_vbo (u_vbuf.c:1156)
by 0xB3B92DB: st_draw_vbo (st_draw.c:235)
by 0xB36B1AE: vbo_draw_arrays (vbo_exec_array.c:391)
by 0xB36BB0D: vbo_exec_DrawArrays (vbo_exec_array.c:550)
by 0x10A989: piglit_display (textureSize.c:157)
by 0x4F8F174: run_test (piglit_fbo_framework.c:52)
by 0x4F7BA12: piglit_gl_test_run (piglit-framework-gl.c:229)
by 0x10A60A: main (textureSize.c:71)
Uninitialised value was created by a stack allocation
at 0xB3B90B0: st_draw_vbo (st_draw.c:143)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Its unlikely anyone will add proper ARB_direct_state_access compat
support before we branch 18.2. Enabling the extension in 4.5 at
least allows users to make use of MESA_GL_VERSION_OVERRIDE=4.5COMPAT
for games like No Mans Sky.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The game forgets to enable multiple extensions in its shaders, one
of those extesions is EXT_texture_array. But enabling this config
entry fixes at least one other rendering issue that enabling
EXT_texture_array on its own doesn't fix.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is a 15-character limit for thread names shared by the queue name
and process name. Shorten the thread name to make space for the process
name.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Compile times of simple shaders are reduced by ~20%.
Compile times of prologs and epilogs are reduced by up to 40%.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is basically LLVMTargetMachineEmitToMemoryBuffer inlined and reworked.
struct ac_compiler_passes (opaque type) contains the main pass manager.
ac_create_llvm_passes -- the result can go to thread local storage
ac_destroy_llvm_passes -- can be called by a destructor in TLS
ac_compile_module_to_binary -- from LLVMModuleRef to ac_shader_binary
The motivation is to do the expensive call addPassesToEmitFile once
per context or thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Changes in v2:
- make loadSuInfo32() protected without making the rest protected
- move NVC0_SU_INFO_* into nv50_ir_lowering_nvc0.h instead of duplicating
NVC0_SU_INFO_MS
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
In 6f5abf3146 this code was fixed to calculate the maximum size of
an attribute in a seperate pass and then allocate the registers to
that size. However this wasn’t taking into account ranges that overlap
but don’t have the same starting location. For example:
layout(location = 0, component = 0) out float a[4];
layout(location = 2, component = 1) out float b[4];
Previously, if ‘a’ was processed first then it would allocate a
register of size 4 for location 0 and it wouldn’t allocate another
register for location 2 because it would already be covered by the
range of 0. Then if something tries to write to b[2] it would try to
write past the end of the register allocated for ‘a’ and it would hit
an assert.
This patch changes it to scan for any overlapping ranges that start
within each range to calculate the maximum extent and allocate that
instead.
Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/
vs-fs-array-interleave-range.shader_test
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 6f5abf3146 "i965: Fix output register sizes when multiple variables
share a slot."
The current code causes:
/usr/include/c++/8/debug/safe_iterator.h:207:
Error: attempt to copy from a singular iterator.
This is due to the iterators getting invalidated, fix the
reverse iterator to use the return value from erase, and
cast it properly.
(used Mathias suggestion)
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
This ports radv to the shared code, however due to a bug in LLVM
version prior to 7, radv cannot add target info at this stage,
as it would leak one for every shader compile, however I'd prefer
to keep this llvm damage in the shared code, since it isn't the
driver at fault here. We just add a flag to denote if the driver
can support leaking the target info or not, and the common code
does the right thing depending on the llvm version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We want to share this code with radv in the future, so port
it out of radeonsi.
Add a return value as radv will want that to know if this
succeeds
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As precursor to moving init to common code, just rename the struct
and move it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds a inline always pass, but otherwise should work the
same.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This just splits out the non-shared code and reuses ac_get_llvm_target in radv.
v2: rebase on Marek's patch - fixup brace position/whitespace
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
From the SPIR-V 1.0 specification, section 3.32.18, "Atomic
Instructions":
"OpAtomicIDecrement:
<skip>
The instruction's result is the Original Value."
However, we were implementing it, for uniform atomic counters, as a
pre-decrement operation, as was the one available from GLSL.
Renamed the former nir intrinsic 'atomic_counter_dec*' to
'atomic_counter_pre_dec*' for clarification purposes, as it implements
a pre-decrement operation as specified for GLSL. From GLSL 4.50 spec,
section 8.10, "Atomic Counter Functions":
"uint atomicCounterDecrement (atomic_uint c)
Atomically
1. decrements the counter for c, and
2. returns the value resulting from the decrement operation.
These two steps are done atomically with respect to the atomic
counter functions in this table."
Added a new nir intrinsic 'atomic_counter_post_dec*' which implements
a post-decrement operation as required by SPIR-V.
v2: (Timothy Arceri)
* Add extra spec quotes on commit message
* Use "post" instead "pos" to avoid confusion with "position"
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is mostly just a straight-forward conversion of
link_assign_atomic_counter_resources to C directly using nir variables
instead of GLSL IR variables.
It is based on the version of link_assign_atomic_counter_resources in
6b8909f2d1. I’m noting this here to make it easier to track changes
and keep the NIR version up-to-date.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Capability that informs if atomic counters are supported. From SPIR-V
1.0 spec, section 3.7, "Storage Class", item 10 from table:
(Column "Storage Class"):
"AtomicCounter For holding atomic counters. Visible across all
functions of the current invocation. Atomic counter-specific
memory."
(Column "Required Capability"):
"AtomicStorage"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is convenient when dealing with atomic counter uniforms. The
alternative would be doing that at vtn_handle_atomics.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When constructing NIR if we have a SPIR-V uint variable and the
storage class is SpvStorageClassAtomicCounter, we store as NIR's
glsl_type an atomic_uint to reflect the fact that the variable is an
atomic counter.
However, we were tweaking the type only for atomic_uint scalars, we
have to do it as well for atomic_uint arrays and atomic_uint arrays of
arrays of any depth.
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
v2: update after deref patches got pushed (Alejandro Piñeiro)
v3: simplify repair_atomic_type (suggested by Timothy Arceri, included
on the patch by Alejandro)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
GLSL types differentiates uint from atomic uint. On SPIR-V the type is
uint, and the variable has a specific storage class. So we need to
tweak the type based on the storage class.
Ideally we would like to get the proper type at vtn_handle_type, but
we don't have the storage class at that moment.
We tweak only the nir type, as is the one that really requires it.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Also initialize it on var_decoration_cb
This is equivalent to nir_variable.offset, used to store the location
an atomic counter is stored at.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
ARB_gl_spirv points that uniforms in general need explicit
location. But there are still some cases of uniforms without location,
like for example uniform atomic counters. Those doesn't have a
location from the OpenGL point of view (they are identified with a
binding and offset), but Mesa internally assigns it a location.
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
v2: squash with another patch, minor variable name tweak (Timothy
Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This includes:
* Move the defition of empty_uniform_block to linker_util.h
* Move find_empty_block (with a rename) to linker_util.h
* Refactor some code at linker.cpp to a new method at linker_util.h
(link_util_update_empty_uniform_locations)
So all that code could be used by the GLSL linker and the NIR linker
used for ARB_gl_spirv.
v2: include just "ir_uniform.h" (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is the wrong kind of dirty bit. Caught by GCC warnings, due to
64-bit values being truncated to 32 bits.
Fixes: b95b0e2918 (intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan API provides a mechanism for applications to cache their own
shaders and manage on-disk pipeline caching themselves. Generally, this
is what I would recommend to application developers and I've resisted
implementing driver-side transparent caching in the Vulkan driver for a
long time. However, not all applications do this and, for some
use-cases, it's just not practical.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Before, we were only hashing the shader if we had a shader cache to
cache things in. This means that if we ever get it wrong, we could end
up trying to cache a shader with an undefined hash. Since not having a
shader cache is an extremely uncommon case, let's optimize for code
clarity and obvious correctness over avoiding a hash operation.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If a client is dumb enough to not specify a pipeline cache, give it a
default. We have to create one anyway for blorp so we may as well let
the client cache shaders in it.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, we just hashed the entire descriptor set layout verbatim.
This meant that a bunch of extra stuff such as pointers and reference
counts made its way into the cache. It also meant that we weren't
properly hashing in the Y'CbCr conversion information information from
bound immutable samplers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
According to RenderDoc, this shaves 99.6% of the run time off of the
ambient occlusion pass in Skyrim Special Edition when running under DXVK
and shaves 92% off the runtime for a reasonably representative frame.
When running the actual game, Skyrim goes from being a slide-show to a
very stable and playable framerate on my SKL GT4e machine.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass searches for reasonably large local variables which can be
statically proven to be constant and moves them into shader constant
data. This is especially useful when large tables are baked into the
shader source code because they can be moved into a UBO by the driver to
reduce register pressure and make indirect access cheaper.
v2 (Jason Ekstrand):
- Use a size/align function to ensure we get the right alignments
- Use the newly added deref offset helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a concept to NIR of having a blob of constant data
associated with a shader. Instead of being a UBO or uniform that can be
manipulated by the client, this constant data considered part of the
shader and remains constant across all invocations of the given shader
until the end of time. To access this constant data from the shader, we
add a new load_constant intrinsic. The intention is that drivers will
eventually lower load_constant intrinsics to load_ubo, load_uniform, or
something similar. Constant data will be used by the optimization pass
in the next commit but this concept may also be useful for OpenCL.
v2 (Jason Ekstrand):
- Rename num_constants to constant_data_size (anholt)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are very similar to the related function in nir_lower_io except
that they don't handle per-vertex or packed things (that could be added,
in theory) and they take a more detailed size/align function pointer.
One day, we should consider switching nir_lower_io over to using the
more detailed size/align functions and then we could make it use these
helpers instead of having its own.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The size and alignment are "natural" in the sense that everything is
aligned to a scalar. This is a bit tighter than std430 where vec3s are
required to be aligned to a vec4.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand):
- Rename y to pot_align (Brian)
- Also use ALIGN_POT in build_id.c and slab.c (Brian)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes warning at screen creation. We store our outputs in normal temps
and just emit them to shader I/O at the end, due to our I/O ordering
requirements, so reading "outputs" in NIR is fine.
this will be needed for compatibility profiles
v2: handle tess shaders
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
When building with asserts enabled, we'll end up triggering an assert
in pipe_buffer_map_range down this code-path, due to trying to map
an empty range. Even if we avoid that, we'll trigger another assert
a bit later, because u_vbuf_get_minmax_index returns a min-index of
-1 here, which gets promoted to an unsigned value, and gives us an
out-of-bounds buffer-mapping offset.
Since we can't really have a well-defined min/max range here when
the range is empty anyway, we should just drop this dance in the
first place. After all, no rendering is going to be produced.
This fixes a crash in dEQP-GLES31.functional.draw_indirect.random.0
on VirGL for me.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After performing a fast-clear eliminate, a FMASK decompress,
or a DCC decompress, we can reset the predicate to FALSE.
With that, the GPU should be able to skip unnecessary color
decompression passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Performing a DCC decompression pass is currently pretty rare,
but using predication allows the GPU to skip unnecessary passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use caps to obtain the multisample sample positions for up to 16
positions and implement the according Gallium interface.
This implemenation (plus its counterpart in virglrenderer) assume that
the fixed sample position are always the same for a given number of samples
over the whole live time of a qemu session. It also assumes that sample
series are only given for 2, 4, 8, and 16 samples, and for intermediate
numbers N of samples the next higher supported set from above list is picked
and the sample positions for the first N samples are returned accordingly.
Fixes (when run on GL host):
dEQP-GLES31.functional.texture.multisample.samples_1.sample_position
dEQP-GLES31.functional.texture.multisample.samples_2.sample_position
dEQP-GLES31.functional.texture.multisample.samples_3.sample_position
dEQP-GLES31.functional.texture.multisample.samples_4.sample_position
dEQP-GLES31.functional.texture.multisample.samples_8.sample_position
dEQP-GLES31.functional.texture.multisample.samples_10.sample_position
dEQP-GLES31.functional.texture.multisample.samples_12.sample_position
dEQP-GLES31.functional.texture.multisample.samples_13.sample_position
dEQP-GLES31.functional.texture.multisample.samples_16.sample_position
v2: remove unrelated chunk (thanks Ilia Mirkin)
v3: - also return positions for intermediate sample counts
- fix unused varible warning
- update description
v4: explain better what this patch assumes and how it handles sample numbers
that are not directly advertised (thanks go to Erik Faye-Lund for making
me aware that this should be documented)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
and PIPE_FORMAT_R8G8B8A8_SRGB, as well.
The reason for this is that when Virgl runs with GLES on the host, it
cannot directly upload textures in BGRA.
So to avoid a conversion step, consider the RGB sRGB formats as well for
this extension.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the driver doesn't support PIPE_FORMAT_B8G8R8A8_SRGB, fall back to
PIPE_FORMAT_R8G8B8A8_SRGB.
Drivers such as Virgl will have a hard time supporting
PIPE_FORMAT_B8G8R8A8_SRGB when the host runs GLES, as GL_BGRA isn't as
well suported there.
So go with PIPE_FORMAT_R8G8B8A8_SRGB so these drivers can avoid a
conversion copy.
v2: Fix typo in commit message
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When Mesa itself implements ETC2 decompression, it currently
decompresses to formats in the GL_BGRA component order.
That can be problematic for drivers which cannot upload the texture data
as GL_BGRA, such as Virgl when it's backed by GLES on the host.
So this commit adds a flag to _mesa_unpack_etc2_format so callers can
specify the optimal component order.
In Gallium's case, it will be requested if the format isn't in
PIPE_FORMAT_B8G8R8A8_SRGB format.
For i965, it will remain GL_BGRA, as before.
v2: * Remove unnecesary include (Emil Velikov)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to add loop terminators to the list in the order we come
across them otherwise if two or more have the same exit condition
we will select that last one rather than the first one even though
its unreachable.
This fix is for simple unrolls where we only have a single exit
point. When unrolling these type of loops the unreachable
terminators and their unreachable branch are removed prior to
unrolling. Because of the logic change we also switch some
list access in the complex unrolling logic to avoid breakage.
Fixes: 6772a17acc ("nir: Add a loop analysis pass")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The spec doesn't explicitly say to generate an error but since
DrawArraysInstanced* and DrawElementsInstanced* do, it makes
sense to do it for these functions also.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The extension is enabled for compat profile but there is currently
no display list support.
I filed a spec bug and it has been agreed that
glDispatchComputeIndirect should generate an INVALID_OPERATION
error when called during display list compilation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
this is required for Drivers which don't allow reading from outputs.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
The XML ends up noisier if you're only looking at one version, but from
the diffstat there's obvious wins in terms of deduplication. This will
get even more significant if we ever support 3.2 or 4.0.
The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now. Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one. This is the start of
changing it so that one XML can generate headers for multiple versions.
Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run isolated they always passed.
createDevice and destroyDevice were called before and after every
tests. Because the binding_table_pool was never closed, we reached the
maximum number of open file descriptors (ulimit -n) and when that
happened every call to createDevice implied a
VK_ERROR_INITIALIZATION_FAILED error.
Fixes: c7db0ed4e9
("anv: Use a separate pool for binding tables when soft pinning")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Having this if statement here prevented the next if statement from being
reached in the case of image stores, which is needed for instructions with
indirect bindless handles like "STORE TEMP[ADDR[2].x+1](1) ...".
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
There is a parallel make build issue in src/egl/drivers/dri2/
for wayland builds. Can be reproduced with:
$ rm src/egl/drivers/dri2/*.h src/egl/drivers/dri2/platform_wayland.lo
$ make -C src/egl/ drivers/dri2/platform_wayland.lo
../../../mesa-18.1.2/src/egl/drivers/dri2/platform_wayland.c:50:10: fatal error: linux-dmabuf-unstable-v1-client-protocol.h: No such file or directory
This patch adds the missing dependency.
Fixes: 02cc359372 "egl/wayland: Use linux-dmabuf interface for buffers"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
[Eric: fixed up the commit title]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Add support for glsl 'invariant' modifier for output data declarations.
Gallium drivers that use TGSI serialization currently loose invariant
modifiers in glsl shaders.
v2: use boolean for invariant instead of unsigned.
Tested: chromiumos on qemu with virglrenderer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up. Just emit it once like we really want.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2 (Jason Ekstrand):
- Disallow gl_SampleId in SIMD32 on gen7
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
And handle 32-wide payload register reads in fetch_payload_reg().
v2 (Jason Ekstrand);
- Fix some whitespace and brace placement
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While we're here, we change to using horiz_offset() instead of abusing
half().
v2 (Jason Ekstrand):
- Use horiz_offset() instead of half()
Reviewed-by: Matt Turner <mattst88@gmail.com>
The original code manually handled splitting the MOVs to 8-wide to
handle various regioning restrictions. Now that we have a SIMD width
splitting pass that handles these things, we can just emit everything at
the full width and let the SIMD splitting pass handle it. We also now
have a useful "subscript" helper which is designed exactly for the case
where you want to take a W type and read it as a vector of Bs so we may
as well use that too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On g4x through Sandy Bridge, src1 (the coordinates) of the PLN
instruction is required to be an even register number. When it's odd
(which can happen with SIMD32), we have to emit a LINE+MAC combination
instead. Unfortunately, we can't just fall through to the gen4 case
because the input registers are still set up for PLN which lays out the
four src1 registers differently in SIMD16 than LINE.
v2 (Jason Ekstrand):
- Take advantage of both accumulators and emit LINE LINE MAC MAC
(Based on a patch from Francisco Jerez)
- Unify the gen4 and gen4x-6 cases using a loop
v3 (Jason Ekstrand):
- Don't unify gen4 with gen4x-6 as this turns out to be more fragile
than first thought without reworking the gen4 barycentric coordinate
layout.
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we don't have PLN (gen4 and gen11+), we implement LINTERP as either
LINE+MAC or a pair of MADs. In both cases, the accumulator is written
by the first of the two instructions and read by the second. Even
though the accumulator value isn't actually ever used from a logical
instruction perspective, it is trashed so we need to make the scheduler
aware. Otherwise, the scheduler could end up re-ordering instructions
and putting a LINTERP between another an instruction which writes the
accumulator and another which tries to use that result.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU
operation and less like a send. This is less code over-all and, as a
side-effect, it now properly handles execution groups and lowering so
SIMD32 support just falls out.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We want consistent behavior in the meaning of the flag_subreg field
between SNB and IVB+.
v2 (Jason Ekstrand):
- Add some extra commentary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This prevents a crash in some arb_enhanced_layouts tests that would be
caused by the next commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Current discard handling requires dedicating the second flag register to
discard. However, control-flow in SIMD32 requires both flag registers
so it's incompatible with the current discard handling. Just don't
support SIMD32+discard for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The hardware's control flow logic is 16-wide so we're out of luck
here. We could, in theory, support SIMD32 if we know the control-flow
is uniform but we don't have that information at this point.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 0d905597f fixed an issue with the placement of the zip and unzip
instructions. However, as a side-effect, it reversed the order in which
we were emitting the split instructions so that they went from high
group to low instead of low to high. This is fine for most things like
texture instructions and the like but certain render target writes
really want to be emitted low to high. This commit just switches the
order back around to be low to high.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 0d905597f "intel/fs: Be more explicit about our placement of [un]zip"
The pixel shader dispatch table is kind-of a confusing mess. This adds
some helpers for dealing with it and for easily extracting the correct
data from wm_prog_data.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Doing instruction header setup in the generator is awful for a number
of reasons. For one, we can't schedule the header setup at all. For
another, it means lots of implied writes which the instruction scheduler
and other passes can't properly read about. The second isn't a huge
problem for FB writes since they always happen at the end. We made a
similar change to sampler handling in ff4726077d.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have the implied header in src[0] for tracking purposes, we
may as well use it in the generator. This makes things a tiny bit more
general.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The FB write opcode on gen4-5 does implied copies from g0 and g1 to the
message payload. With this commit, we start tracking that as part of
the IR by having the FB write read from g0-1.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It doesn't matter since we don't ever run replicated write shaders
through the optimizer but it's good to be complete.
Reviewed-by: Matt Turner <mattst88@gmail.com>
With this commit, things no longer break if NVC0_CB_AUX_TEX_INFO is
changed to anything other than 0x20.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Previously, TargetNVC0::insnCanLoadOffset() returned whether the offset
could be set to a specific value. The IndirectPropagation pass expected
it to return whether the offset could be increased by a specific value,
which is what TargetNV50::insnCanLoadOffset() does.
Fixes: 37b67db6ae
("nvc0/ir: be careful about propagating very large offsets into const load")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Fix build error after llvm-7.0svn r332881 ("CodeGen: Add a dwo output
file argument to addPassesToEmitFile and hook it up to dwo output.").
CXX rasterizer/jitter/libmesaswr_la-JitManager.lo
rasterizer/jitter/JitManager.cpp:368:93: error: too few arguments to function call, expected at least 4, have 3
pTarget->addPassesToEmitFile(*pMPasses, filestream, TargetMachine::CGFT_AssemblyFile);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Chromium OS uses Autotools and pkg-config when building Mesa for
Android. The gallium drivers were failing to find the headers and
libraries for zlib and Android's libbacktrace.
v2:
- Don't add a check for zlib.pc. configure.ac already checks for
zlib.pc elsewhere. [for tfiga]
- Check for backtrace.pc separately from the other Android libs.
[for tfiga]
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The spec allows adding scalars with a vector or matrix. In this case
the opt was losing swizzle and size information.
This fixes a bug with Doom (2016) shaders.
Fixes: 34ec1a24d6 ("glsl: Optimize (x + y cmp 0) into (x cmp -y).")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The memzone and flags parameters were accidentally flipped in the call
from brw_bo_alloc_tiled_2d.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of encouraging the client to re-create the swapchain and keep
going with an OUT_OF_DATE error, tell the client that further use of
the current surface will not succeed as the associated kernel objects
are no longer valid.
In particular, when a DRM lease is revoked, then the client needs to
get another lease and create a new surface for that.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The always_active_io flag was only set according to the first variable
that got packed in, so NIR io compaction would end up compacting XFB
varyings that shouldn't move at that point.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Right now, we name these fields as "field name minus one" so that your C
code obviously states what the value should be. However, it's easy enough
to handle at the codegen level with another little XML attribute, meaning
less C code and easier-to-read values in CLIF dumping and gdb as well.
(The actual CLIF format for simulator and FPGA replay takes in
pre-minus-one values, so we need it there too).
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This fixes validation issues with some Vulkan CTS tests with the new
deref instructions.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Ported from RadeonSI.
This appears to fix some random fails with:
dEQP-VK.query_pool.statistics_query.*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver already supports exporting the stencil value.
The following CTS test now pass:
dEQP-VK.pipeline.shader_stencil_export.op_replace
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We can not use the VUE Dereference flags combination for EOT
message under ILK and SNB because the threads are not initialized
there with initial VUE handle unlike Pre-IL.
So to avoid GPU hangs on SNB and ILK we need
to avoid usage of the VUE Dereference flags combination.
(Was tested only on SNB but according to the specification
SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
the ILK must behave itself in the similar way)
v2: Approach to fix this issue was changed.
Instead of different EOT flags in the program end
we will create VUE every time even if GS produces no output.
v3: Clean up the patch.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The radeon winsys isn't linked against the ac code, I have vague
memories of this causing some problems before, for now fix the build
but just duplicating the code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some fields shouldn't be initialized, like framebuffers_bound and other stats.
It's hopefully complete now.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
This patch both adds support for probing & filtering DRM nodes
and switches away from using the GRALLOC_MODULE_PERFORM_GET_DRM_FD
gralloc call.
Currently the filtering is based just on the driver name,
and the desired name is supplied using the "drm.gpu.vendor_name"
Android property.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Maintaining both flink names and prime fd support which are provided by
2 different gralloc implementations is problematic because we have a
dependency on a specific gralloc implementation header.
This mostly disables the dependency on the gralloc implementation and
headers. The dependency on GRALLOC_MODULE_PERFORM_GET_DRM_FD remains for
now, but the definition is added locally to remove the header
dependency.
drm_gralloc support can be enabled by setting
BOARD_USES_DRM_GRALLOC=true in BoardConfig.mk.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
If the driver ends up by performing a slow depthstencil clear,
the HTILE metadata won't be initialized correctly.
This fixes random VM faults on Polaris while running CTS
with Bas's runner. This doesn't seem to regress performance.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For instruction sequences that change the address register with every load
the current limit to bail out of the scheduler and reject the optimisation
was too tight, i.e. it was expected that at least one pending instruction
would be scheduled each time.
Give the scheduler more margin to sort out these load sequences by allowing
a number of rounds where no instruction is scheduled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106163
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This patch is based on
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185805.html
Dave Airlie:
"A bunch of CTS tests led me to write
tests/shaders/ssa/fs-while-loop-rotate-value.shader_test
which r600/sb always fell over on.
GCM seems to move some of the copies into other basic blocks,
if we don't allow this to happen then it doesn't seem to schedule
them badly.
Everything I've read on SSA/phi copies say they have to happen
in parallel, so keeping them in the same basic block seems like
a good way to keep some of that property."
This patch differs from the one proposed by Dave in that it only adds
the NF_DONT_MOVE flag to copy_move instructions that are created by split_phi*
and that are located in loops.
Fixes piglit: tests/shaders/ssa/fs-while-loop-rotate-value.shader_test
(no regressions in the shader set). It also fixes all failing tests from
dEQP-GLES3.functional.shaders.loops.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This was mistakenly exposed, even though we want atomic counters to be
lowered to atomic ops on an SSBO like nearly every other GPU. Which
somehow recently started getting segfaults due to calling a null
pipe->set_hw_atomic_buffers().
Fixes a crash in stk, and probably other things.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2:
Rework fence integration into the driver so that waiting for
any of a mixture of fence types (wsi, driver or syncobjs)
causes the driver to poll, while a list of just syncobjs or
just driver fences will block. When we get syncobjs for wsi
fences, we'll adapt to use them.
v3: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v4: Adapt to WSI fence API change. It now returns VkResult and
no longer has an option for relative timeouts.
v5: wsi_register_display_event and wsi_register_device_event now
use the default allocator when NULL is provided, so remove the
computation of 'alloc' here.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3: Adapt to WSI fence API change. It now returns VkResult and
no longer has an option for relative timeouts.
v4: wsi_register_display_event and wsi_register_device_event now
use the default allocator when NULL is provided, so remove the
computation of 'alloc' here.
v5: use zalloc2 instead of alloc2 for the WSI fence.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This extension provides fences and frame count information to direct
display contexts. It uses new kernel ioctls to provide 64-bits of
vblank sequence and nanosecond resolution.
v2: Remove DRM_CRTC_SEQUENCE_FIRST_PIXEL_OUT flag. This has
been removed from the proposed kernel API.
Add NULL parameter to drmCrtcQueueSequence ioctl as we
don't care what sequence the event was actually queued to.
v3: Adapt to pthread clock switch to MONOTONIC
v4: Fix scope for wsi_display_mode andwsi_display_connector allocs
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Use wsi_rel_to_abs_time helper function to convert relative
timeouts to absolute timeouts without causing overflow.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v6:
Change WSI fence wait function to return VkResult instead of
bool. This makes the meaning of the return value easier to
understand, and allows for the indication of failure.
Also change the WSI fence wait function to take only absolute
timeouts and not provide an option for a relative timeout. No
users wanted relative timeouts, and it's simpler if that option
isn't available.
Terminate the DPMS property loop once we've found the property.
Assert that the fence hasn't already been destroyed in
wsi_display_fence_destroy.
Rearrange the event handler function order in the file to place
routines in an easier to find order.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v7:
Adapt to API changes for surface_get_capabilities
v8:
Use wsi->alloc in register_display_event so that callers
don't have to dig out an allocator for us.
v9:
Fix a few minor formatting issues
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v10:
Use wsi->alloc if none provided in wsi_display_fence_alloc.
Now that drivers are expected to pass the allocator argument
straight through from the application, we need to check those
for NULL everywhere.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Handle the case where the set of fences to wait for is not all of the
same type by either waiting for them sequentially (waitAll), or
polling them until the timer has expired (!waitAll). We hope the
latter case is not common.
While the current code makes sure that it always has fences of only
one type, that will not be true when we add WSI fences. Split out this
refactoring to make merging that clearer.
v2: Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2:
Cast INT64_MAX to uint64_t to make of its use as the maximum
possible timeout clearly unsigned to the reader.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Make anv_wait_for_fences with !waitAll check all fences at least
once, even if the requested timeout has already passed.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Note that this patch needs to come late in the series since this pass
can be run after any pass that damages nir_metadata_loop_analysis.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This deletes support for _var intrinsics and legacy deref chains in
favor of deref instructions. The internals are also reworked a bit to
use deref instructions directly.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No one is currently using so we can make this change irrespective of
driver. We may use it again in i965 so it's best to pretend to keep it
working.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit reworks nir_lower_vars_to_ssa to use deref instructions and
deref paths internally instead of deref chains. We also drop support
for the old load/store/copy_var intrinsics.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes us build the is_direct parameter as the nodes are constructed
rather than as we walk the chain. This will be useful later.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vc4+vc5 is not really effected by the deref chain to deref instr
conversion, so it no longer needs this pass. For others, now that
all the passes mesa/st uses are using deref instructions, push the
lowering to deref chains back into driver.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We change glsl_to_nir to provide derefs for bot textures and samplers
while we're at it. This makes the lowering much easier since we only
either replace sources or remove them.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To simplify the transition, and make things bisectable, split out a
legacy copy or lower_samplers. This way the i965 and gallium drivers
can independently switch over to deref instructions.
Since the lower_samplers_as_deref pass is only used by gallium drivers,
it can be converted in lock-step with moving the lower_deref_instrs
pass, and so does not need a corresponding _legacy clone.
This legacy pass will be removed in a future commit.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we had to rewrite the deref walking loop anyway, I took the
opportunity to make it a bit clearer and more efficient. In particular,
in the AoA case, we will now emit one minmax instead of one per array
level.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit removes most of the deref instruction lowering. Instead of
lowering early, we only lower textures and images and we only do so
right before any of the anv image lowering passes.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit completely reworks function calls in NIR. Instead of having
a set of variables for the parameters and return value, nir_call_instr
now has simply has a number of sources which get mapped to load_param
intrinsics inside the functions. It's up to the client API to build an
ABI on top of that. In SPIR-V, out parameters are handled by passing
the result of a deref through as an SSA value and storing to it.
This virtue of this approach can be seen by how much it allows us to
delete from core NIR. In particular, nir_inline_functions gets halved
and goes from a fairly difficult pass to understand in detail to almost
trivial. It also simplifies spirv_to_nir somewhat because NIR functions
never were a good fit for SPIR-V.
Unfortunately, there is no good way to do this without a mega-commit.
Core NIR and SPIR-V have to be changed at the same time. This also
requires changes to anv and radv because nir_inline_functions couldn't
handle deref instructions before this change and can't work without them
after this change.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only initializing vtn_builder::func for the pre-walk where we
build the CFG. We were only initializing the nir_builder for the later
walk through the instructions even though were were setting b->cursor
for the pre-walk. Let's set both both places so that everything is
consistent. This useful because we handle OpFunctionParameter in the
pre-walk and we're going to need to be able to emit instructions.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, pointers fell into two categories: index/offset for UBOs,
SSBOs, etc. and var + access chain for logical pointers. This commit
adds another logical pointer mode that's deref + access chain.
It's tempting to think that we can just replace variable-based pointers
with deref-based or at least replace the access chain with a deref
chain. Unfortunately, there are a few sticky bits that prevent this:
1) We can't return deref-based pointers from OpVariable because those
opcodes may come outside of a function so there's no place to emit
the deref instructions.
2) We can't always use variable-based pointers because we may not
always know the variable. (We do now, but he upcoming function
rework will take that option away.)
3) We also can't replace the access chain struct with a deref. Due to
the re-ordering we do in order to handle loop continues, the derefs
we would emit as part of OpAccessChain may not dominate their uses.
We normally fix this up with nir_repair_ssa but that generates phi
nodes which we don't want in the middle of our deref chains.
All in all, we have no real better option than to support partial access
chains while also re-emitting the deref instructions on the spot.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Push constants have been a weird edge-case for a while in that they have
explitic offsets but we've been internally building access chains for
them. This mostly works but it means that passing pointers to push
constants through as function arguments is broken. The easy thing to do
for now is to just treat them like UBOs or SSBOs only without a block
index. This does loose a bit of information since we no longer have an
accurate access range and any indirect access will look like it could
read the whole block. Unfortunately, there's not much we can do about
that. Once NIR derefs get a bit more powerful, we can plumb these
through as derefs and be able to reason about them again.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we were doing structure splitting in spirv_to_nir.
Unfortunately, this doesn't really work when you think about passing
struct pointers into functions. Doing it later in NIR is a much better
plan.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a concept of "members" to a variable with an interface type.
It allows you to specify the full variable data for each member of the
interface instead of once for the variable. We also add a lowering pass
to lower those variables to a sequence of variables and rewrite all the
derefs accordingly.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should make nir_lower_tex properly handle deref instructions as
well as make it more correct when texture arrays are used and it's
called after lowering samplers to binding table indices.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass doesn't handle deref instructions yet. Making it handle both
legacy derefs and deref instructions would be painful. Since it's not
important for correctness, just disable it for now.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit introduces a new nir_deref.h header for helpers that are
less common and really only needed by a few heavy-duty passes. In this
header is a new struct for representing a full deref path which can be
walked in either direction.
v2 (Jason Ekstrand):
- Assert that deref != NULL (Caio)
- Fill _short_path with 0xdeadbeef in debug builds when not used (Caio)
- Make nir_deref_path a typedef (Rob)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Non-intrinsic function handling has never actually been tested and
probably doesn't work. Just get rid of it for now. We can always add
it back in later if it's useful.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be removed at the end of the transition, but add some tracking
plus asserts to help ensure that lowering passes are called at the
correct point (pre or post deref instruction lowering) as passes are
converted and the point where lower_deref_instrs() is called is moved.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a new instruction type to NIR for handling derefs.
Nothing uses it yet but this adds the data structure as well as all of
the code to validate, print, clone, and [de]serialize them.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves the switch statement for specific intrinsics above source and
destination validation. We also rework the source and destination
validation to use different bit_size values for each source and/or
destination.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fix spilling regression introduced by 5428066f5e
this is just a minor mistake done while moving the code out into a new
function. The function contained a loop which might have been terminated
earlier and skipped setting noSpill to 1. After the refactoring it was always
set.
Fixes: 5428066f5e
("nv50/ir: make a copy of tex src if it's referenced multiple times")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
The same as the previous two patches, but for the libva state tracker.
Fixes: 724916c8a8
("meson: dedup gallium-xvmc logic")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This fixes the same problem as the previous patch did for vdpau, but for
xvmc.
Fixes: 724916c8a8
("meson: dedup gallium-xvmc logic")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently if vdpau is set to auto, it will be disabled only in cases
where gallium is disabled or the host OS is not supported (mac, haiku,
windows). However on (for example) Linux if libvdpau is not installed
then the build will error because of the unmet dependency. This corrects
auto to do the right thing, and not error if libvdpau is not installed.
Fixes: 992af0a4b8
("meson: dedup gallium-vdpau logic")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
From my point of view, when we aren't able to submit a CS
something terribly wrong happens and we are most likely
going to lost the device.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
the format of the CLEAR_COLOR register doesn't depend on the target format
this fixes clear color when rendering to 32-bit RGBA and 16-bit targets
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
this patch adds support for a20x, which has some differences with a220:
-no VGT_MAX_VTX_INDX register
-no CLEAR_COLOR register
-set RB_BC_CONTROL in restore (hangs without)
-different CP_DRAW_INDX format
tested with kmscube and glmark2 scenes, on par with a220
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The offset field is 22 bit large.
11 bits are necessary because MaxVertexAttribRelativeOffset = 2047
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
For a meson -Db_ndebug=true release build on x86_64, reduces text size of
libv3d.a from 53.0k to 51.6k. Inspired by 0d5329d626 ("anv: Disable
__gen_validate_value if NDEBUG is set.")
All of ARB_gpu_shader5 is most certainly not required for GLES 3.1
(most of it is in OES_gpu_shader5 on top of GLES 3.1).
Some of what is required from ARB_gpu_shader5 is provided by
ARB_texture_gather, so check for that. The remaining subset of
ARB_gpu_shader5 doesn't have individual extensions to check for,
but I guess it is unlikely that some driver has all of these
extensions but not, say, integer bitfield manipulation.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Similar to the combined limit for VS+FS, there is an upper limit for
shader size to run from internel memory.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Previously when setting up a uniform it would try to walk the uniform
storage slots and find one that matches the name of the given
variable. However, each variable already has a location which is an
index into the UniformStorage array so we can just directly jump to
the right slot. Some of the variables take up more than one slot so we
still need to calculate how many it uses.
The main reason to do this is to support ARB_gl_spirv because in that
case the uniforms don’t have names so the previous approach won’t
work.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Right now, the BRW linker code assumes nir_variable::name is always
non-NULL, but thanks to ARB_gl_spirv we will soon be linking SPIR-V
programs, and those explicitly require matching uniforms by location.
The name is just a debug hint.
Instead of checking for the name this patch makes it check for
var->num_state_slots on the assumption that everything that had an
internal name also had some state slots. This seems likely because the
two code paths that are taken when the name begins with "gl_" already
have an assert that var->state_slots is not NULL.
v2: simplified, most of it moved to glsl/nir/spirv (Neil Roberts)
v3: check for num_state_slots instead of the name. This is needed
because we do actually have nameless builtins with SPIR-V such as
PatchVerticesIn and we want them to hit the
_mesa_add_state_reference code path (Neil Roberts)
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Otherwise if the shader is SPIR-V then SamplerUsed won’t have been
initialised yet so it will end up thinking no textures are used. This
was causing a crash later on if nothing causes it to regenerate
TexturesUsed before the next render.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This function is equivalent to the linker.cpp
build_program_resource_list() but will extract the resources from NIR
shaders instead.
For now, only uniforms and program inputs are implemented.
v2: move from compiler/nir to compiler/glsl (Timothy Arceri)
v3: remove support for inputs, that is still WIP (spotted by Timothy
Arceri)
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
So it could be used by the GLSL and NIR linker.
v2: (Timothy Arceri)
* Moved from compiler to compiler/glsl
* Method renamed to link_util_add_program_resource
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is based on link_uniform_initializers.cpp.
v2: move from compiler/nir to compiler/glsl (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This function will be the entry point for linking the uniforms from
the nir_shader objects associated with the gl_linked_shaders of a
program.
This patch includes initial support for linking uniforms from NIR
shaders. It is tailored for the ARB_gl_spirv needs, and it is far from
complete, but it should handle most cases of uniforms, array
uniforms, structs, samplers and images.
There are some FIXMEs related to specific features that will be
implemented in following patches, like atomic counters, UBOs and
SSBOs.
Also, note that ARB_gl_spirv makes mandatory explicit location for
normal uniforms, so this code only handles uniforms with explicit
location. But there are cases, like uniform atomic counters, that
doesn't have a location from the OpenGL point of view (they have a
binding), but that Mesa assign internally a location. That will be
handled on following patches.
A nir_linker.h file is also added. More NIR-linking related API will
be added in subsequent patches and those will include stuff from Mesa,
so reusing nir.h didn't seem a good idea.
v2: move from compiler/nir to compiler/glsl (Timothy Arceri)
v3: sets var->driver.location if the uniform was found from a previous
stage (Neil Roberts).
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Linker utilities common to the GLSL IR and NIR linker (the latter to
be used for ARB_gl_spirv).
We need to move it to a new header as the NIR linker doesn't need to
know about ir_variable, and others, included at linker.h.
v2: move from src/compiler to src/compiler/glsl (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When SpvDecorationBinding is encountered in the SPIR-V source it now
sets explicit_binding on the nir_variable. This will be used to
determine whether to initialise sampler and image uniforms with the
binding value.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
vtn_variable_mode_image and _sampler are instead replaced with
vtn_variable_mode_uniform which encompasses both of them. In the few
places where it was neccessary to distinguish between the two, the
GLSL type of the pointer is used instead.
The main reason to do this is that on OpenGL it is permitted to put
images and samplers into structs and declare a uniform with them. That
means that variables can now have a mix of uniform, sampler and image
modes so picking a single one of those modes for a variable no longer
makes sense.
This fixes OpLoad on a sampler within a struct which was previously
using the variable mode to determine whether it was a sampler or not.
The type of the variable is a struct so it was not being considered to
be uniform mode even though the member being loaded should be sampler
mode.
The previous code appeared to be using var->interface_type as a place
to store the type of the variable without the enclosing array for
images and samplers. I guess this worked because opaque types can not
appear in interfaces so the interface_type is sort of unused. This
patch removes the overloading of var->interface_type and any places
that needed the type without the array can now just deduce it from
var->type.
v2: squash in this patch the changes to anv/nir (Timothy)
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
They are supported by SPIR-V for ARB_gl_spirv.
v2 (changes on top of Nicolai's original patch):
* Handle UniformConstant storage class for uniforms other than
samplers and images. (Eduardo Lima)
* Handle location decoration also for samplers and images. (Eduardo
Lima)
* Rebase update (spirv_to_nir options added, logging changes, and
others) (Alejandro Piñeiro)
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Vulkan has the concept of separate image and sampler objects in the
SPIR-V code whereas GL conflates them into one. nir_lower_samplers
contains an assert to verify that sampler operand is not being set on
the nir instruction. However when the code comes from spirv_to_nir the
sampler operand is always set. GL_arb_gl_spirv explicitly states that
OpTypeSampler is not supported so it retains the GL behaviour of not
being able to seperate them. Therefore the sampler will always be the
same as the texture. This GL version of the lowering code ignores
instr->sampler and sets instr->sampler_index to the same value as
instr->texture_index. Some other places in the code (such as in
nir_print) assume that once the instruction is lowered then both
instr->texture and instr->sampler will be NULL, so to keep this
behaviour we now set instr->sampler to NULL after ignoring it to fill
in instr->sampler_index.
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is copied from the corresponding value in ir_variable. The
intention is to eventually use it in a pure-NIR linker.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Since ARB_gl_spirv name reflection can be missing. piglit
shader_runner does several resource checking, so this commit is useful
to get even the more simple piglit tests running without crashing on
SPIR-V mode.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will be used by the linker code to differentiate between programs
made out of SPIR-V or GLSL shaders.
This was rejected in the past, assuming that it was equivalent to
check for "shProg->_LinkedShaders[stage]->spirv_data != NULL". But:
* At some points of the linking process it would be needed to check
if _LinkerShaders[stage] is present, so the full check would be:
"shProg->_LinkedShaders[stage] != NULL &&
shProg->_LinkedShaders[stage]->spirv_data != NULL"
* Sometimes you would like to do some specific to SPIR-V
independently of the stage, or for any stage. For example, "link
all the uniforms, for all stages". In that case checking for the
flag would be equivalent to iterate all the _LinkedShaders and
check if there is any spirv_data available.
The former makes readibility really worse. Both could be solved by
adding two helpers. But adding a flag seems really more simple and
readable.
v2: added justification for the flag on the commit message (Alejandro)
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With the recent rework of converting the shell script to a python one
the check for actual tests was dropped.
Bring that back, since it was explicitly added considering we had a ~2
year period, during which the tests were not run.
v2: use raise Exception() over print() & return false (Dylan)
Fixes: db8cd8e367 ("glcpp/tests: Convert shell scripts to a python
script")
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Bring back the "detection" of the said variables, to allow
standalone execution.
Fixes: db8cd8e367 ("glcpp/tests: Convert shell scripts to a python
script")
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
As of recently both of these have been reworked so they invoke a python
script. At the same time the latter can be executed with the combined
arguments of both scripts.
AKA we no longer need to have them separate.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Virtually every driver that supports ATI_separate_stencil
also supports EXT_stencil_two_side.
Use the latter boolean for both extension. With that in mind we can drop
the explicit true from the drivers and the nasty comment in
compute_version().
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows to avoid having to see garbage in Dying Light loading screen
at least, which probably expects Windows/NV behavior of all allocations
being zeroed by default.
Analogous to radv flag with the same name.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Avoids a branch and reduces code size a tiny bit:
text data bss dec hex filename
10804563 398653 2070368 13273584 ca89f0 /tmp/radeonsi_dri.so.old
10804499 398653 2070368 13273520 ca89b0 /tmp/radeonsi_dri.so
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ported from RadeonSI.
Not sure why this is needed but AMDVLK does something similar.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The sampleMaskIn workaround (b936f4d1ca)
tries to figure out if the shader is running at per-sample frequency, but
there's a typo bug so it will only recognize per-sample linar inputs,
not per-sample perspective ones.
Spotted by Eric Engestrom <eric.engestrom@intel.com>
Fixes: b936f4d1ca0d2ab1e828a "r600: partly fix sampleMaskIn value"
When VK_USE_PLATFORM_XLIB_XRANDR_EXT is defined, vulkan.h includes
X11/extensions/Xrandr.h for the RROutput typedef which is used in
the vkGetRandROutputDisplayEXT interface.
Make sure we have the required header by checking during the build,
and also set CFLAGS to point at the right directory.
We don't need to link against the library as we don't use any
functions from there, so don't add the _LIBS value in the autotools
build.
Signed-off-by: Keith Packard <keithp@keithp.com>
Fixes: dbac8e25f8 "radv: Add EXT_acquire_xlib_display to radv driver [v2]"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
There's a convenient "FTOC" instruction for generating the coverage now,
unlike vc4. This fixes
dEQP-GLES3.functional.multisample.fbo_4_samples.proportionality_alpha_to_coverage
Otherwise, a blit from separate stencil may fail to flush the job that
initialized it, or new drawing could fail to flush a blit reading from
stencil.
Fixes:
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only
dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
We needed to convert from a -errno to a boolean success value. Fixes:
GTF-GLES3.gtf.GL3Tests.sync.sync_functionality_clientwaitsync_flush
GTF-GLES3.gtf.GL3Tests.sync.sync_functionality_clientwaitsync_signaled
As not every (upcoming) backend compiler is happy with
nir_lower_xxx_to_scalar lowerings do them only if the backend
is scalar (and not vec4) based.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v1 -> v2:
- nv30 is _NOT_ scalar as suggested by Ilia Mirkin.
- Change from a screen cap to a shader cap as suggested
by Eric Anholt.
- radeonsi is scalar as suggested by Marek Olšák.
- Change missing ones to be scalar.
v2 -> v3:
- r600 prefers vec4 as suggested by Marek Olšák.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This extension is required to support EXT_display_control as it offers
a way to query whether the vblank counter is supported.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension is required to support EXT_display_control as it offers
a way to query whether the vblank counter is supported.
v2:
Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension is required to support EXT_display_control as it offers a
way to query whether the vblank counter is supported. Internally, it is
implemented using a fake MESA extension which provides a chain-in to
GetSurfaceCapabilities2KHR which contains the one added field. This has
the advantage of reducing number of callbacks needed in the back-ends.
It also means that anything chained into GetSurfaceCapabilities2EXT
through VkSurfaceCapabilities2KHR::pNext so we only need to handle
crawling the pNext chain once per back-end.
Reviewed-by: Keith Packard <keithp@keithp.com>
Change the type of util_cpu_caps::nr_cpus to int because sysconfig
returns a signed value, fixes:
u_cpu_detect.c: In function 'util_cpu_detect':
u_cpu_detect.c:317:30: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (util_cpu_caps.nr_cpus == -1)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Only decorate function as noreturn when DEBUG is not defined, because
when compiled in DEBUG mode the function actually executes an int3 and
may return, fixes:
u_debug.c: In function '_debug_assert_fail':
u_debug.c:309:1: warning: 'noreturn' function does return
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
remove "type" from "match_or_expand_immediate64", fixes:
tgsi/tgsi_ureg.c: In function 'match_or_expand_immediate64':
tgsi/tgsi_ureg.c:837:34: warning: unused parameter 'type' [-Wunused-
parameter]
int type,
^~~~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Integer propagation rules can sometimes be irritating. With
"unsigned x" "x + 1" gets propagated to a signed integer, so explicitely
assign the sum to an unsigned and use that for comaprison.
In file included from tgsi/tgsi_two_side.c:41:0:
tgsi/tgsi_two_side.c: In function 'xform_decl':
./util/u_math.h:660:29: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_two_side.c:86:24: note: in expansion of macro 'MAX2'
ts->num_inputs = MAX2(ts->num_inputs, decl->Range.Last + 1);
^~~~
./util/u_math.h:660:40: warning: signed and unsigned type in conditional
expression [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_two_side.c:86:24: note: in expansion of macro 'MAX2'
ts->num_inputs = MAX2(ts->num_inputs, decl->Range.Last + 1);
^~~~
./util/u_math.h:660:29: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_two_side.c:89:23: note: in expansion of macro 'MAX2'
ts->num_temps = MAX2(ts->num_temps, decl->Range.Last + 1);
^~~~
./util/u_math.h:660:40: warning: signed and unsigned type in conditional
expression [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_two_side.c:89:23: note: in expansion of macro 'MAX2'
ts->num_temps = MAX2(ts->num_temps, decl->Range.Last + 1);
^~~~
tgsi/tgsi_two_side.c: In function 'xform_inst':
tgsi/tgsi_two_side.c:184:45: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
if (inst->Src[i].Register.Index == ts-
>front_color_input[j]) {
^~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_ureg.c: In function 'ureg_DECL_sampler':
tgsi/tgsi_ureg.c:721:34: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (ureg->sampler[i].Index == nr)
^~
tgsi/tgsi_ureg.c: In function 'match_or_expand_immediate64':
tgsi/tgsi_ureg.c:837:34: warning: unused parameter 'type' [-Wunused-
parameter]
int type,
^~~~
tgsi/tgsi_ureg.c: In function 'emit_decls':
tgsi/tgsi_ureg.c:1821:31: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
if (ureg->properties[i] != ~0)
^~
tgsi/tgsi_ureg.c: In function 'ureg_create_with_screen':
tgsi/tgsi_ureg.c:2193:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(ureg->properties); i++)
^
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_text.c: In function 'parse_identifier':
tgsi/tgsi_text.c:218:16: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (i == len - 1)
^~
tgsi/tgsi_text.c: In function 'parse_optional_swizzle':
tgsi/tgsi_text.c:873:21: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
for (i = 0; i < components; i++) {
^
tgsi/tgsi_text.c: In function 'parse_instruction':
tgsi/tgsi_text.c:1103:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < info->num_dst + info->num_src + info->is_tex; i++) {
^
tgsi/tgsi_text.c:1118:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
else if (i < info->num_dst + info->num_src) {
^
tgsi/tgsi_text.c: In function 'parse_immediate':
tgsi/tgsi_text.c:1660:24: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (type = 0; type < ARRAY_SIZE(tgsi_immediate_type_names); ++type)
{
^
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_lowering.c: In function 'emit_twoside':
tgsi/tgsi_lowering.c:1179:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->two_side_colors; i++) {
^
tgsi/tgsi_lowering.c:1208:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->two_side_colors; i++) {
^
tgsi/tgsi_lowering.c:1216:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->two_side_colors; i++) {
^
tgsi/tgsi_lowering.c: In function 'emit_decls':
tgsi/tgsi_lowering.c:1280:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->numtmp; i++) {
^
tgsi/tgsi_lowering.c: In function 'rename_color_inputs':
tgsi/tgsi_lowering.c:1311:28: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
if (src->Index == ctx->two_side_idx[j]) {
^~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_lowering.c: In function 'emit_twoside':
tgsi/tgsi_lowering.c:1179:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->two_side_colors; i++) {
^
tgsi/tgsi_lowering.c:1208:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->two_side_colors; i++) {
^
tgsi/tgsi_lowering.c:1216:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->two_side_colors; i++) {
^
tgsi/tgsi_lowering.c: In function 'emit_decls':
tgsi/tgsi_lowering.c:1280:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->numtmp; i++) {
^
tgsi/tgsi_lowering.c: In function 'rename_color_inputs':
tgsi/tgsi_lowering.c:1311:28: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
if (src->Index == ctx->two_side_idx[j]) {
^~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_build.c: In function 'tgsi_build_full_immediate':
tgsi/tgsi_build.c:622:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for( i = 0; i < full_imm->Immediate.NrTokens - 1; i++ ) {
^
tgsi/tgsi_build.c: In function 'tgsi_build_full_property':
tgsi/tgsi_build.c:1393:18: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for( i = 0; i < full_prop->Property.NrTokens - 1; i++ ) {
^
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_exec.c: In function 'exec_tex':
tgsi/tgsi_exec.c:2254:46: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
assert(shadow_ref >= dim && shadow_ref < ARRAY_SIZE(args));
^
./util/u_debug.h:189:30: note: in definition of macro 'debug_assert'
#define debug_assert(expr) ((expr) ? (void)0 : _debug_assert_fail(#expr,
__FILE__, __LINE__, __FUNCTION__))
^~~~
tgsi/tgsi_exec.c:2254:7: note: in expansion of macro 'assert'
assert(shadow_ref >= dim && shadow_ref < ARRAY_SIZE(args));
^~~~~~
tgsi/tgsi_exec.c:2290:23: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = dim; i < ARRAY_SIZE(args); i++)
^
In file included from ./util/u_memory.h:39:0,
from tgsi/tgsi_exec.c:62:
tgsi/tgsi_exec.c: In function 'exec_lodq':
tgsi/tgsi_exec.c:2357:15: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
assert(dim <= ARRAY_SIZE(coords));
^
./util/u_debug.h:189:30: note: in definition of macro 'debug_assert'
#define debug_assert(expr) ((expr) ? (void)0 : _debug_assert_fail(#expr,
__FILE__, __LINE__, __FUNCTION__))
^~~~
tgsi/tgsi_exec.c:2357:4: note: in expansion of macro 'assert'
assert(dim <= ARRAY_SIZE(coords));
^~~~~~
tgsi/tgsi_exec.c:2363:20: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
for (i = dim; i < ARRAY_SIZE(coords); i++) {
^
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi/tgsi_aa_point.c:32:0:
tgsi/tgsi_aa_point.c: In Funktion »aa_decl«:
./util/u_math.h:660:29: Comparison between signed and unsigned in
conditional expressions [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_aa_point.c:76:21: Remark: when substituting of the macro
»MAX2«
ts->num_tmp = MAX2(ts->num_tmp, decl->Range.Last + 1);
^~~~
./util/u_math.h:660:40: Warning: signed and unsigned type in conditional
expression [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_aa_point.c:76:21: Remark: when substituting of the macro
»MAX2«
ts->num_tmp = MAX2(ts->num_tmp, decl->Range.Last + 1);
^~~~
tgsi/tgsi_aa_point.c: In Funktion »aa_inst«:
tgsi/tgsi_aa_point.c:220:31: Comparison between signed and unsigned in
conditional expressions [-Wsign-compare]
dst->Register.Index == ts->color_out) {
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi_sanity.c: In function 'iter_instruction':
tgsi_sanity.c:316:29: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (ctx->index_of_END != ~0) {
^~
tgsi_sanity.c: In function 'epilog':
tgsi_sanity.c:488:26: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (ctx->index_of_END == ~0) {
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi_parse.c: In function 'tgsi_parse_free':
tgsi_parse.c:54:31: warning: unused parameter 'ctx' [-Wunused-parameter]
struct tgsi_parse_context *ctx )
^~~
tgsi_parse.c: In function 'tgsi_parse_end_of_tokens':
tgsi_parse.c:62:25: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
return ctx->Position >=
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
tgsi_dump.c: In function 'iter_property':
tgsi_dump.c:443:18: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
for (i = 0; i < prop->Property.NrTokens - 1; ++i) {
^
tgsi_dump.c:459:13: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (i < prop->Property.NrTokens - 2)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This warning is misleading: When a struct is partially initialized without
assigning to the structure members by name, then the remaining fields
will be zeroed out, and this warning will be issued (if enabled). If, on the
other hand, the partial initialization is done by assigning to named members,
the remaining structure elements may hold random data, but the warning is not
issued. Since in Mesa the first approach to initialize structure elements is
used very often, and it is usually assumed that the remaining elements are
zeroed out, heeding this warning would be counter-productive.
v2: - add -Wno-missing-field-initializers to meson-build
- fix empty line error
(both Eric Engestrom)
v3: * check for -Wmissing-field-initializers warning and then disable it
because gcc and clang always accept -Wno-* (Dylan Baker)
* Also disable this warning for C++
v4: * meson.build add -Wno-missing-field-initializers to
c_args instead of no_override_init_args (Eric Engstrom)
v5: * configure.ac: Correct copy/paste error with CFLAGS/CXXFLAGS
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
AMDVLK also always uses CACHE_FLUSH_AND_INV_TS_EVENT. The other
workaround is to flush DB metadata after emitting the framebuffer,
but that seems slower.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A case of making things worse while trying to fix something minor ...
Fixes: ef79457004 "radv: Merge the flush bits of CMASK & DCC clear."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This extension adds the ability to borrow an X RandR output for
temporary use directly by a Vulkan application to the radv driver.
v2:
Simplify addition of VK_USE_PLATFORM_XLIB_XRANDR_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension adds the ability to borrow an X RandR output for
temporary use directly by a Vulkan application to the anv driver.
v2:
Simplify addition of VK_USE_PLATFORM_XLIB_XRANDR_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension adds the ability to borrow an X RandR output for
temporary use directly by a Vulkan application. For DRM, we use the
Linux resource leasing mechanism.
v2:
Clean up xlib_lease detection
* Use separate temporary '_xlib_lease' variable to hold the
option value to avoid changin the type of a variable.
* Use boolean expressions instead of additional if statements
to compute resulting with_xlib_lease value.
* Simplify addition of VK_USE_PLATFORM_XLIB_XRANDR_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Move mode list from wsi_display to wsi_display_connector
Fix scope for wsi_display_mode and wsi_display_connector allocs
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v3:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Explicitly forbid multiple DRM leases. Making the code support
this looks tricky and will require additional thought.
Use xcb_randr_output_t throughout the internals of the
implementation. Convert at the public API
(wsi_get_randr_output_display).
Clean up check for usable active_crtc (possible when only the
desired output is connected to the crtc).
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v4:
Move output resource fetching closer to use in
wsi_display_get_output. This simplifies the error returns in
earlier parts of the code a bit.
Return VK_ERROR_INITIALIZATION_FAILED from
wsi_acquire_xlib_display. Jason says this is the right error
message.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v5:
randr doesn't pass vscan over the wire, so we set vscan to 0
for randr-acquired modes, and test wsi modes for vscan <= 1
when comparing against randr modes.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add support for the EXT_direct_mode_display extension. This just
provides the vkReleaseDisplayEXT function.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add support for the EXT_direct_mode_display extension. This just
provides the vkReleaseDisplayEXT function.
v2: Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add support for the EXT_direct_mode_display extension. This just
provides the vkReleaseDisplayEXT function.
v2:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support for the KHR_display extension to the radv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2:
* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Adapt to new wsi_device_init API (added display_fd)
v4:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v5:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
matches an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support for the KHR_display extension to the anv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2: Make sure primary fd is usable
When KHR_display is selected, we try to open the primary node
instead of the render node in case the user wants to use
KHR_display for presentation. However, if we're actually going
to end up using RandR leases, then we don't care if the
resulting fd can't be used for display, but the kernel also
prevents us from using it for drawing when someone else has
master.
v3:
Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v4:
Adapt primary node usage to new wsi_device_init API
v5:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Remove spurious MM_PER_PIXEL define
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v6:
Open DRM master before initializing WSI layer.
The DRM master FD is passed to the WSI layer during
initialization, so we need to open the device slightly earlier
in the function.
Close DRM master in device_finish.
Use anv_gem_get_param to detect working master_fd instead of
directly using the ioctl.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v7:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
matches an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support for the KHR_display extension support to the vulkan
WSI layer. Driver support will be added separately.
v2:
* fix double ;; in wsi_common_display.c
* Move mode list from wsi_display to wsi_display_connector
* Fix scope for wsi_display_mode andwsi_display_connector
allocs
* Switch all allocations to vk_zalloc instead of vk_alloc.
* Fix DRM failure in
wsi_display_get_physical_device_display_properties
When DRM fails, or when we don't have a master fd
(presumably due to application errors), just return 0
properties from this function, which is at least a valid
response.
* Use vk_outarray for all property queries
This is a bit less error-prone than open-coding the same
stuff.
* Remove VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR from surface caps
Until we have multi-plane support, we shouldn't pretend to
have any multi-plane semantics, even if undefined.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Add separate 'display_fd' and 'render_fd' arguments to
wsi_device_init API. This allows drivers to use different FDs
for the different aspects of the device.
Use largest mode as display size when no preferred mode.
If the display doesn't provide a preferred mode, we'll assume
that the largest supported mode is the "physical size" of the
device and report that.
v4:
Make wsi_image_state enumeration values uppercase.
Follow more common mesa conventions.
Remove 'render_fd' from wsi_device_init API. The
wsi_common_display code doesn't use this fd at all, so stop
passing it in. This avoids any potential confusion over which
fd to use when creating display-relative object handles.
Remove call to wsi_create_prime_image which would never have
been reached as the necessary condition (use_prime_blit) is
never set.
whitespace cleanups in wsi_common_display.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Add depth/bpp info to available surface formats. Instead of
hard-coding depth 24 bpp 32 in the drmModeAddFB call, use the
requested format to find suitable values.
Destroy kernel buffers and FBs when swapchain is destroyed. We
were leaking both of these kernel objects across swapchain
destruction.
Note that wsi_display_wait_for_event waits for anything to
happen. wsi_display_wait_for_event is simply a yield so that
the caller can then check to see if the desired state change
has occurred.
Record swapchain failures in chain for later return. If some
asynchronous swapchain activity fails, we need to tell the
application eventually. Record the failure in the swapchain
and report it at the next acquire_next_image or queue_present
call.
Fix error returns from wsi_display_setup_connector. If a
malloc failed, then the result should be
VK_ERROR_OUT_OF_HOST_MEMORY. Otherwise, the associated ioctl
failed and we're either VT switched away, or our lease has
been revoked, in which case we should return
VK_ERROR_OUT_OF_DATE_KHR.
Make sure both sides of if/else brace use matches
Note that we assume drmModeSetCrtc is synchronous. Add a
comment explaining why we can idle any previous displayed
image as soon as the mode set returns.
Note that EACCES from drmModePageFlip means VT inactive. When
vt switched away drmModePageFlip returns EACCES. Poll once a
second waiting until we get some other return value back.
Clean up after alloc failure in
wsi_display_surface_create_swapchain. Destroy any created
images, free the swapchain.
Remove physical_device from wsi_display_init_wsi. We never
need this value, so remove it from the API and from the
internal wsi_display structure.
Use drmModeAddFB2 in wsi_display_image_init. This takes a drm
format instead of depth/bpp, which provides more control over
the format of the data.
v5:
Set the 'currentStackIndex' member of the
VkDisplayPlanePropertiesKHR record to zero, instead of
indexing across all displays. This value is the stack depth of
the plane within an individual display, and as the current
code supports only a single plane per display, should be set
to zero for all elements
Discovered-by: David Mao <David.Mao@amd.com>
v6:
Remove 'platform_display' bits from the build and use the
existing 'platform_drm' instead.
v7:
Ensure VK_ICD_WSI_PLATFORM_MAX is large enough by
setting to VK_ICD_WSI_PLATFORM_DISPLAY + 1
v8:
Simplify wsi_device_init failure from wsi_display_init_wsi
by using the same pattern as the other wsi layers.
Adopt Jason Ekstrand's white space and variable declaration
suggestions. Declare variables at first use, eliminate extra
whitespace between types and names, add list iterator helpers,
switch to lower-case list_ macros.
Respond to Jason's April 8 review:
* Create a function to convert relative to absolute timeouts
to catch overflow issues in one place
* use VK_NULL_HANDLE to clear prop->currentDisplay
* Get rid of available_present_modes array.
* return OUT_OF_DATE_KHR when display_queue_next called after
display has been released.
* Make errors from mode setting fatal in display_queue_next
* Remove duplicate pthread_mutex_init call
* Add wsi_init_pthread_cond_monotonic helper function to
isolate pthread error handling from wsi_display_init_wsi
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v9:
Fix vscan handling by using MAX2(vscan, 1) everywhere. Vscan
can be zero anywhere, which is treated the same as 1.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v10:
Respond to Vulkan CTS failures.
1. Initialize planeReorderPossible in display_properties code
2. Only report connected displays in
get_display_plane_supported_displays
3. Return VK_ERROR_OUT_OF_HOST_MEMORY when pthread cond
initialization fails.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
4. Add vkCreateDisplayModeKHR. This doesn't actually create
new modes, it only looks to see if the requested parameters
matches an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Probably won't be much different in practice, but still wrong.
Fixes Coverity issue 1435002.
Not CC'ing to stable since this is only hit if you enable MSAA
DCC via RADV_DEBUG.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The scratch registers move again in a6xx.. so for post-a4xx let's just
move this into the backend, and move the one place it used to be needed
in core into fd5_emit_ib(). For a6xx we will do similar, calling
emit_marker6() from fd6_emit_ib().
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is kind of a hack, but really the only problem is the
debug_assert() in OUT_RELOC(). But the debug_assert() is
useful to catch real issues. So just add some #ifdef DEBUG
code to filter things out before we hit the assert.
Signed-off-by: Rob Clark <robdclark@gmail.com>
These never got updated in fd_context_all_dirty() so actually trying to
rely on them (in the case of fd5_emit_images()) ends up in some cases
where state is not emitted but should be. Best to just rip this out.
Signed-off-by: Rob Clark <robdclark@gmail.com>
I think this ends up just setting uniform/const memory. But we upload
x/y/z stride differently. At best this is unneeded, at worst it could
possibly clobber other uniform/const memory.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Unlike textures, this doesn't get lowered for us. (Would be nice
if they were.. at least until we are ready to deal w/ indirect
indexing..)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Run this pass late (after opt loop) to move load_const instructions back
into the basic blocks which use the result, in cases where a load_const
is only consumed in a single block.
This helps reduce register usage in cases where the backend driver
cannot lower the load_const to a uniform.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can have arrays of images or samplers. But I forgot to handle that
case long ago. Suprised no one complained yet.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Save the next person from digging through the code to figure out what
the indirect_mask parameter actually does.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
AMDVLK uses 64 (distributed) and 16 (non-distributed).
radeonsi will use 63 and 16.
* This might improve tessellation performance on Hawaii, Bonaire, Tahiti,
Pitcairn. (they will use 16)
* I'm not sure if this matters for 1 SE configs.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This is the case for the simulator environment, and broke many blitter
tests by trying to texture from linear while the HW can only actually do
UIF/UBLINEAR/LT. Just make a temporary and copy into it with the CPU,
then blit from that.
This is the kind of path that should use the TFU, but I haven't exposed
that hardware yet.
Fixes dEQP-GLES3.functional.fbo.blit.default_framebuffer.*
Some fprintfs were probably left unintentionally a few years ago and are
a bit of a nuisance.
Fixes: 2d3301e4d5 ("virgl: fix reference counting of prime handles")
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The prog->Shaders[i]->IsES check was accidentally removed causing
ES linking rules to be applied to desktop GLSL.
Fixes: 725b1a406d ("mesa/util: add allow_glsl_relaxed_es driconfig override")
This relaxes a number of ES shader restrictions allowing shaders
to follow more desktop GLSL like rules.
This initial implementation relaxes the following:
- allows linking ES shaders with desktop shaders
- allows mismatching precision qualifiers
- always enables standard derivative builtins
These relaxations allow Google Earth VR shaders to compile.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Google Earth VR shaders uses builtins in constant expressions with
GLSL 1.10. That feature wasn't allowed until GLSL 1.20.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Glibc has the same code to get program_invocation_short_name. However
for some reason the short name gets mangled for some wine apps.
For example with Google Earth VR I get:
program_invocation_name:
"/home/tarceri/.local/share/Steam/steamapps/common/EarthVR/Earth.exe"
program_invocation_short_name:
"e"
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
This should fix TF across a glFlush() or TF pause/restart. Fixes
dEQP-GLES3.functional.transform_feedback.array.interleaved.lines.highp_float
and many, many others.
If we are on gen8+ and have context isolation support, just make that
constant buffer address be absolute, so we can use it for push UBOs too.
v2: Do not duplicate constant_buffer_0_is_relative flag (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Drops the number of time we set the scissor by 4x for F1 2017,
which results in a consistent performance improvement of about 4%.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Shouldn't make any functional difference, just that `liblibanv_gen90.a`
will now be called `libanv_gen90.a`.
Fixes: 3218056e0e "meson: Build i965 and dri stack"
Fixes: d1992255bb "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
As the previous use of shuffle_32bit_load_result_to_64bit_data
had a source/destination overlap for 64-bit. Now a temporary destination
is used for 64-bit cases to use shuffle_from_32bit_read that doesn't
handle src/dst overlaps.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, the shuffle function had a source/destination overlap that
needs to be avoided to use shuffle_from_32bit_read. As we can use for
the shuffle destination the destination of removed MOVs.
This change also avoids the internal MOVs done by the previous shuffle
to deal with possible overlaps.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
shuffle_from_32bit_read manages 32-bit reads to 32-bit destination
in the same way that the previous loop so now we just call the new
function for all bitsizes, simplifying also the 64-bit load_input.
v2: Add comment about future 16-bit support (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This implementation avoids two unneeded MOVs for each 64-bit
component. One was done in the old shuffle, to avoid cases of
src/dst overlap but this is not the case. And the removed MOV
was already being being done in the shuffle.
Copy propagation wasn't able to remove them because shuffle
destination values are defined with partial writes because they
have stride == 2.
v2: Reword commit log summary (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
do_untyped_vector_read is used at load_ssbo and load_shared.
The previous MOVs are removed because shuffle_from_32bit_read
can handle storing the shuffle results in the expected destination
just using the proper offset.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Using shuffle_from_32bit_read instead of 16-bit shuffle functions
avoids the need of retype. At the same time new function are
ready for 8-bit type SSBO reads.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
shuffle_from_32bit_read can manage the shuffle/unshuffle needed
for different 8/16/32/64 bit-sizes at VARYING PULL CONSTANT LOAD.
To get the specific component the first_component parameter is used.
In the case of the previous 16-bit shuffle, the shuffle operation was
generating not needed MOVs where its results where never used. This
behaviour passed unnoticed on SIMD16 because dead_code_eliminate
pass removed the generated instructions but for SIMD8 they cound't be
removed because of being partial writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These new shuffle functions deal with the shuffle/unshuffle operations
needed for read/write operations using 32-bit components when the
read/written components have a different bit-size (8, 16, 64-bits).
Shuffle from 32-bit to 32-bit becomes a simple MOV.
shuffle_src_to_dst takes care of doing a shuffle when source type is
smaller than destination type and an unshuffle when source type is
bigger than destination. So this new read/write functions just need
to call shuffle_src_to_dst assuming that writes use a 32-bit
destination and reads use a 32-bit source.
As shuffle_for_32bit_write/from_32bit_read components take components
in unit of source/destination types and shuffle_src_to_dst takes units
of the smallest type component, we adjust components and first_component
parameters.
To enable this new functions it is needed than there is no
source/destination overlap in the case of shuffle_from_32bit_read.
That never happens on shuffle_for_32bit_write as it allocates a new
destination register as it was at shuffle_64bit_data_for_32bit_write.
v2: Reword commit log and add comments to explain why first_component
and components parameters are adjusted. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This new function takes care of shuffle/unshuffle components of a
particular bit-size in components with a different bit-size.
If source type size is smaller than destination type size the operation
needed is a component shuffle. The opposite case would be an unshuffle.
Component units are measured in terms of the smaller type between
source and destination. As we are un/shuffling the smaller components
from/into a bigger one.
The operation allows to skip first_component number of components from
the source.
Shuffle MOVs are retyped using integer types avoiding problems with
denorms and float types if source and destination bitsize is different.
This allows to simplify uses of shuffle functions that are dealing with
these retypes individually.
Now there is a new restriction so source and destination can not overlap
anymore when calling this shuffle function. Following patches that migrate
to use this new function will take care individually of avoiding source
and destination overlaps.
v2: (Jason Ekstrand)
- Rewrite overlap asserts.
- Manage type_sz(src.type) == type_sz(dst.type) case using MOVs
from source to dest. This works for 64-bit to 64-bits
operation that on Gen7 as it doesn't support Q registers.
- Explain that components units are based in the smallest type.
v3: - Fix unshuffle overlap assert (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
nir_ssa_def::parent_instr and nir_src::parent_instr have the same name,
but they mean really different things. I choose to save the next person
the hour+ that I just spent figuring that out. Even now that I know, I
doubt I'd notice in code review that someone typed foo->parent_instr
when they actually meant foo->ssa->parent_instr.
v2: Minor wording tweak in nir_ssa_def::parent_instr. Suggested by
Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All of the affected shaders are geometry shaders... the same ones from
the similar fs changes.
The "No changes on any other platforms" comment below is not quite
right. Without the previous change to register coalescing, this
optimization caused quite a few regressions in tests that either used
gl_ClipVertex or used different interpolation modes. I observed that
with both patches applied,
glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test
was one instruction shorter. I suspect other shaders would be similarly
affected. Since this is all based on NOS, shader-db does not reflect
it.
Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.30% -1.69%
Instructions are helped.
total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -0.28% -0.28%
Cycles are helped.
Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.34% -1.74%
Instructions are helped.
No changes on any other platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This prevents regressions in a bunch of clipping and interpolation tests
caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
fs_visitor::set_gs_stream_control_data_bits generates some code like
"control_data_bits | stream_id << ((2 * (vertex_count - 1)) % 32)" as
part of EmitVertex. The first time this (dynamically) occurs in the
shader, control_data_bits is zero. Many times we can determine this
statically and various optimizations will collaborate to make one of the
OR operands literal zero.
Converting the OR to a MOV usually allows it to be copy-propagated away.
However, this does not happen in at least some shaders (in the assembly
output of shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test,
search for shl).
All of the affected shaders are geometry shaders.
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14375452 -> 14375413 (<.01%)
instructions in affected programs: 6422 -> 6383 (-0.61%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.26% -1.57%
Instructions are helped.
total cycles in shared programs: 531981179 -> 531980555 (<.01%)
cycles in affected programs: 27493 -> 26869 (-2.27%)
helped: 39
HURT: 0
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92%
95% mean confidence interval for cycles value: -16.00 -16.00
95% mean confidence interval for cycles %-change: -6.98% -4.90%
Cycles are helped.
No changes on earlier platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The min/maxes ended up producing a negative clip width/height for
dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line. Just make
sure they stay at 0 (or v3d 3.x's workaround) if that happens.
Fixes simulator assertion failures in
dEQP-GLES3.functional.shaders.texture_functions.texture.samplercubeshadow_bias_fragment
and similar complicated cases.
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"
Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
getopt_long flag parameter is an int pointer, so if we use bool to store
those values, when getopt_long writes to one of them, it might end up
overwriting the next one.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't enable CMASK for linear surfaces and addrlib only
enables DCC for tiling surfaces.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't think that matter much to emit both values and that
makes the code a bit simpler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to update the fast depth/stencil clear values
if the fast cleared depth/stencil image isn't currently bound.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary to update the fast color clear values if the
fast cleared color image isn't currently bound.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
BITSET_FFS(x) macro makes use of ARRAY_SIZE(x) macro which is
defined in util/macro.h. Include it directy to make usage more
straightforward.
Fixes: 692bd4a1ab ("util: replace Elements() with ARRAY_SIZE()")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
I noticed that the generated pkg-config files will include
glx and x11 dependencies even when x11 isn't a selected platform.
This fixes the private libs and was tested by building kmscube
V2:
- check if gallium-xlib is being used for glx
Fixes: 108d257a16 "meson: build libEGL"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We need to have the RCL start with EZ enabled, since those undecided draws
had EZ enabled. But we do need to update from UNDECIDED to LT or GT as
necessary still.
Fixes many simulator assertion fails in deqp
fragment_ops/interaction/basic_shader/*
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.
For example, if a UBO contained the following, tightly packed:
vec4 a; // [0, 16)
float b; // [16, 20)
vec4 c; // [20, 36)
then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.
Similarly, dvec4s would suffer from the same problem.
v2: Rewrite the accounting, my calculations were wrong.
v3: Write a comment about partial values (requested by Jason).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v3]
Since SSBOs can be written by a different GPU thread, copy propagating a
read can cause the value to magically change. SSBO reads are also very
expensive, so doing it twice will be slower.
The same shader was helped by this patch and the previous.
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399119 -> 14399113 (<.01%)
instructions in affected programs: 683 -> 677 (-0.88%)
helped: 1
HURT: 0
total cycles in shared programs: 532973113 -> 532971865 (<.01%)
cycles in affected programs: 524666 -> 523418 (-0.24%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
Since SSBOs can be written by other GPU threads, copy propagating a read
can cause the value to magically change. SSBO reads are also very
expensive, so doing it twice will be slower.
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399120 -> 14399119 (<.01%)
instructions in affected programs: 684 -> 683 (-0.15%)
helped: 1
HURT: 0
total cycles in shared programs: 532978931 -> 532973113 (<.01%)
cycles in affected programs: 530484 -> 524666 (-1.10%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
This seems to have been missed in the move from autotools
This fixes the following build issue:
../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory
#include <X11/Xlib-xcb.h>
^~~~~~~~~~~~~~~~
Fixes: b1b65397d0
("meson: Build gallium auxiliary")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
On GFX8+, there is a bug that affects TC-compatible depth surfaces
when the ZRange is not reset after LateZ kills pixels.
The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match
the last fast clear value. Because the value is set to 1 by default,
we only need to update it when clearing Z to 0.0.
We also need to set the depth clear regs and to update
ZRANGE_PRECISION when initializing a TC-compat depth image to 0.
Original patch from James Legg.
This fixes random CTS fails with
dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the application asks for the maximum number of fragment input
components (128), use all of them plus some builtins that are
passed in the VUE, then we exceed the maximum number of used VUE
slots (32) and we break one assert that checks this limit.
Also, with separate shader objects, we add CLIP_DIST0, CLIP_DIST1
builtins in brw_compute_vue_map() because we don't know if
gl_ClipDistance is going to be read/write by an adjacent stage.
Fixes VK-GL-CTS CL#2569.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
They have a different frequency of updates and don't change when scissors
change.
I think this even fixes something in si_update_vs_viewport_state.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This should add all the pieces to enable tess shaders on virgl.
v2: fixup transform to handle tess and strip out precise.
set default for max patch varyings to work around issue when
tess gets enabled from v1 caps but v2 caps aren't in place. (Elie)
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
GLSL 4.60 offically added this but games and older CTS suites actually
had shaders that did this, we may as well enable it everywhere.
Adding stable because it appears apps in the wild do this.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Commit 54ba73ef10 (configure.ac/meson.build: Fix -latomic test) fixed
some checks for -latomic, and then commit 54bbe600ec (configure.ac:
rework -latomic check) further extended the fixes in configure.ac but
not in Meson. This commit extends those fixes to the Meson tests.
Fixes: 54bbe600ec (configure.ac: rework -latomic check)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
meson 0.43 gained support for optional modules, which clover wold like
to use. Since we require 0.44.1 now we can rely on them being available
for clover.
compile tested only.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: - Use -mpower8-vector in compiler test for altivec
- rename altivec option to power8
- reword power8 option description to be more clear, originally I
had made it a boolean, but replaced it with an auto option.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.
For example, if a UBO contained the following, tightly packed:
vec4 a; // [0, 16)
float b; // [16, 20)
vec4 c; // [20, 36)
then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.
Similarly, dvec4s would suffer from the same problem.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes to avoid building error after change in image->planes[] structure,
{bo,bo_offset} has to be replaced by address.{bo,offset}
and update is needed also in the assert() for debug builds.
external/mesa/src/intel/vulkan/anv_android.c:188:21:
error: no member named 'bo' in 'struct anv_image::(anonymous at external/mesa/src/intel/vulkan/anv_private.h:2647:4)'
image->planes[0].bo = bo;
~~~~~~~~~~~~~~~~ ^
1 error generated.
Fixes: bf34ef16ac ("anv: Use an address for each anv_image plane")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Changes to avoid building error:
external/mesa/src/intel/vulkan/anv_android.c:131:72:
error: too few arguments to function call, expected 5, have 4
result = anv_bo_cache_import(device, &device->bo_cache, dma_buf, &bo);
~~~~~~~~~~~~~~~~~~~ ^
1 error generated.
(v2) Set the correct bo_flags based on support of 48bit addresses and soft-pin
Fixes: b0d50247a7 ("anv/allocator: Set the BO flags in bo_cache_alloc/import")
Fixes: e7d0378bd9 ("anv: Soft-pin client-allocated memory")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were enabling undefined memory checking for genxml values based on
Valgrind being installed at build time, even for release builds. This
generates piles and piles of assembly whenever you touch genxml.
With gcc 7.3.1 and -O3 and -march=native on a Kabylake with Valgrind
installed at build time:
text data bss dec hex filename
5978385 262884 13488 6254757 5f70a5 libvulkan_intel.so
3799377 262884 13488 4075749 3e30e5 libvulkan_intel.so
That's a 36% reduction in text size.
Fixes: 047ed02723 (vk/emit: Use valgrind to validate every packed field)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we're using GitLab, let's take advantage of the "landing page"
README feature with some minimal information, mostly to point people to
the right resources.
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
v2: intel_miptree_release() already takes care of the planes, no need
to hand-code the loop (Lionel)
Coverity ID: 1436909
Fixes: 3352f2d746 "i965: Create multiple miptrees for planar YUV images"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
At least for PIPE_BUFFER, we could get the resource used as (for
example) R32F imageBuffer. So using cpp=1 from the rsc is wrong.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In some cases we get plain tex opcodes (but w/ a lod argument).. in this
case always use the saml instruction.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If using a fanin (collect) to collect of consecutive registers together,
we can CP mov's into the fanin, but not (abs) or (neg). No places that
allow those modifiers are consuming a fanin anyways. But this caused an
absneg to be lost between a ldgb and stgb for shaders like:
outputs[n] = abs(input[n])
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we have a fanout (split) meta instruction to split the result of a
vector instruction, propagate the HALF flag back to the original
instruction. Otherwise result ends up in a full precision register
while instruction(s) that use the result look in a half-precision
register.
Signed-off-by: Rob Clark <robdclark@gmail.com>
image reads are handled via tex state, whereas image writes and atomics
are handled via SSBO state block. Previously we were only considering
image write, and not image atomics which also uses the SSBO state block.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If FindProcIndex in egldispatchstubs.c is called with a name that's less than
the first entry in the array, it would end up trying to store an index of -1 in
an unsigned integer, wrap around to 2^32, and then crash when it tries to look
that up.
Change FindProcIndex so that it uses bsearch(3) instead of implementing its own
binary search, like the GLX equivalent FindGLXFunction does.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Workaround for bug in llvm that causes the GPU to hang in presence
of nested loops because there is an exec mask issue. The proper
solution is to fix LLVM but this might require a bunch of work.
This fixes a bunch of GPU hangs that happen with DXVK.
Vega10:
Totals from affected shaders:
SGPRS: 110456 -> 110456 (0.00 %)
VGPRS: 122800 -> 122800 (0.00 %)
Spilled SGPRs: 7478 -> 7478 (0.00 %)
Spilled VGPRs: 36 -> 36 (0.00 %)
Code Size: 9901104 -> 9922928 (0.22 %) bytes
Max Waves: 7143 -> 7143 (0.00 %)
Code size slightly increases because it inserts more branch
instructions but that's expected. I don't see any real performance
changes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105613
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
ZRANGE_PRECISION(1) seems to be the default optimal value, but
it was only set for VI and older chips.
This fixes a rendering issue with Banished through DXVK, and
might fix more than that.
There is still the ZRANGE_PRECISION bug that we need to handle
but that can be fixed later.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2:
An attempt to support SpvExecutionModeStencilRefReplacingEXT's behavior
also follows, with the interpretation to said mode being we prevent
writes to the built-in FragStencilRefEXT variable when the execution
mode isn't set.
v3:
A more cautious reading of 1db44252d0 led
me to a missing change that would stop (what I later discovered were)
GPU hangs on the CTS test written to exercise this.
v4:
Turn FragStencilRefEXT decoration usage without StencilRefReplacingEXT
mode into a warning, instead of trying to make the variable read-only.
If we are to follow the originating extension on GL, the built-in
variable in question should never be readable anyway.
v5/v6: rebases.
v7:
Fix check for gen9 lost in rebase. (Ilia)
Reduce the scope of the bool used to track whether
SpvExecutionModeStencilRefReplacingEXT was used. Was in shader_info,
moved to vtn_builder. (Jason)
v8:
Assert for fragment shader handling StencilRefReplacingEXT execution
mode. (Caio)
Remove warning logic, since an entry point might not have
StencilRefReplacingEXT execution mode, but the global output variable
might still exist for another entry point in the module. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The queue_manager thread can access the images from x11_present_to_x11,
hence this reorder prevents dereferencing of dangling pointers.
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Fixes: e73d136a02 ("vulkan/wsi/x11: Implement FIFO mode.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Remembering latest states of registers to eliminate redunant SET_CONTEXT_REG packets
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Use of void * in pointer arithmetic is illegal, use char * instead.
Fixes: cf54bd5e83 ("drisw: use shared memory when possible")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Fixes truncation warning in gcc 8.1
Fixes: 8539c9bf31 ("gallium/radeon: add the kernel version into the renderer string")
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes the gcc warning:
snprintf’ output between 26 and 33 bytes into a destination of size 32
Fixes: d5f7ebda3e ("ac: add LLVM build functions for subgroup instrinsics")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The LLVM 6 code reduced it to a non-array call. We need to do that
with the new code too.
This fixes dEQP-VK.glsl.texture_functions.query.texturequerylod.*array* for radv.
Fixes: a9a7993441 "amd/common: use the dimension-aware image intrinsics on LLVM 7+"
Reviewed-by: Dave Airlie <airlied@redhat.com>
This hopefully adds virgl to the correct places and current statuses
of various extensions.
virgl of course relies on two external things
a) host driver that can support the features
b) up to date host virglrenderer library that can support the features.
This list will be maintained as latest (a) + (b) + mesa.
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gentoo's ebuild system always adds -m32 to the compiler for doing x86_64
-> x86 cross builds, while meson expects it not to do that. This
results in an x86 -> x86 cross build, and assembly gets disabled.
Fixes: 2d62fc0646
("meson: disable x86 asm in fewer cases.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Not all of the MESA_FORMAT and ISL_FORMAT helpers we use can properly
handle RGBX formats. Also, we don't want to make decisions based on
those in the first place because we can't render to RGBA and we use the
non-sRGB version to determine whether or not to allow CCS_E.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reworks it to work like query_dma_buf_modifiers and, in particular,
makes it more flexible so that we can disallow a non-static set of
formats.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We follow the same convention as isl_format_get_layout in having two
assertions to ensure that only valid formats are passed in. We also
check against the array size of the table because some valid formats
such as CCS formats will may be past the end of the table. This fixes
some potential out-of-bounds array access even in valid cases.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We add two assertions instead of one because the first assertion that
format != ISL_FORMAT_UNSUPPORTED is more descriptive and checks for a
real but unsupported enumerant while the second ensures that they don't
pass in garbage values. We also update some other helpers to use
isl_format_get_layout instead of using the table directly so that they
get bounds checking too.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes the reasoning for why a cross compile is not using asm
clearer (hopefully).
v2: - fix typos
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
There were some places that were calling anv_semaphore_impl_cleanup and
neither deleting the semaphore nor setting the type back to NONE. Just
set it to NONE in impl_cleanup to avoid these issues.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106643
Fixes: 031f57eba "anv: Add a basic implementation of VK_KHX_external..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This isn't strictly necessary, but anyone running Cannonlake will
already have Kernel 4.5 or later, so there's no reason to support
the relocation model on Gen10+.
This will let us avoid dealing with them for new features.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch enables soft-pinning of all buffers, allowing us to skip
relocation processing entirely. All systems with full PPGTT and > 4GB
of VMA should gain these benefits. This should be most Gen8+.
Unfortunately, this excludes a few systems:
- Cherryview (only has 32-bit addressing, despite 48-bit pointers)
- Broadwell with a 32-bit kernel
- Anybody running pre-4.5 kernel.
We may enable it for Cherryview in the future, but it would require
some tweaks to the memory zone.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
commit 92f01fc5f9 made i965 start emitting
VF cache invalidates when the high bits of vertex buffers change. But
we were not tracking vertex buffers emitted by BLORP. This was papered
over by a mistake where I emitted VF cache invalidates all the time,
which Chris fixed in commit 3ac5fbadfd.
This patch adds a new hook which allows the driver to track addresses
and request a VF cache invalidate as appropriate.
v2: Make the driver do the PIPE_CONTROL so it can apply workarounds
(caught by Jason Ekstrand). Rebase on anv bug fix.
v3: Don't screw up the boolean (caught by Jason Ekstrand).
Fixes: 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This pass detects potential loop terminators and moves intructions
from the non breaking branch after the if-statement.
This enables both the new opt_if_simplification() pass and loop
unrolling to potentially progress further.
Unexpectedly this change speed up shader-db run times by ~3%
Ivy Bridge shader-db results (all changes in dolphin/ubershaders):
total instructions in shared programs: 9995662 -> 9995338 (-0.00%)
instructions in affected programs: 87845 -> 87521 (-0.37%)
helped: 27
HURT: 0
total cycles in shared programs: 230931495 -> 230925015 (-0.00%)
cycles in affected programs: 56391385 -> 56384905 (-0.01%)
helped: 27
HURT: 0
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On gen8+, we have to VF cache flush whenever a vertex binding aliases a
previous binding at the same index modulo 4GiB. We deal with this in
Vulkan by ensuring that vertex buffers and the dynamic state (from which
BLORP pulls its vertex buffers) are in the same 4GiB region of the
address space. That doesn't work if we're reading clear colors with the
VF unit. In order to work around this we switch to using MI commands to
copy the clear value into the vertex buffer we allocate for the normal
constant data.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
i965 advertises the 16-bit R and RG formats through
eglQueryDmaBufFormatsEXT but falls over when a client tries to use or
asks more information about such a format because
driImageFormatToGLFormat returns MESA_FORMAT_NONE.
Found by Eero Tamminen.
v2: Add G16R16 formats (Lionel)
v3: Fix G16R16 mapping to mesa format (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106642
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
mesa/st decides whether to update samplers after a program change based on
whether num_textures is nonzero. By not counting samplers in a uniform
struct, we would segfault in
KHR-GLES3.shaders.struct.uniform.sampler_vertex if it was run in the same
context after a non-vertex-shader-uniform testcase (as is the case during
a full conformance run).
v2: Implement using two separate pure functions instead of updating
pointers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't seem to have done anything to my test results. However,
given that we've still got a class of GPU hangs, following the workarounds
that the closed driver does so that we get the same command sequences
seems like a good idea.
These together get the GLSL 3.00 unorm/snorm pack functions and
MESA_shader_integer operations working.
v2: Fix commit message typo.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is basically the same as the GLSL lowering path.
v2: Fix typo in the link
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is based on the glsl/lower_instructions.cpp implementation, but
should be much more readable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ufind_msb is easily expressed in terms of clz, and we can reduce ifind_msb
to that.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V3D doesn't have opcodes for ibfe/ubfe, so we need to lower similarly to
glsl/lower_instructions.cpp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If you don't have HW to do bfi, then lowering bitfieldInsert to bfi makes
things harder than keeping the "bits" argument around.
This still uses bfm, but I've added the obvious lowering of bfm if you
need it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Discovered by Roland Scheidegger.
The resource_create code uses GPU memory for PIPE_BIND_CUSTOM, but
malloc'd memory otherwise. Vertex and index buffers should use malloc'd
memory.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
You'd need src/broadcom/cle/ in the -I previously, for srcdir != builddir.
nir was fine at that, but automake didn't have it.
Bugzilla: https://github.com/anholt/mesa/issues/104
We were trying to read twice as many as the X server sent us, which
upset XCB:
[xcb] Too much data requested from _XRead
[xcb] This is most likely caused by a broken X extension library
[xcb] Aborting, sorry about that.
glx-free-context: ../../src/xcb_io.c:732: _XRead: Assertion `!xcb_xlib_too_much_data_requested' failed.
Fixing this takes 3 GLX piglit tests from crash to pass.
Fixes: 0852162950 "glx: Be more tolerant in glXImportContext (v2)"
Reviewed-by: Adam Jackson <ajax@redhat.com>
I'd like to eventually drop support for the confusing "an array of
a single empty string is meant to be interpreted as an empty array", so
let's start by not using it anymore.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
The recent patch
mesa: Remove FLUSH_VERTICES from VAO state changes.
Pending draw calls on immediate mode or display list calls do
not depend on changes of the VAO state. So, remove calls to
FLUSH_VERTICES and flag _NEW_ARRAY as appropriate.
uncovered a problem that non immediate mode draw calls do only
flush outstanding immediate mode draws if FLUSH_UPDATE_CURRENT
is set in ctx->Driver.NeedFlush.
In that case, due to the sequence of _mesa_set_draw_vao commands
we could end up with the VAO from the FLUSH_VERTICES call set
into gl_context::Array._DrawVAO when the array draw is executed.
So the change pulls FLUSH_CURRENT out of _mesa_validate_* calls
into the array draw calls being validated.
The change introduces a new macro FLUSH_FOR_DRAW beside FLUSH_VERTICES
and FLUSH_CURRENT that flushes on changed current attributes as well
as on outstanding immediate mode draw calls. Use FLUSH_FOR_DRAW
in the non immediate mode draw code paths.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106594
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is the SSBO analogue to fe0647. User supplied data must
be a multiple of GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT.
This fixes 44 GLES31 tests on airlied@'s GLES31 sketch branches with
Nvidia hardware, but this patch standalone can applied to master. The
alignment restriction on Nvidia is 32, hence the default value.
Example tests:
dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.0
dEQP-GLES31.functional.ssbo.layout.multi_basic_types.single_buffer.std430
v2: Move to a better place in case statement
v3: Rebase
Reviewed-by: Dave Airlie <airlied@redhat.com>
If EXEC_OBJECT_PINNED is set, we don't want to emit any relocations.
We simply want to add the BO to the validation list, and possibly mark
it as writeable. The new brw_use_pinned_bo() interface does just that.
To avoid having to make every caller consider both the relocation and
softpin cases, we make emit_reloc() call brw_use_pinned_bo() when given
a softpinned buffer.
We also can't grow buffers that are softpinned - the mechanism places a
larger BO at the same offset as the original, which requires moving BOs
around in the VMA. With softpin, we only allocate enough VMA for the
original size of the BO.
v2: Assert that BOs aren't pinned if the kernel says we should move them
(feedback from Chris Wilson)
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
This introduces a new fast virtual memory allocator integrated with our
BO cache bucketing. For larger objects, it falls back to the simple
free-list allocator (util_vma).
This puts the allocators in place but doesn't enable softpin yet.
v2:
(feedback from Chris Wilson)
- Check (bo->kflags & EXEC_OBJECT_PINNED) instead of a global flag
- Avoid vma_free(0ull) on the err_free path.
- Only enable if the kernel says we have full PPGTT support
- Make bucketing allocators more resistant to failing to grow arrays
(feedback from Scott Phillips)
- Don't use node after popping it from the list.
- Avoid undefined behavior in canonicalization by reusing new helper
- Comment updates
(feedback from myself)
- Avoid __vma_alloc vs. vma_alloc by making a zero_high_bits helper
to return a non-canonical address with the high bits zeroed.
- Don't shadow loop variable 'i' when destroying things (ugly; worked)
v3:
- Replace zero_high_bits with new common gen_48b_address helper.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If window system supports Y-tiling but not CCS_E, we currently create an
internal CCS for any window system buffers and then resolve right before
handing it off to X or Wayland. In the case of the single-sampled
shadow of a multi-sampled window system buffer, this is pointless
because the only thing we do with it is use it as a MSAA resolve target
so we do MSAA resolve -> CCS resolve -> hand to the window system.
Instead, just disable CCS for the shadow and then the MSAA resolve will
write uncompressed directly into it. If the window system supports
CCS_E, we will still use CCS_E, we just won't do internal CCS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of having it be a general "is this a winsys image" boolean, make
it more specific to the actual purpose.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of the state stack that's based on copying a dummy instruction
around, we start using a logical stack of brw_insn_states. This uses a
bit less memory and is way less conceptually bogus.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prior to gen8, the flag [sub]register number is in a different spot on
3src instructions than on other instructions. Starting with Broadwell,
they made it consistent. This commit fixes bugs that occur when a
conditional modifier gets propagated into a 3src instruction such as a
MAD.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of doing a memcpy, this moves us to start with a blank
instruction (memset to zero) and copy the fields over one at a time.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The emitted buffer_subdata/texture_subdata call didn't match the
respective signatures.
v2: Actually emit buffer_subdata call.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On pre-4.13 kernels, which don't support I915_EXEC_BATCH_FIRST, we move
the validation list entry to the end...but incorrectly left the exec_bo
array alone, causing a mismatch where exec_bos[0] no longer corresponded
with validation_list[0] (and similarly for the last entry).
One example of resulting breakage is that we'd update bo->gtt_offset
based on the wrong buffer. This wreaked total havoc when trying to use
softpin, and likely caused unnecessary relocations in the normal case.
Fixes: 29ba502a4e (i965: Use I915_EXEC_BATCH_FIRST when available.)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When the i-th target format is set, all previous target formats
must be non-zero to avoid hangs. In other words, without this
if a fragment shader exports mrt0, mrt2 and mrt3, the GPU hangs
because the target format of mrt1 is zero.
This fixes DXVK GPU hangs with "Seven: The Days Long Gone",
"GTA V" and probably more games.
Cc: "18.0" 18.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise on pre-GFX9, if the constant layout allows both TESS_EVAL and
GEOMETRY shaders, but the PIPELINE has only GEOMETRY, it would return the
GEOMETRY shader for the TESS_EVAL shader.
This would cause the flush_constants code to emit the GEOMETRY constants
to the TESS_EVAL registers and then conclude that it did not need to set
the GEOMETRY shader registers.
Fixes: dfff9fb6f8 "radv: Handle GFX9 merged shaders in radv_flush_constants()"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This pass turns:
if (cond) {
} else {
do_work();
}
into:
if (!cond) {
do_work();
} else {
}
Here's the vkpipeline-db stats (from affected shaders) on Polaris10:
Totals from affected shaders:
SGPRS: 17272 -> 17296 (0.14 %)
VGPRS: 18712 -> 18740 (0.15 %)
Spilled SGPRs: 1179 -> 1142 (-3.14 %)
Code Size: 1503364 -> 1515176 (0.79 %) bytes
Max Waves: 916 -> 911 (-0.55 %)
This pass only affects Serious Sam 2017 (Vulkan) on my side. The
stats are not really good for now. Some shaders look quite dumb
but this will be improved with further NIR passes, like ifs
combination.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rename and change the prototype for consistency regarding
nir_tex_instr_is_query(). This function will be used in the
following patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This just separates the reloc list vs. BO set cases and lets us avoid an
allocation if relocs->deps->entries == 0.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Previously, we did this weird thing where we left space and an empty
relocation for use in a hypothetical MI_BATCH_BUFFER_START that would be
added to the secondary later. Then, when it came time to chain it into
the primary, we would back that out and emit an MI_BATCH_BUFFER_START.
This worked well but it was always a bit hacky, fragile and ugly. This
commit instead adds a helper for rewriting the MI_BATCH_BUFFER_START at
the end of an anv_batch_bo and we use that helper for both batch bo list
cloning and handling returns from secondaries. The new helper doesn't
actually modify the batch in any way but instead just adjusts the
relocation as needed.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The only reason we were calling it in the middle was that one of the
cases for figuring out the secondary command buffer execution type
wanted batch_bo->length which gets set by batch_bo_finish. It's easy
enough to recalculate and now batch_bo_finish is called in a sensible
location.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Now that we've done all that refactoring, addresses are now being
directly written into surface states by ISL and BLORP whenever a BO is
pinned so there's really nothing to do besides enable it.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
It's safer to set them there because we have the opportunity to properly
handle combining flags if a BO is imported more than once.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
References to pinned BOs won't need to be relocated at a later
point, so just write the final value of the reference into the bo
directly.
Add a `set` to the relocation lists for tracking dependencies that
were previously tracked by relocations. When a batch is executed, we
add the referenced pinned BOs to the exec list.
v2: - visit bos from the dependency set in a deterministic order (Jason)
v3: - compar => compare, drat (Jason)
- Reworded commit message, provided by (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Soft pinning lets us satisfy the binding table address
requirements without using both sides of a growing state_pool.
If you do use both sides of a state pool, then you need to read
the state pool's center_bo_offset (with the device mutex held) to
know the final offset of relocations that target the state pool
bo.
By having a separate pool for binding tables that only grows in
the forward direction, the center_bo_offset is always 0 and
relocations don't need an update pass to adjust relocations with
the mutex held.
v2: - don't introduce a separate state flag for separate binding tables (Jason)
- replace bo and map accessors with a single binding_table_pool accessor (Jason)
v3: - assert bt_block->offset >= 0 for the separate binding table (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The state_pools reserve virtual address space of the full
BLOCK_POOL_MEMFD_SIZE, but maintain the current behavior of
growing from the middle.
v2: - rename block_pool::offset to block_pool::start_address (Jason)
- assign state pool start_address statically (Jason)
v3: - remove unnecessary bo_flags tampering for the dynamic pool (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some trivial help now, but it also prevents ~40 regressions caused by
Samuel's "nir: implement the GLSL equivalent of if simplication in
nir_opt_if" patch.
All Gen4+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14369557 -> 14369555 (<.01%)
instructions in affected programs: 442 -> 440 (-0.45%)
helped: 2
HURT: 0
total cycles in shared programs: 532425772 -> 532425743 (<.01%)
cycles in affected programs: 6086 -> 6057 (-0.48%)
helped: 2
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
d8d18516b0 and 03fb13f646 added some patterns to undo conversions like
(('ior', ('flt', a, b), ('flt', a, c)), ('flt', a, ('fmax', b, c)))
If further optimization cause some of the operands to either be the same
or be constants, undoing the transformation can lead to further savings.
I don't know why these patterns were not added in those patches. I did
not check to see which specific patterns actually helped. I just added
all of them for symmetry. This prevents some loop unrolling regressions
Plane Shift caused by Samuel's "nir: implement the GLSL equivalent of if
simplication in nir_opt_if" patch.
Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 14369768 -> 14369557 (<.01%)
instructions in affected programs: 44076 -> 43865 (-0.48%)
helped: 141
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.50 x̃: 1
helped stats (rel) min: 0.07% max: 1.52% x̄: 0.66% x̃: 0.60%
95% mean confidence interval for instructions value: -1.67 -1.32
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs: 532430629 -> 532425772 (<.01%)
cycles in affected programs: 1170832 -> 1165975 (-0.41%)
helped: 101
HURT: 5
helped stats (abs) min: 1 max: 160 x̄: 48.54 x̃: 32
helped stats (rel) min: <.01% max: 8.49% x̄: 2.76% x̃: 2.03%
HURT stats (abs) min: 2 max: 22 x̄: 9.20 x̃: 4
HURT stats (rel) min: <.01% max: 0.05% x̄: 0.02% x̃: <.01%
95% mean confidence interval for cycles value: -53.64 -38.00
95% mean confidence interval for cycles %-change: -3.06% -2.20%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Implement ir_binop_vector_extract using NIR operations. Based on SPIR-V
to NIR approach.
This fixes:
dEQP-GLES3.functional.shaders.indexing.moredynamic.with_value_from_indexing_expression_fragment
Piglit's glsl-fs-vec4-indexing-8.shader_test
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral <itoral@igalia.com>
Adds suppport for ARB_fragment_shader_interlock. We achieve
the interlock and fragment ordering by issuing a memory fence
via sendc.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This extension provides new GLSL built-in functions
beginInvocationInterlockARB() and endInvocationInterlockARB()
that delimit a critical section of fragment shader code. For
pairs of shader invocations with "overlapping" coverage in a
given pixel, the OpenGL implementation will guarantee that the
critical section of the fragment shader will be executed for
only one fragment at a time.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
After bebe3d626e, b->fail_jump is prepared after vtn_create_builder
which can longjmp(3) to it through its vtx_assert()s. This corrupts
the stack and creates confusing core dumps, so we need to avoid it.
While there, I decided to print the offending values for debugability.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The driver must support at least one of
PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT
PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT
and one of
PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER
PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER
otherwise glsl_to_tgsi will fire an assert.
ORIGIN_UPPER_LEFT is the default convention, and is supported by
all mesa drivers, hence it seems reasonable to always report the caps
to be enabled. On gles ORIGIN_LOWER_LEFT is generally not supported,
so we rely on the caps reported by the host that depend on whether we
run on an GL or an EGL host.
For PIXEL_CENTER it is completely host driver dependend on what is
supported, and since we do not report the actual host driver capabilities
it is best to mark both as supported, this is how it works for a GL
host too.
Fixes:
dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_xyz
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2
Reviewed-by: Gurchetan Singh <gurcetansingh@chromium.org>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
The value returned by tgsi_util_get_texture_coord_dim() does not
account for the sample index. This means image_fetch_coords() will not
fetch it, leading to a null deref in ac_build_image_opcode() which
expects it to be present (the return value of ac_num_coords() *does*
include the sample index).
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was not previously handled correctly. For example,
push_constant_stages might only contain MESA_SHADER_VERTEX because
only that stage was changed by CmdPushConstants or
CmdBindDescriptorSets.
In that case, if vertex has been merged with tess control, then the
push constant address wouldn't be updated since
pipeline->shaders[MESA_SHADER_VERTEX] would be NULL.
Use radv_get_shader() instead of getting the shader directly so that
we get the right shader if merged. Also, skip emitting the address
redundantly - if two merged stages are set in push_constant_stages
this change would have made the address get emitted twice.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With GFX9 merged shaders, active_stages would be set to the original
stages specified if shaders were not cached, but to the stages still
present after merging if they were.
Be consistent and use the original stages.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2 (Jason Ekstrand):
- Split the blorp bit into it's own patch and re-order a bit
- Use anv_address helpers
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit renames add_surface_state_reloc to add_surface_reloc and
makes it takes an address. We also rename add_image_view_relocs to
add_surface_state_relocs because it takes an anv_surface_state and
doesn't really care about the image view anymore.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Instead of storing a BO and offset separately, use an anv_address. This
changes anv_fill_buffer_surface_state to use anv_address and we now call
anv_address_physical and pass that into ISL.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
This refactors surface state filling to work entirely in terms of
anv_addresses instead of offsets. This should make things simpler for
when we go to soft-pin image buffers. Among other things,
add_image_view_relocs now only cares about the addresses in the surface
state and doesn't really need the image view anymore.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
These will be used to assign virtual addresses to soft pinned
buffers in a later patch.
Two allocators are added for separate 'low' and 'high' virtual
memory areas. Another alternative would have been to add a
double-sided allocator, which wasn't done here just because it
didn't appear to give any code complexity advantages.
v2 (Scott Phillips):
- rename has_exec_softpin to use_softpin (Jason)
- Only remove bottom one page and top 4 GiB from virt (Jason)
- refer to comment in anv_allocator about state address + size
overflowing 48 bits (Jason)
- Mention hi/lo allocators vs double-sided allocator in
commit message (Chris)
- assign state pool memory ranges statically (Jason)
v3 (Jason Ekstrand):
- Use (LOW|HIGH)_HEAP_(MIN|MAX)_ADDRESS rather than (1 << 31) for
determining which heap to use in anv_vma_free
- Only return de-canonicalized addresses to the heap
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The test pseudo-randomly makes allocations and deallocations with
the virtual memory allocator and checks that the results are
consistent. Specifically, we test that:
* no result from the allocator overlaps an already allocated range
* allocated memory fulfills the stated alignment requirement
* a failed result from the allocator could not have been fulfilled
* memory freed to the allocator can later be allocated again
v2: - fix if() in test() to actually run fill()
v3: - add c++11 build flag (Jason)
- test the full 64-bit range (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is simple linear-walk first-fit allocator roughly based on the
allocator in the radeon winsys code. This allocator has two primary
functional differences:
1) It cleanly returns 0 on allocation failure
2) It allocates addresses top-down instead of bottom-up.
The second one is needed for Intel because high addresses (with bit 47
set) need to be canonicalized in order to work properly. If we allocate
bottom-up, then high addresses will be very rare (if they ever happen).
We'd rather always have high addresses so that the canonicalization code
gets better testing.
v2: - [scott-ph] remove _heap_validate() if NDEBUG is defined (Jordan)
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Tested-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a RADV_DEBUG=startup option to dump more info about
instance creation and device enumeration.
A common question end users have is why the direver is not loading
for them, and this has two common reasons:
1) They did not install the driver.
2) AMDGPU is not used for the card in the kernel.
This adds some info messages so we can easily get a some useful
output from end users.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Errors are not that common of a case so we can eat a slight perf
hit in having to call a function and do a runtime check.
In turn this makes debugging random errors happening for end users
easier, because they don't have to have a debug build on hand.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Totals from affected shaders:
SGPRS: 80 -> 80 (0.00 %)
VGPRS: 48 -> 48 (0.00 %)
Code Size: 2120 -> 2096 (-1.13 %) bytes
Max Waves: 16 -> 16 (0.00 %)
Only two Rise of Tomb Raider shaders are affected on my side.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch skips useless and possibly dangerous calls down to the driver
in case invalid arguments were given. I noticed this would be happening
with demo of Darwinia game. AFAIK this does not fix anything but makes
this path safer and more like how other API functions are implemented.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
CXXLD gallium_dri.la
../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_dump_packet':
src/broadcom/clif/clif_dump.c:87: undefined reference to `v3d33_clif_dump_packet'
src/broadcom/clif/clif_dump.c:85: undefined reference to `v3d41_clif_dump_packet'
../../../../src/broadcom/.libs/libbroadcom.a(clif_dump.o): In function `clif_process_worklist':
src/broadcom/clif/clif_dump.c:140: undefined reference to `v3d41_clif_dump_gl_shader_state_record'
src/broadcom/clif/clif_dump.c:144: undefined reference to `v3d33_clif_dump_gl_shader_state_record'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes this use all 32 bits, so future sets need to be
defined in a new struct.
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
This avoids loop unrolling regressions in Wolfenstein II on DXVK
with an upcoming optimisation series from Samuel.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was terribly wrong, I forced use of 32-bit pointers when
emitting shader descriptor pointers. This fixes GPU hangs with
LLVM 5&6 because 32-bit pointers are only supported with LLVM 7.
Fixes: 88d1ed0f81 ("radv: emit shader descriptor pointers consecutively")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Gallium drivers don't expose this yet due to:
"st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This requires layered FBOs from GL 3.2.
Gallium drivers don't expose this yet due to:
"st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Gallium drivers don't expose this yet due to:
"st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bindless texture handles can be passed via vertex attribs using this type.
They use the double codepath, so don't use st_pipe_vertex_format.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bindless texture handles can be passed via vertex attribs using this type.
This fixes a bunch of bindless piglit tests on radeonsi.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I disliked removing the const here, function tables are meant
to be const just to avoid having to think about them,
make a second table for the shm vs non-shm paths to use.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Implements putImageShm from DRIswrastLoaderExtension.
If XShm extension is not available, or fails, it will fallback on
regular XPutImage().
Tested on Linux only with 16bpp and 32bpp visual.
(airlied: tested on 24bpp as well)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
If drisw_loader_funcs implements put_image_shm, allocates display
target data with shared memory and display with put_image_shm().
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
If the DRIswrastLoaderExtension implements putImageShm, bind it to
drisw_loader_funcs.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Add new API to put and get an image using shared memory. Instead of only
passing the data pointer, 3 arguments are given: the shmid, the data
offset and the shmaddr.
Bump interface version.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This just renames this as we want to add an shm handle which
isn't really drm related.
Originally by: Marc-André Lureau <marcandre.lureau@gmail.com>
(airlied: I used this sed script instead)
This was generated with:
git grep -l 'DRM_API_' | xargs sed -i 's/DRM_API_/WINSYS_/g'
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT. When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only reason it was it's own opcode was so that we could detect it
and adjust the source register based on the payload setup. Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.
v2 (Jason Ekstrand):
- Break the bit which removes the CINTERP opcode into its own patch
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.
v2 (Jason Ekstrand):
- Break into multiple patches
- Change the units of the FS ATTR to be in logical scalars
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is better than compression control because it naturally extends to
SIMD32.
v2:
- Push/pop instruction state around adjusted codegen (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The fall-back does not work correctly in SIMD16 mode and the register
allocator should ensure that we never hit this case anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit
addressing bugs with softpin.") tried to only emit the VF invalidate if
the high bits changed, but it accidentally always set need_invalidate to
true; causing it to emit unconditionally emit the pipe control before
every primitive.
Fixes: 92f01fc5f9 ("i965: Emit VF cache invalidates for 48-bit addressing bugs with softpin.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106708
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The modifiers array hasn't been initialised by then, much less with data
that would need freeing.
Move the label after the loop to fix this.
Fixes: c80c08e226 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
---
v2: rebased on top of 432df741e0 "dri_util: Add
R10G10B10{A,X}2 translation between DRI and mesa_format."
0 is not a valid value for the __DRI_IMAGE_FORMAT_* enum.
It is, however, the value of MESA_FORMAT_NONE, which two of the callers
(i915 & i965) checked for.
The other callers (that check for errors, ie. st/dri) already check for
__DRI_IMAGE_FORMAT_NONE.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Resources created with modifiers are treated as scanout because there is
no way for applications to specify the usage (though that capability may
be useful to have in the future). Currently all the resources created by
applications with modifiers are for scanout, so make sure they have bind
flags set accordingly.
This is necessary in order to properly export buffers for such resources
so that they can be shared with scanout hardware.
Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
Resources created for scanout but without modifiers need to be treated
as pitch-linear. This is because applications that don't use modifiers
to create resources must be assumed to not understand modifiers and in
turn won't be able to create a DRM framebuffer and passing along which
modifiers were picked by the implementation.
Tested-by: Daniel Kolesa <daniel@octaforge.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Thierry Reding <treding@nvidia.com>
favicon.png is just gears.png resized to 64x64, and favicon.ico is
generated using this command, adapted from the ImageMagick example [1]:
$ convert favicon.png -background black \
\( -clone 0 -resize 16x16 \) \
\( -clone 0 -resize 32x32 \) \
\( -clone 0 -resize 48x48 \) \
\( -clone 0 -resize 64x64 \) \
-delete 0 -alpha off -colors 256 favicon.ico
We could edit every html page to add `<link rel="icon" href="favicon.ico" />`,
but there's not much point as pretty much every browser will pick it up
automatically if the file is named `favicon.ico` and is in the root folder.
[1] http://www.imagemagick.org/Usage/thumbnails/#favicon
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
This reduces the number of SET_SH_REG packets which are emitted
for applications that use more than one descriptor set per stage.
We should be able to emit more SET_SH_REG packets consecutively
(like push constants and vertex buffers for the vertex stage),
but this will be improved later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will allow to emit consecutive shader pointers for
reducing the number of emitted SET_SH_REG packets, which
is recommended.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously, findFirstUse() only considered reads "uses". This fixes that
by making it check both an instruction's sources and definitions. It
also shortens both findFistUse() and findFirstDef() along the way.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Literally the same as the AMD ext.
Passes *indirect_draw_count* CTS tests.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This improves dota2 performance for me by 11% when I force the
GPU DPM level to low (otherwise dota2 is CPU limited for 4k on my
threadripper), which should be a large part of the radv-amdvlk gap.
(For me with that was radv 60.3 -> 66.6, while AMDVLK does about 68
fps)
It looks like dota2 rendered the GUI with a bunch of draws with
a SetScissors before almost each draw, causing a lot of pipeline
stalls.
I'm not really happy with the duplication of code, but overriding
radeon_set_context_reg would also be messy since we have the
pre-recorded pipelines and a bunch of si_cmd_buffer code, as well
as some memory->context reg loads for which things would be more
complicated.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
A later patch will make use of this in other places. Also, remove
dependency on undefined behavior of left-shifting a signed value.
v2: - move function into a separate header (Chris)
v3: (by Ken) Add new header to the various build systems.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make sure only those components are written to that are specified in the
write mask.
Fixes:
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_float_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.lowp_vec3_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.mediump_vec3_fragment
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_vertex
dEQP-GLES2.functional.shaders.operator.common_functions.sign.highp_vec3_fragment
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In cases like
IDIV TEMP[0].xy TEMP[0].xx TEMP[1].yy
the result will be written to the same register that is also a source register.
Since the components are evaluated one by one, this may result in overwriting
the source value for a later operation. Work around this by adding another
temporary to store the result if the destination temporary index is equal to
one of the source temporary indices.
Fixes:
dEQP-GLES2.functional.shaders.operator.binary_operator.div.*
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 79fe00efb4.
This reverts commit f5e8b13f78.
This reverts commit d21c086d81.
They broke the Android build and I'd rather not leave it broken
for the long holiday weekend.
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stream from a tiled and aligned source into an unaligned user buffer,
so we need to use _mm_storeu_si128.
Fixes: d21c086d81 (i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For certain EGLImage cases, we represent a single slice or LOD of an
image with a byte offset to a tile and X/Y intratile offsets to the
given slice. Most of i965 is fine with this but it breaks blorp. This
is a terrible way to represent slices of a surface in EGL and we should
stop some day but that's a very scary and thorny path. This gets blorp
to start working with those surfaces and fixes some dEQP EGL test bugs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106629
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GS is tested, tessellation is untested.
Have outputs_written_before_ps for HW VS and outputs_written for other
stages. The reason is that COLOR and BCOLOR alias for HW VS, which
drives elimination of VS outputs based on PS inputs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes shader images where we always bind stObj->pt and not individual
gl_texture_images.
Roughly based on i965 commit 845ad2667a
which does a similar thing but for a different reason.
This fixes GL CTS assertion failures introduced by Ilia.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by
about 4x, MSVC compiler was going crazy making temporaries and
split-loading inputs onto the stack unless explicit AVX-512 load ops
were added
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Added two new files for a wrapper function for initialization
v2: added missing include for single architecture builds
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
It's recommended by the instruction combining pass, and
RadeonSI also runs it.
This pass used to segfault with one shader of F12017 in the
past, but it no longer crashes. Maybe the LLVM IR generated
by RADV has changed.
Polaris10:
Totals from affected shaders:
SGPRS: 441352 -> 441648 (0.07 %)
VGPRS: 310888 -> 300784 (-3.25 %)
Spilled SGPRs: 13576 -> 12983 (-4.37 %)
Code Size: 22560328 -> 22420544 (-0.62 %) bytes
Max Waves: 40755 -> 41366 (1.50 %)
Vega10:
Totals from affected shaders:
SGPRS: 442848 -> 442000 (-0.19 %)
VGPRS: 310396 -> 300460 (-3.20 %)
Spilled SGPRs: 13708 -> 12906 (-5.85 %)
Code Size: 22479428 -> 22336216 (-0.64 %) bytes
Max Waves: 45783 -> 46506 (1.58 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When glUseProgram is used, references to the included shaders are
added in ctx->Shader.ReferencedProgram. But those references are not
decreased when the shader data is deallocated. Thus, those shaders
are leaked.
Explicitely remove the pending references to these shaders.
Fixes: e6506b3cd2 ("mesa: retain gl_shader_programs after glDeleteProgram if they are in use")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Functionality already covered by ARB_texture_view, patch also
adds missing 'gles guard' for enums (added in f1563e6392).
Tested via arb_texture_view.*_gles3 tests and individual app
utilizing texture view with ETC2.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't nothing special currently because we don't create
any copy_var instructions, but this is needed for the next patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of directly using intel_obj->buffer. Among other things
intel_bufferobj_buffer() will update intel_buffer_object::
gpu_active_start/end, which are used by glBufferSubData() to decide
which path to take. Fixes a failure in the Piglit
ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests,
which could be reproduced with a non-standard glGetTexSubImage
implementation (see bug report).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351
Reported-by: Nanley Chery <nanleychery@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Otherwise the specified surface state will allow the GPU to access
memory up to BufferOffset bytes past the end of the buffer. Found by
inspection.
v2: Protect against out-of-range BufferOffset (Nanley).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The buffer texture size calculations (should be easy enough, right?)
are repeated in three different places, each of them subtly broken in
a different way. E.g. the image load/store path was never fixed to
clamp to MaxTextureBufferSize, and none of them are taking into
account the buffer offset correctly. It's easier to fix it all in one
place.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This reverts commit c0ed52f614. It was
preventing the image format validation from being done on buffer
textures, which is required to ensure that the application doesn't
attempt to bind a buffer texture with an internal format incompatible
with the image unit format (e.g. of different texel size), which is
not allowed by the spec (it's not allowed for *any* texture target,
whether or not there is spec wording restricting this behavior
specifically for buffer textures) and will cause the driver to
calculate texel bounds incorrectly and potentially crash instead of
the expected behavior.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106465
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
WQM is pretty reliable now on LLVM 7, so let us just use
DPP + WQM.
This gives approximately a 1.5% performance increase on the
vrcompositor built-in benchmark.
v2: Use ac_build_quad_swizzle.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I'm guessing an earlier version of the website used to have the page
contents in <frames>, but this isn't the case anymore so just drop the
unnecessary `target="_main"` :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This unifies the explicit rasterization discard as well as the implicit
rasterization disabled logic (which we need for another state tracker),
which really should do the exact same thing.
We'll now toss out the prims early on in setup with (implicit or
explicit) discard, rather than do setup and binning with them, which
was entirely pointless.
(We should eventually get rid of implicit discard, which should also
enable us to discard stuff already in draw, hence draw would be
able to skip the pointless clip and fallback stages in this case.)
We still need separate logic for only null ps - this is not the same
as rasterization discard. But simplify the logic there and don't count
primitives simply when there's an empty fs, regardless of depth/stencil
tests, which seems perfectly acceptable by d3d10.
While here, also fix statistics for primitives if face culling is
enabled.
No piglit changes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The only function that doesn't need to call access_raw is map_blit. If
it takes the blitter path, it will happen as part of intel_miptree_copy.
If map_blit takes the blorp path, brw_blorp_copy_miptrees will handle
doing whatever resolves are needed. This should save us resolves in
quite a few cases and will probably help performance a bit.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's faster than the blitter and can handle things like stencil properly
so it doesn't require software fallbacks.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The blorp path (called first) can do anything the blitter path can do so
it's just dead code.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen4-5, we try the blitter before we even try blorp. On newer
platforms, blorp can do everything the blitter can so there's no point
in even having the blitter fall-back path.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using meta for anything is fairly aweful and definitely has more CPU
overhead. However, it also uses the 3D pipe and is therefore likely
faster in terms of GPU time than the blitter. Also, the blitter code
has so many early returns that it's probably not buying us that much.
We may as well just use meta all the time instead of working over-time
to find the tiny case where we can use the blitter. We keep gen4-5
using the old blit paths to avoid perturbing old hardware too much.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'd like to start using soft-pin to assign BO addresses up front, and
never move them again. Our previous plan for dealing with 48-bit VF
cache bugs was to relocate vertex buffers to the low 4GB, so we'd never
have addresses that alias in the low 32 bits. But that requires moving
buffers dynamically.
This patch tracks the last seen BO address for each vertex/index buffer,
and emits a VF cache invalidate if the high bits change. (Ideally, we
won't hit this case very often.) This should work for the soft-pin
case, but unfortunately won't work in the relocation case, as we don't
actually know the addresses. So, we have to use both methods.
v2: Mention that the cache uses a <VertexBufferIndex, Address> tuple
more explicitly (suggested by Scott). Mention "single batch" too
(suggested by Chris).
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
We're planning to start managing the PPGTT in userspace in the near
future, rather than relying on the kernel to assign addresses. While
most buffers can go anywhere, some need to be restricted to within 4GB
of a base address.
This commit adds a "memory zone" parameter to the BO allocation
functions, which lets the caller specify which base address the BO will
be associated with, or BRW_MEMZONE_OTHER for the full 48-bit VMA.
Eventually, I hope to create a 4GB memory zone corresponding to each
state base address.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Prevents corrupting the upper 32 bits of draw->recv_sbc when
draw->send_sbc resets to 0 (which currently happens when the window is
unbound from a context and bound to one again), which in turn caused
loader_dri3_swap_buffers_msc to calculate target_msc with corrupted
upper 32 bits. This resulted in hangs with the Xorg modesetting driver
as of xserver 1.20 (older versions and other drivers ignored the upper
32 bits of the target MSC, which is why this wasn't noticed earlier).
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/106351
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Fix build error.
CC v3d_blit.lo
In file included from v3d_blit.c:27:0:
v3d_context.h:39:10: fatal error: v3d_drm.h: No such file or directory
#include "v3d_drm.h"
^~~~~~~~~~~
Fixes: 8a793d42f1 ("v3d: Switch the vc5 driver to using the finalized V3D UABI.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we have the common WSI code, we use vkCmdCopyImageToBuffer
instead.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SRGB stores are broken. We had compensation code in the
resolve path but none in the copy path. Since we don't
want any conversion and it does not matter for DCC,
just make everything UNORM instead.
This happened to cause wrong colors for the PRIME path, as
that uses image->buffer copies which always use the compute
path.
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106587
Reviewed-by: Dave Airlie <airlied@redhat.com>
Patch changes entrypoints generator to not skip this extension even
though it is set as disabled in the xml. We also need compilation
flag VK_USE_PLATFORM_ANDROID_KHR to be enabled.
It looks like this extension got disabled in commit 69f447553c.
v2: just remove the whole 'supported' attrib check + remove
vk_icd.h compilation fix (fix in VulkanHeaders instead)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The host side hasn't got support for this feature yet, so don't enable it
unless we get the caps from the host.
This makes the texture buffer range piglit tests skip now.
Fixes: fe0647df5a (virgl: add offset alignment values to to v2 caps struct)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Just let the extension detection do its job as we will be adding
compat profile support in future, also we want these to work
with compat profile version overrides.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We're not sharing 32_32_32 formats between different GPUs, so we
do not have to align for vega on pre-vega cards.
Fixes: e361970ed7 "radv: Add support for IMG_DATA_FORMAT_32_32_32."
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I've confirmed after 77554d220d we no
longer need this to pass some tests from another api (as we no longer
generate the bogus extra null tris in the first place).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This pass is required by the Midgard compiler; our instruction set uses
NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction.
Normally, this lowering pass would be implemented in a backend-specific
algebraic pass, but this conflicts with the existing iand->b2f pass in
nir_opt_algebraic.py, hanging the compiler. This patch thus makes the
existing pass optional (default on -- all other backends should remain
unaffected), adding an optional pass for lowering the opposite
direction.
v2: Defer lowering until late algebraic optimisations to allow
optimising the b2f instruction itself.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This is useful for every user of ISL. Drop the comment along the way to
match similar functions in ISL.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
intel_miptree_supports_{ccs,mcs,hiz} ensures the format is valid for the
color or depth miptree before the miptree is assigned an aux_usage.
alloc_aux switches on the aux_usage so don't assert that the format is
valid.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Synchronize the requirements listed in isl_surf_get_ccs_surf with
intel_miptree_supports_ccs by importing a restriction from ISL. Some
implications:
* We successfully create every aux_surf in alloc_aux
* We only return false from alloc_aux if we run out of memory
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The tgsi_info.num_tokens fix broke llvmpipe's detection of no-op shaders.
Fix the code to check for num_instructions <= 1 instead.
Fixes: 8fde9429c3 ("tgsi: fix incorrect tgsi_shader_info::num_tokens
computation")
Tested-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Currently GLSL 1.4 is defined for all gallium drivers even only
GLSL 1.2 is supported as seen on etnaviv.
v1 -> v2:
- use _min(..) as suggested by Lucas Stach and Michel Dänzer
Fixes: 4560aad780 ("mesa: add GLSLVersionCompat constant")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Some drivers (virgl) don't support GL4.4 or GLES3.1 yet,
so never fill in this const.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were incrementing num_tokens in each loop iteration while parsing
the shader. But each call to tgsi_parse_token() can consume more than
one token (and often does). Instead, just call the tgsi_num_tokens()
function.
Luckily, this issue doesn't seem to effect any current users of this
field (llvmpipe just checks for <= 1, for example).
Reviewed-by: Neha Bhende<bhenden@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
All the shader program dependent handling is done on the level
of the gl_Context::Array._DrawVAO/_DrawVAOEnabledAttribs.
So, skip array element invalidation on _NEW_PROGRAM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
For the VAO internal helper functions that may be called
with a non current VAO, flag the _NEW_ARRAY state only
if it is the current ctx->Array.VAO.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Pending draw calls on immediate mode or display list calls do
not depend on changes of the VAO state. So, remove calls to
FLUSH_VERTICES and flag _NEW_ARRAY as appropriate.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mesa 18.1 series has not been released yet, so let's extend 18.0 lifetime.
v2: Add missing closing TR tags (Eric Engestrom)
CC: Andres Gomez <agomez@igalia.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
With the syncobj support in place, lets use it to implement the
EGL_ANDROID_native_fence_sync extension. This mostly follows previous
implementations in freedreno and etnaviv.
v2: Drop the flags (Eric)
Handle in_fence_fd already in job_submit (Eric)
Drop extra vc4_fence_context_init (Eric)
Dup fds with CLOEXEC (Eric)
Mention exact extension name (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This gives us access to the fence created for the render job.
v2: Drop flag (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to know if the kernel supports syncobj submission since otherwise
all the DRM syncobj calls fail.
v2: Use drmGetCap to detect syncobj support (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Require a version of libdrm with syncobj support.
v2: Don't require a libdrm_vc4, just bump core libdrm if vc4 enabled (by
anholt)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Synchronized with kernel v2
v3: Update for the finalized kernel ABI (pad2 field)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was missed in the move back to the local uapi copy.
libdrm_vc4 only seems to consist of headers that also exist in the
Mesa tree.
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the previous patches, we now update the indirect clear color buffer
every time the clear color changes. Avoid redundant updates.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Allow callers to handle updating the indirect clear color buffer
themselves. This can reduce the number of clear color updates in the
case where a caller performs multiple fast clears with the same clear
color.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the aux state is CLEAR and clear color value has changed, only the
surface state must be updated. The bit-pattern in the aux buffer is
exactly the same.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This comment made more sense when it was above the calls to
intel_miptree_slice_set_needs_depth_resolve(). We stopped using these
functions at commit 554f7d6d02
("i965: Move depth to the new resolve functions").
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For depth buffers, we avoid fast-clearing if the aux_state is already
CLEAR. We do the same for color buffers only if the clear color
doesn't change. We require that the clear colors match because, in
that case, we don't update the indirect clear color outside of BLORP.
Update the indirect clear color for color buffers as well. We'll
enable the same depth buffer optimization for color buffers in a
later patch.
Note that we're now actually updating the indirect clear color twice
in the case where we use BLORP to perform the fast-clear. This is
only temporary. In later patches, we'll prevent BLORP from performing
the update.
v2: Add more context to the commit message (Topi).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reduce complexity and allow the next patch to delete some code. With
this change, clear operations will still be skipped and setting the
aux_state will cause no side-effects.
Remove the associated comment which implies an early return.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 1d94aa1987.
The next patch will make depth miptrees use the clear color setter that
was originally being used for color miptrees. Go back to using the
isl_color_value parameter because it's the same type as the
fast_clear_color field used by color and depth miptrees.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There isn't much that changes between the aux allocation functions.
Remove the duplicated code.
v2: Inline the switch statement (Jason).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're going to delete intel_miptree_alloc_ccs() in the next commit. With
that in mind, replace the use of this function in
do_single_blorp_clear() with intel_miptree_alloc_aux() and move the
delayed allocation logic to it's callers.
v2: Duplicate the delayed allocation comment (Topi Pohjolainen).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The indirect clear color isn't correctly tracked in
intel_miptree::fast_clear_color. The initial value of ::fast_clear_color
is zero, while that of the indirect clear color is undefined.
Topi Pohjolainen discovered this issue with MCS buffers. This issue is
apparent when fast-clearing an MCS buffer for the first time with
glClearColor = {0.0,}. Although the indirect clear color is undefined,
the initial aux state of the MCS is CLEAR and the tracked clear color is
zero, so we avoid updating the indirect clear color with {0.0,}.
Make the indirect clear color match the initial value of
::fast_clear_color.
Note: although we only have to drop HiZ's BO_ALLOC_BUSY flag for gen10+,
we also drop it pre-gen10 to keep things simple. We add this flag back
for pre-gen10 in a later patch.
v2: Add a note about dropping HiZ's BO_ALLOC_BUSY flag (Topi).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add infrastructure for initializing the clear color BO.
intel_miptree_init_mcs is no longer needed with change.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before this patch, the aux_state was actually AUX_INVALID because the BO
was never defined. This was fine on single slice miptrees because we
would fast-clear the resource right after creation. For multi-slice
miptrees on SKL+ however, this results in undefined behavior when
accessing a non-base slice. Here's a specific example:
1) Fast clear level 0
* Undefined CCS_D buffer allocated in "PASS_THROUGH" state.
* Level 0 transitions to the CLEAR state.
2) Render to level 1
* Level 1 may have a 2-bit pattern of 2's.
* Rendering with a 2 in the CCS is undefined.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before this patch, if we failed to initialize an MCS buffer, we'd
end up in a state in which the miptree thinks it has an MCS buffer,
but doesn't. We also leaked the clear_color_bo if it existed.
With this patch, we now free the miptree aux buffer resources and let
intel_miptree_alloc_mcs() know that the MCS buffer no longer exists.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
That way the winsys might use a faster path when the global
BO list is NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Having an entrypoint different than "main" doesn't mean we
have multiple shaders per module.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows the driver to load against the merged kernel DRM driver. In
the process, rename most of the build system variables and gallium
plumbing functions.
In the process of merging to the kernel, I renamed the driver to the
general product line's name (since we have both vc5 and vc6 supported
already). Since the ABI is finalized, move the header to include/drm-uapi.
At buffer resource validation time, if the resource handle is not yet
created and if the initial buffer bind flags and the tobind flags are
incompatible, just use the tobind flags to create the resource handle.
On the other hand, if the bind flags are compatible, we can combine
the bind flags for the resource handle creation.
Fixes piglit gl-3.1-buffer-bindings crash.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cast the enum to GLint to avoid the compile warning:
/src/mesa/main/get.c:3005:19:
warning: comparison of constant -32768 with expression of type
'GLenum16' (aka 'unsigned short') is always false
-Wtautologicalia-constant-out-of-range-compare
Tests: compilation without this warning
Signed-off-by: jenny.q.cao <jenny.q.cao@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Seems that when the rnndb files for etniviv were updated/included back
in Nov 2017, hw/texdesc_3d.xml.h was missed from Makefile.sources and
meson.build. This was all during the conversion to meson, so it apears
to have slipped through the cracks. As such, this file has been missing
from the official tarballs since inclusion in Mesa, so the git trees
and tarballs differ.
Found due to lintian errors in the Debian packages.
Fixes: f1e1c60ff6 ("etnaviv: Update from rnndb")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Thanks for your comment. This version has an additional boolean in the
fps_info struct to distinguish between fps and frame time calculation.
The struct is initialised in the respecting install functions for this
purpose.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Use pipe_reference to release old RAT surfaces.
RAT surface adds a reference to pool bo, so use reference counting for pool->bo
as well.
v2: Use the same pattern for both defrag paths
Drop confusing comment
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use a single allocation of array type instead of the old-style array
allocation for the temp and immediate arrays.
Probably only makes a difference if they aren't used indirectly (so,
if we used them solely because there's too many temps or immediates).
In this case the sroa and early-cse passes can sometimes do some
optimizations which they otherwise cannot.
(As a side note, for the temp reg array, we actually really should
use one allocation per array id, not just one for everything.)
Note that the instcombine pass would actually promote such
allocations to single alloc of array type as well, but it's too late
for some artificial shaders we've seen to help (we don't want to run
instcombine at the beginning due to its cost, hence would need
another sroa/cse pass after instcombine). sroa/early-cse help there
because they can actually eliminate all of the huge shader, reducing
it to a single const output (don't ask...).
(Interestingly, instcombine also removes all the bitcasts we do on that
allocation for single-value gathering, and in the end directly indexes
into the single vector elements, which according to spec is only
semi-valid, but this happens regardless. Another thing instcombine also
does is use inbound GEPs, which is probably something we should do
manually as well - for indirectly indexed reg files llvm may not be
able to figure it out on its own, but we should be able to guarantee
all pointers are always inbound. In any case, by the looks of it
using single allocation with array type seems to be the right thing
to do even for ordinary shaders.)
No piglit change.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We should stop walking through the CFG when the inner loop's
break block ends up as the same block as the outer loop's
continue block because we are already going to visit it.
This fixes the following assertion which ends up by crashing
in RADV or ANV:
SPIR-V parsing FAILED:
In file ../src/compiler/spirv/vtn_cfg.c:381
block->node.link.next == NULL
0 bytes into the SPIR-V binary
This also fixes a crash with a camera shader from SteamVR.
v2: make use of vtn_get_branch_type() and add an assertion
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106090
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106504
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The code didn't expect that, leading to crashes.
Fixes: 86d63b53a2 "gallium: remove aux_vertex_buffer_slot code"
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Since the fence can outlive the context, and all it really needs to wait
on a fence is the pipe, use the new fd_pipe reference counting to hold a
ref to the pipe and drop the ctx pointer.
This fixes a crash seen with (for example) glmark2:
#0 fd_pipe_wait_timeout (pipe=0xbf48678b3cd7b32b, timestamp=0, timeout=18446744073709551615) at freedreno_pipe.c:101
#1 0x0000ffffbdf75914 in fd_fence_finish (pscreen=0x561110, ctx=0x0, fence=0xc55c10, timeout=18446744073709551615) at ../src/gallium/drivers/freedreno/freedreno_fence.c:96
#2 0x0000ffffbde154e4 in dri_flush (cPriv=0xb1ff80, dPriv=0x556660, flags=3, reason=__DRI2_THROTTLE_SWAPBUFFER) at ../src/gallium/state_trackers/dri/dri_drawable.c:569
#3 0x0000ffffbecd8b44 in loader_dri3_flush (draw=0x558a28, flags=3, throttle_reason=__DRI2_THROTTLE_SWAPBUFFER) at ../src/loader/loader_dri3_helper.c:656
#4 0x0000ffffbecbc36c in glx_dri3_flush_drawable (draw=0x558a28, flags=3) at ../src/glx/dri3_glx.c:132
#5 0x0000ffffbecd91e8 in loader_dri3_swap_buffers_msc (draw=0x558a28, target_msc=0, divisor=0, remainder=0, flush_flags=3, force_copy=false) at ../src/loader/loader_dri3_helper.c:827
#6 0x0000ffffbecbcfc4 in dri3_swap_buffers (pdraw=0x5589f0, target_msc=0, divisor=0, remainder=0, flush=1) at ../src/glx/dri3_glx.c:587
#7 0x0000ffffbec98218 in glXSwapBuffers (dpy=0x502bb0, drawable=2097154) at ../src/glx/glxcmds.c:840
#8 0x000000000040994c in CanvasGeneric::update (this=0xfffffffff400) at ../src/canvas-generic.cpp:114
#9 0x0000000000411594 in MainLoop::step (this=this@entry=0x5728f0) at ../src/main-loop.cpp:108
#10 0x0000000000409498 in do_benchmark (canvas=...) at ../src/main.cpp:117
#11 0x00000000004071b0 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.cpp:210
Signed-off-by: Rob Clark <robdclark@gmail.com>
The cache doesn't hold a (strong) reference to the batch. So we
shouldn't be trying to drop a reference, as that leads to:
#0 0x0000ffffbecb37a0 in raise () from /lib64/libc.so.6
#1 0x0000ffffbeca159c in abort () from /lib64/libc.so.6
#2 0x0000ffffbecacf48 in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000ffffbecacfa8 in __assert_fail () from /lib64/libc.so.6
#4 0x0000ffffbd28def0 in pipe_reference_described (ptr=0x4f47130, reference=0x0, get_desc=0xffffbd2e0f08 <__fd_batch_describe>) at ../src/gallium/auxiliary/util/u_inlines.h:88
#5 0x0000ffffbd28e188 in fd_batch_reference_locked (ptr=0x4f40de0, batch=0x0) at ../src/gallium/drivers/freedreno/freedreno_batch.h:258
#6 0x0000ffffbd28e9a8 in fd_bc_invalidate_resource (rsc=0x4f40ca0, destroy=true) at ../src/gallium/drivers/freedreno/freedreno_batch_cache.c:244
#7 0x0000ffffbd293778 in fd_resource_destroy (pscreen=0xedc170, prsc=0x4f40ca0) at ../src/gallium/drivers/freedreno/freedreno_resource.c:644
#8 0x0000ffffbd922674 in u_transfer_helper_resource_destroy (pscreen=0xedc170, prsc=0x4f40ca0) at ../src/gallium/auxiliary/util/u_transfer_helper.c:144
#9 0x0000ffffbd29527c in pipe_resource_reference (ptr=0x4f455d8, tex=0x0) at ../src/gallium/auxiliary/util/u_inlines.h:144
#10 0x0000ffffbd29548c in fd_surface_destroy (pctx=0x1012720, psurf=0x4f455d0) at ../src/gallium/drivers/freedreno/freedreno_surface.c:78
#11 0x0000ffffbd1f9c48 in pipe_surface_reference (ptr=0x4f471d0, surf=0x0) at ../src/gallium/auxiliary/util/u_inlines.h:113
#12 0x0000ffffbd1f9ef4 in util_copy_framebuffer_state (dst=0x4f471c8, src=0x0) at ../src/gallium/auxiliary/util/u_framebuffer.c:114
#13 0x0000ffffbd2e0e30 in __fd_batch_destroy (batch=0x4f47130) at ../src/gallium/drivers/freedreno/freedreno_batch.c:225
#14 0x0000ffffbd28e1b0 in fd_batch_reference_locked (ptr=0xfffffffff010, batch=0x0) at ../src/gallium/drivers/freedreno/freedreno_batch.h:262
#15 0x0000ffffbd28e6b0 in fd_bc_invalidate_context (ctx=0x1012720) at ../src/gallium/drivers/freedreno/freedreno_batch_cache.c:190
#16 0x0000ffffbd2e2b6c in fd_context_destroy (pctx=0x1012720) at ../src/gallium/drivers/freedreno/freedreno_context.c:139
#17 0x0000ffffbd2c3280 in fd5_context_destroy (pctx=0x1012720) at ../src/gallium/drivers/freedreno/a5xx/fd5_context.c:56
#18 0x0000ffffbd5b7a8c in st_destroy_context_priv (st=0xfd72f0, destroy_pipe=true) at ../src/mesa/state_tracker/st_context.c:281
Signed-off-by: Rob Clark <robdclark@gmail.com>
Bump xa minor to signal that the underlying mesa version is suitable for dri3.
This is a bit ugly since it doesn't relate to a specific xa interface change.
Recently there has been a number of fixes in mesa that helps enabling dri3
without any significant regressions in automated testing and common desktop
usage latency. However, the xf86-video-vmware driver has no other way to tell
but inspecting the xa version.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
By using the geometry shader output usage mask.
This improves all Vulkan demos that use a geometry shader
(ie. geometryshader, deferredshadows, viewportarray).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For reducing the number of parameters that are exported by
the GS copy shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
An upcoming patch will run the shader info pass on the
geometry shader just before emitting the GS copy shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's clear that the original code meant to do this and there is even a
10-line comment explaining why. Originally, we had a simple function
for packing the clear colors which was unaware of sRGB. However, in
a6b66a7b26, when we started using ISL to do the packing, the wrong
format was used.
Fixes: a6b66a7b26 "intel/blorp: Use ISL instead of bitcast_color..."
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This moves the extra queries to after the main query ended, instead
of doing it after the begin and hence doing nesting.
We also emit only (view count - 1) extra queries, as the main query
is already there for the first view.
This fixes the CTS occasionally getting stuck in
dEQP-VK.multiview.queries* waiting on results.
Fixes: 32b4f3c38d "radv/query: handle multiview queries properly. (v3)"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
`dep_valgrind != []` now (0.45) produces a warning that is quite explicit:
WARNING: Trying to compare values of different types (DependencyHolder, list) using !=.
The result of this is undefined and will become a hard error in a future Meson release.
`dep_valgrind = []` used to be the recommended way to deal with
non-existant dependency, but these don't work with `.found()`, so now
the recommended way is to declare a impossible dependency, which
null_dep does for us in Mesa.
In short, we don't need and shouldn't check for `!= []` anywhere anymore.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
These are going to be crazy and we are probably going to add
more scan stuff in the future. Also use switch cases instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There is a compile warning from Android 8 (API version 26) from "include cutils/log.h"
warning: "Deprecated: don't include cutils/log.h, use either android/log.h or log/log.h"-W#warnings,
Change to include "log/log.h" on Android 8 or later major version to avoid this warning
Signed-off-by: jenny.q.cao <jenny.q.cao@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We were never producing negative numbers for signed types.
Also fix only producing half the valid range for uint32, and
properly clamp signed values.
Because this now also properly tests snorm with actually negative
values, need to increase eps for such conversions. I believe these
cannot actually be hit in ordinary operation (e.g. if a snorm texture
is sampled and output to snorm RT, it will still go through snorm->float
and float->snorm conversion), so don't bother to do anything to fix
the bad accuracy (might be quite complex).
Basically, the issue is for something like snorm16->snorm8 that in the
end this will just use a 8 bit arithmetic right shift.
But the math behind it says we should actually do a division by 32767 / 127, which
is ~258, not 256. So the result can be one bit off (values have too large
magnitude), and furthermore, the shift has incorrect rounding (always rounds
down). For positive numbers, these errors have different direction, but
for negative ones they have the same, hence for some values the error will
be 2 bit in the end.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=106232
I don't think the hw resolve path can't handle multi-layer images.
This fixes all the:
dEQP-VK.renderpass.multisample_resolve.layers_*
tests on my VI card.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
This path should iterate across all layers, I've some ideas
for doing this in a single pass, but this is simpler for now.
This passes the tests because we don't use the fragment path
unless we have DCC, and we don't have DCC on layered images.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
When VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT is set we skip NIR
linking optimisations and only run over the NIR optimisation loop
once similar to the GLSLOptimizeConservatively constant used by
some GL drivers.
We need to run over the opts at least once to avoid errors in LLVM
(e.g. dead vars it can't handle) and also to reduce the time spent
compiling the IR in LLVM.
With this change the Blacksmith Unity demos compilation times
go from 329760 ms -> 299881 ms when using Wine and DXVK.
V2: add bit to radv_pipeline_key
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106246
Fix SCons build error.
Linking build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1.5 ...
build/linux-x86_64-debug/mesa/libmesa.a(st_program.os): In function `st_translate_prog_to_nir':
src/mesa/state_tracker/st_program.c:392: undefined reference to `prog_to_nir'
Fixes: 5c33e8c772 ("st/nir: use NIR for asm programs")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
All vce firmwares with major version greater than or equal to 53 are supported
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
A fence can outlive the ctx it was created from (see glmark2).. etnaviv
doesn't actually need fence->ctx so lets remove it before someone makes
the mistake of assuming it is a valid pointer.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
- Change tilemgr TILE_ID encoding to use Morton-order (Z-order).
- Change locked tiles set to bitset. Makes clear, set, get much faster.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Previously was using the draw topology, which may change if GS or Tess
are active. Only affected attributes marked with constant interpolation,
which limited the impact.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
platform_x11 with dri3 needs inc_loader.
In file included from ../src/egl/drivers/dri2/platform_x11_dri3.c:35:0:
../src/egl/drivers/dri2/egl_dri2.h:41:32: fatal error: loader_dri3_helper.h: No such file or directory
In file included from ../src/egl/drivers/dri2/platform_x11.c:46:0:
../src/egl/drivers/dri2/egl_dri2.h:41:32: fatal error: loader_dri3_helper.h: No such file or directory
In file included from ../src/egl/drivers/dri2/egl_dri2.c:61:0:
../src/egl/drivers/dri2/egl_dri2.h:41:32: fatal error: loader_dri3_helper.h: No such file or directory
Cc: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
This results in better 16x and 8x quality when using these locations.
Verified with the piglit MSAA accuracy test.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Discovered by luck. Verified with the piglit MSAA accuracy test.
It also shows that the worst case EQAA 16s4f results in very good 4x MSAA
in the worst case.
Nine might not like these positions, but they are prettier to the eye and
GL doesn't care.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This can result in 2x increase in performance on non-harvested Kaveris.
v2: don't do it on radeon
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The configure.ac logic added in commit
2ef7f23820 ("configure: check if
-latomic is needed for __atomic_*") makes the assumption that if a
64-bit atomic intrinsic test program fails to link without -latomic,
it is because we must use -latomic.
Unfortunately, this is not completely correct: libatomic only appeared
in gcc 4.8, and therefore gcc versions before that will not have
libatomic, and therefore don't provide atomic intrinsics for all
architectures. This issue was for example encountered on PowerPC with
a gcc 4.7 toolchain, where the build fails with:
powerpc-ctng_e500v2-linux-gnuspe/bin/ld: cannot find -latomic
This commit aims at fixing that, by not assuming -latomic is
available. The commit re-organizes the atomic intrinsics detection as
follows:
(1) Test if a program using 64-bit atomic intrinsics links properly,
without -latomic. If this is the case, we have atomic intrinsics,
and we're good to go.
(2) If (1) has failed, then test to link the same program, but this
time with -latomic in LDFLAGS. If this is the case, then we have
atomic intrinsics, provided we link with -latomic.
This has been tested in three situations:
- On x86-64, where atomic instrinsics are all built-in, with no need
for libatomic. In this case, config.log contains:
GCC_ATOMIC_BUILTINS_SUPPORTED_FALSE='#'
GCC_ATOMIC_BUILTINS_SUPPORTED_TRUE=''
LIBATOMIC_LIBS=''
This means: atomic intrinsics are available, and we don't need to
link with libatomic.
- On NIOS2, where atomic intrinsics are available, but some of them
(64-bit ones) require using libatomic. In this case, config.log
contains:
GCC_ATOMIC_BUILTINS_SUPPORTED_FALSE='#'
GCC_ATOMIC_BUILTINS_SUPPORTED_TRUE=''
LIBATOMIC_LIBS='-latomic'
This means: atomic intrinsics are available, and we need to link
with libatomic.
- On PowerPC with an old gcc 4.7 toolchain, where 32-bit atomic
instrinsics are available, but not 64-bit atomic instrinsics, and
there is no libatomic. In this case, config.log contains:
GCC_ATOMIC_BUILTINS_SUPPORTED_FALSE=''
GCC_ATOMIC_BUILTINS_SUPPORTED_TRUE='#'
With means that atomic intrinsics are not usable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
The vertex array Size and Stride attributes are now ubyte and short,
respectively. The glGet code needed to be updated to handle those
types, but wasn't.
Fixes the new piglit test gl-1.5-get-array-attribs test.
v2: fix inadvertant whitespace change, change COLOR_ARRAY_SIZE to UBYTE,
misc fixes suggested by Justin
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106450
Fixes: d5f42f96e1 ("mesa: shrink size of gl_array_attributes (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's a per-application optimization, so it makes more sense
to do that in radv_handle_per_app_options().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The only remaining users of gl_vertex_array are tnl based
drivers. So move everything related to that into tnl and
rename it accordingly.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Only tnl based drivers still use this array. So remove it
from core mesa and use Array._DrawVAO instead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
For now store binding and attrib in brw_vertex_element.
The i965 driver still provides lots of opportunity to make use
of the unique binding information in the VAO which is currently not
taken from the VAO.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of playing with Array._DrawArrays, make the feedback draw
path use Array._DrawVAO. Also st_RasterPos needs to use the VAO then.
v2: Use helper methods to get the offset values for array and binding.
Update comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Finally make use of the binding information in the VAO when
setting up arrays for draw.
v2: Emit less relocations also for interleaved userspace arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The input_to_index array is already available internally
when preparing vertex programs. Store the map in
struct st_vertex_program.
Also store the bitmask of mesa vertex processing inputs in
struct st_vp_variant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Compute VAO buffer binding information past the position/generic0 mapping.
Scan for duplicate buffer bindings and collapse them into derived
effective buffer binding index and effective attribute mask variables.
Provide a set of helper functions to access the distilled
information in the VAO. All of them prefixed with _mesa_draw_...
to indicate that they are meant to query draw information.
v2: Also group user space arrays containing interleaved arrays.
Add _Eff*Offset to be copied on attribute and binding copy.
Update comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is needed for fixing CTS:
dEQP-GLES3.functional.occlusion_query.conservative*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
If you have an indirect access to a constant buffer on r600/eg
use a vertex fetch in the shader. However apps have expected
behaviour on those out of bounds accessess (even if illegal).
If the constants were being uploaded as part of a larger
upload buffer, we'd set the range of allowed access to a lot
larger than required so apps would get values back from
other parts of the upload buffer instead of the expected out
of bounds access.
This fixes rendering bugs in Trine and Witcher 1, thanks
to iive for nagging me effectively until I figured it out :-)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91808
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
From the bspec docs for "Indirect State Pointers Disable":
"At the completion of the post-sync operation associated with this
pipe control packet, the indirect state pointers in the hardware are
considered invalid"
So the ISP disable is a post-sync type of operation which means that it
should be combined with a CS stall. Without this, the simulator throws
an error.
Fixes: 766d801ca "anv: emit pixel scoreboard stall before ISP disable"
Fixes: f536097f6 "i965: require pixel scoreboard stall prior to ISP disable"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Perhaps with a new version of autoconf, I began seeing:
| checking the name lister (/usr/bin/nm -B) interface... ./configure: line 6973: External.*some_variable: command not found
| BSD nm
This is because AC_PROG_NM expands to
...
if $GREP 'External.*some_variable' conftest.out > /dev/null; then
lt_cv_nm_interface="MS dumpbin"
fi
...
I'm not sure if it's a bug in AC_PROG_NM that it doesn't call
AC_PROG_GREP, but it's easy enough for us to do it.
Out of tree builds can try to write into a directory that doesn't exist yet:
| Traceback (most recent call last):
| File "../../../mesa-18.0.2/src/intel/vulkan/anv_icd.py", line 46, in <module>
| with open(args.out, 'w') as f:
| IOError: [Errno 2] No such file or directory: 'vulkan/intel_icd.x86_64.json'
| Makefile:4882: recipe for target 'vulkan/intel_icd.x86_64.json' failed
Add missing MKDIR_GEN calls to solve this.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We want to make sure that all indirect state data has been loaded into
the EUs before disable the pointers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes: 78c125af39 ("anv/gen10: Ignore push constant packets during context restore.")
Invalidating the indirect state pointers might affect a previously
scheduled & still running 3DPRIMITIVE (causing page fault). So stall
on pixel scoreboard before that.
v2: Fix compile issue :(
v3: Stall on pixel scoreboard
v4: Drop the post sync operation (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Fixes: ca19ee33d7 ("i965/gen10: Ignore push constant packets during context restore.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106243
On CNL and above, CCS_E supports 1010102 formats and R11G11B10F. We had
shut them off during early enabling because blorp_copy couldn't handle
them. Now it can handle 1010102 formats so we can turn them back on.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
nir_format_bitcast_uint_vec_unmasked can only be used to cast between
formats with uniform channel sizes. In particular, it cannot handle
10_10_10_2 formats. By making use of the NIR helper for uint vector
casts, we should now be able to bitcast between any two uint formats so
long as their channels are in RGBA order (possibly with channels
missing). In order to do this we need to rework the key a bit to pass
the actual formats instead of just the number of bits in each.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is a fairly direct port from blorp. The only real change is that
the nir_format_convert version doesn't assume that everything is a vec4.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This adds helpers to ISL to convert an isl_color_value to and from
binary data encoded with a given isl_format. The conversion is done
using ISL's built-in format introspection so it's fairly slow as format
conversions go but it should be fine for a single pixel value. In
particular, we can use this to convert clear colors.
As a side-effect, we now rely on the sRGB helpers in libmesautil so we
need to tweak the build system a bit. All prior uses of src/util in ISL
were header-only.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Alpha-only formats are just linear. There's no need to specially
deliminate them as being in their own colorspace.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously, blorp could only blit into something that was renderable.
Thanks to recent additions to blorp, it can now blit into basically
anything so long as it isn't compressed.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
BLORP has supported 16x MSAA for quite a while now, we just never
bothered to enable it for CopyTexSubImage.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that blorp handles all the cases, why not? The only real change we
have to make is to stop using anv_swizzle_for_render() in blorp_blit
because it doesn't work for B4G4R4A4 and blorp now natively handles that.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously we only supported UINT formats because that's what blorp_copy
required. If we want to use it in blorp_blit, however, we need to
support everything.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit adds support for the following formats as destination
formats even though the hardware does not support rendering to them:
- ISL_FORMAT_R24_UNORM_X8_TYPELESS
- ISL_FORMAT_A4B4G4R4_UNORM
- ISL_FORMAT_L8_UNORM_SRGB
- ISL_FORMAT_R9G9B9E5_SHAREDEXP
This is done by using a different format and emitting shader code to
fake it the rest of the way.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
nir_mask_shift_or is now defined in nir_format_convert.h so we can
delete the copy in blorp_blit.c.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit makes blorp capable of swizzling anything even on hardware
that doesn't support texture swizzle.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This helper encodes more details, specifically about Haswell, than the
previous asserts in isl_surface_state.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous version was sort-of strapped on in that it just adjusted
the blit rectangle and trusted in the fact that we would use texelFetch
and round to the nearest integer to ensure that the component positions
matched. This new version, while slightly more complicated, is more
accurate because all three components end up with exactly the same
dst_pos and so they will get interpolated and sampled at the same
texture coordinate. This makes the workaround suitable for using with
scaled blits.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
According to Vulkan spec:
"pColorBlendState is a pointer to an instance of the
VkPipelineColorBlendStateCreateInfo structure, and is ignored if the
pipeline has rasterization disabled or if the subpass of the render pass the
pipeline is created against does not use any color attachments."
Fixes tests from CL#2505:
dEQP-VK.renderpass.*.simple.color_unused_omit_blend_state
v2:
- Check that blend is not NULL before usage.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Just let validate_context_version() do it instead. This fixes
MESA_GL_VERSION_OVERRIDE for compat, it will also allow us to
enable new compat versions on a per driver bases in future.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows drivers to define what version of GLSL they support
in compat. This will be needed in order to support compat 3.2
without breaking drivers that wont support it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All drivers that support GLSL will later set their default GLSL versions
overriding this override call. They currently all call
_mesa_override_glsl_version() again later in order to support overrides.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Just leave it as 0 and let the drivers set it (as they already do)
to avoid redundantly initialising it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is really useful when debugging any sort of buffer management
issues, so just printing it during INTEL_DEBUG=bat,submit seems
reasonable. With bat, we're already spamming so much output that
it doesn't really hurt. With submit, it's still easy to grep for
the older information, and the new information is nice too.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Now that we're using ISL, a good chunk of brw_emit_depthstencil is
pointless checks which ISL will do for us anyway. Since we only have
one manual depth buffer emit function, move the useful bits into it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We leave gen4-5 alone because the ISL code hasn't really been well-
tested on gen4-5 or with combined depth-stencil because we don't use
BLORP for depth operations on gen4-5. Also, the gen4-5 code has to deal
with intratile offsets for LOD hacks and ISL doesn't handle those yet.
We could make ISL handle gen4-5 capable or we could just not bother.
Among other things, this should make future platform enabling easier
because it means we don't have to update multiple (or hand-rolled!)
depth stencil emit paths.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only reason why we had two atoms was that the one we used for gen7+
depended on _NEW_DEPTH and _NEW_STENCIL as well as _NEW_BUFFERS. Since
this is no longer true, we can combine them into one atom. We do add a
dependence on BRW_NEW_AUX_STATE but that should never get set on gen4-5
so adding it is a no-op for those platforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware will AND these fields with the corresponding fields in
DEPTH_STENCIL_STATE so there's no real reason to toggle them on and off
based on state bits. This removes our reliance on the _NEW_DEPTH and
_NEW_STENCIL state bits and better matches what ISL does.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Certain things can change the aux usage or fast clear color of a depth
surface and we want to re-emit if that happens. For instance, if you do
a fast depth clear of an already clear depth surface, we will just set
the clear color and not do anything else. In that case, we could fail
to re-emit 3DSTATE_CLEAR_PARAMS and not get the new fast-clear color.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I forgot to change the assert in the second helper function in a
previous change.
This hit the assert() on a Broadwell platform with 1 slice, 3
subslices but all EUs disabled in subslice 1 & 2.
Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
is_resource_supported returns if the combination of
target/internalformat is supported in at least one operation. Online
compression is only mandatory for glTexImage2D. Some formats doesn't
support online compression, but can be used in any case, with
glCompressed*D methods.
Without this commit, ETC2 internalformats were returning FALSE, even
for the drivers supporting it. So any other query (like
TEXTURE_COMPRESSED) was returning FALSE/NONE instead of the proper
value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Chris recently fixed a bunch of genxml end < start bugs, as well as
booleans that are wider than a bit. These are way too easy to write, so
asserting that the fields are sane is a good plan.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
None of these are actually booleans. Tile Parameter is a tiling mode
enum. Display pipes take plane numbers. Predicate Enable has some
operations (and the default value of 6 was particular bogus).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Python's assert can take both a condition and a string, which will cause
it to print the string if the assertion trips. (You can't use parens as
that creates a tuple.) Doing "condition and string" works in C, but
doesn't have the desired effect in Python.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We used to only initialize BLORP on Gen6+. When we added it on Gen4-5,
we forgot to destroy it unconditionally.
Fixes: 752d7af77a (i965: Add blorp support for gen4-5)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Noticed while reviewing Tim Arceri's NIR inlining series.
Without his series:
instructions in affected programs: 16 -> 14 (-12.50%)
helped: 2
With his series:
instructions in affected programs: 196 -> 174 (-11.22%)
helped: 22
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When compiling with LLVM 6.0 on x86 (32-bit) for Android, the test
fails to detect that -latomic is actually required, as the atomic
call is inlined.
In the code itself (src/util/disk_cache.c), we see this pattern:
p_atomic_add(cache->size, - (uint64_t)size);
where cache->size is an uint64_t *, and results in the following
link time error without -latomic:
src/util/disk_cache.c:628: error: undefined reference to '__atomic_fetch_add_8'
Fix the configure/meson test to replicate this pattern, which then
correctly realizes the need for -latomic.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
This simplifies kflag initialization, by creating a bufmgr-wide setting
for initial kflags, and just applying it whenever we create a new BO.
This also properly allows 48-bit addresses for imported BOs (via prime
or flink), which I had missed in my earlier 48-bit support series.
This will be useful when adding softpin support, as we'll want to add
EXEC_OBJECT_PINNED to initial_kflags as well.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
A couple of typos found by inspecting field.end - field.start, revealed
a few wide integers declared as bool and some that ended before they
started.
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix build error after llvm-7.0.0svn r330669 ("InstCombine: Fix layering
by not including Scalar.h in InstCombine").
CXX rasterizer/jitter/libmesaswr_la-blend_jit.lo
rasterizer/jitter/blend_jit.cpp:816:20: error: use of undeclared identifier 'createInstructionCombiningPass'; did you mean 'createInstructionSimplifierPass'?
passes.add(createInstructionCombiningPass());
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
createInstructionSimplifierPass
Suggested-by: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
It is needed to destroy the v vector in scalar_argument
Fixes memory leaks on parameter set/bind.
v2: Drop redundant sclara_argument destructor
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This rollbacks the revert of this same patch introduced in
commit 7b9c15628a.
And also squahes the following patch to prevent a piglit regression caused
by this change:
intel/compiler: Fix lower_conversions for 8-bit types.
Author: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
For 8-bit types the execution type is word. A byte raw MOV has 16-bit
execution type and 8-bit destination and it shouldn't be considered
a conversion case. So there is no need to change alignment and enter
in lower_conversions for these instructions.
Fixes a regresion in the piglit test "glsl-fs-shader-stencil-export"
that is introduced with this patch from the Vulkan shaderInt16 series:
'i965/compiler: handle conversion to smaller type in the lowering
pass for that'. The problem is caused because there is already a case
in the driver that injects Byte instructions like this:
mov(8) g127<1>UB g2<32,8,4>UB
And the aforementioned pass was not accounting for the special
handling of the execution size of Byte instructions. This patch
fixes this.
v2: (Jason Ekstrand)
- Simplify is_byte_raw_mov, include reference to PRM and not
consider B <-> UB conversions as raw movs.
v3: (Matt Turner)
- Indentation style fixes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are subject to the general restriction that anything that is converted
to 64-bit needs to be aligned to 64-bit. We had this already in place for
32-bit to 64-bit conversions, so this patch generalizes the implementation
to take effect on any conversion to 64-bit from a source smaller than
64-bit.
Fixes assembly validation errors in the following CTS tests in BSW:
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Remove the need of converting values that are documented in
hexadecimal. This patch would allow writing
<field name="3D Command Sub Opcode" ... default="0x1B"/>
instead of
<field name="3D Command Sub Opcode" ... default="27"/>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gallium drivers use _mesa_remove_output_reads() via st_program to lower
output reads away. It seems better to just generate the right thing in
the first place.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make sure that clamping in the pixel transfer operations is enabled/disabled
for packed floating point values just like it is done for single normal and
half precision floating point values.
This fixes a series of CTS tests with virgl that use r11f_g11f_b10f
buffers as target, and where virglrenderer reads these surfaces back
using the format GL_UNSIGNED_INT_10F_11F_11F_REV.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix build error after llvm-7.0svn r325155 ("Pass a reference to a module
to the bitcode writer.").
CXX rasterizer/jitter/libmesaswr_la-JitManager.lo
rasterizer/jitter/JitManager.cpp:548:30: error: reference to type 'const llvm::Module' could not bind to an lvalue of type 'const llvm::Module *'
llvm::WriteBitcodeToFile(M, bitcodeStream);
^
Suggested-by: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Similar to swap_available path send invalidate to the driver because
egl/X11 is not watching for for server's invalidate events. The
dri2_copy_region path is trigerred when server supports DRI2 version
minor 1.
Tested with piglit egl tests for regression.
V2: Move invalidate from dri2_copy_region to swap_buffer common.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
According to EGL 1.4 spec, section 3.5.1 ("Creating On-Screen Rendering
Surfaces"), if config does not support the colorspace or alpha format
attributes specified in attrib_list (as defined for
eglCreateWindowSurface), an EGL_BAD_MATCH error is generated.
This fixes dEQP-EGL.functional.wide_color.*_888_colorspace_srgb (still
not merged,
https://android-review.googlesource.com/c/platform/external/deqp/+/667322),
which is crashing when trying to create a windows surface with RGB888
configuration and sRGB colorspace.
v2: Handle the fix in other backends (Tapani)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
With 16-bit support we can now do 32-bit packing, a follow-up patch will
rename the pass to something more generic.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Noitice that we don't need 'split' versions of the 64-bit to / from
16-bit opcodes which we require during pack lowering to implement these
operations. This is because these operations can be expressed as a
collection of 32-bit from / to 16-bit and 64-bit to / from 32-bit
operations, so we don't need new opcodes specifically for them.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NIR assumes that booleans are always 32-bit, but Intel hardware produces
16-bit booleans for 16-bit comparisons. This means that we need to convert
the 16-bit result to 32-bit.
In the future we want to add an optimization pass to clean this up and
hopefully remove the conversions.
v2 (Jason): use the type of the source for the temporary and use
brw_reg_type_from_bit_size for the conversion to 32-bit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are not supported in hardware for 16-bit integers.
We do the lowering pass after the optimization loop to ensure that we
lower ALU operations injected by algebraic optimizations too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Not all bit-sizes may be supported natively in hardware for all operations.
This pass allows drivers to lower such operations to a bit-size that is
actually supported and then converts the result back to the original
bit-size.
Compiler backends control which operations and wich bit-sizes require
the lowering through a callback function.
v2: generalize this pass and make it available in NIR core (Rob, Jason)
v3: remove some temporaries and reduce nesting in instruction loop using
a continue statement (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
16-bit immediates need to replicate the 16-bit immediate value
in both words of the 32-bit value. This needs to be careful
to avoid sign-extension, which the previous implementation was
not handling properly.
For example, with the previous implementation, storing the value
-3 would generate imm.d = 0xfffffffd due to signed integer sign
extension, which is not correct. Instead, we should cast to
uint16_t, which gives us the correct result: imm.ud = 0xfffdfffd.
We only had a couple of cases hitting this path in the driver
until now, one with value -1, which would work since all bits are
one in this case, and another with value -2 in brw_clip_tri(),
which would hit the aforementioned issue (this case only affects
gen4 although we are not aware of whether this was causing an
actual bug somewhere).
v2: Make explicit uint32_t casting for left shift (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
From Intel Skylake PRM, vol 07, "Immediate" section (page 768):
"For a word, unsigned word, or half-float immediate data,
software must replicate the same 16-bit immediate value to both
the lower word and the high word of the 32-bit immediate field
in a GEN instruction."
This fixes the int16/uint16 negate and abs immediates that weren't
taking into account the replication in lower and upper words.
v2: Integer cases are different to Float cases. (Jason Ekstrand)
Included reference to PRM (Jose Maria Casanova)
v3: Make explicit uint32_t casting for left shift (Jason Ekstrand)
Split half float implementation. (Jason Ekstrand)
Fix brw_abs_immediate (Jose Maria Casanova)
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The lowering pass was specialized to act on 64-bit to 32-bit conversions only,
but the implementation is valid for other cases.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to use 16-bit constants with 16-bit instructions,
otherwise we get the following validation error:
"Destination stride must be equal to the ratio of the sizes of
the execution data type to the destination type"
Because the execution data type is 4B due to the 32-bit integer
constant.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Always enable use of HW logical contexts to preserve GPU state between
batches when the kernel supports such constructs, continuing to enforce
the required support for gen6+.
At runtime, this effectively removes the BRW_NEW_CONTEXT flag (and the
upload of invariant state) from the start of every batch for any kernel
supporting contexts. So long as the older atoms are correctly listening
to the right flag (NEW_CONTEXT rather than NEW_BATCH) this should
eliminate a few redundant state uploads for the older platforms.
No piglits were harmed on ctg and ilk, both with and without logical
contexts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This behaviour was changed in 1e5b09f42f. The commit message
for that says it is just a “tidy up” so my assumption is that the
behaviour change was a mistake. It’s a little hard to decipher looking
at the diff, but the previous code before that patch was:
if (builtin == SpvBuiltInFragCoord || builtin == SpvBuiltInSamplePosition)
nir_var->data.origin_upper_left = b->origin_upper_left;
if (builtin == SpvBuiltInFragCoord)
nir_var->data.pixel_center_integer = b->pixel_center_integer;
After the patch the code was:
case SpvBuiltInSamplePosition:
nir_var->data.origin_upper_left = b->origin_upper_left;
/* fallthrough */
case SpvBuiltInFragCoord:
nir_var->data.pixel_center_integer = b->pixel_center_integer;
break;
Before the patch origin_upper_left affected both builtins and
pixel_center_integer only affected FragCoord. After the patch
origin_upper_left only affects SamplePosition and pixel_center_integer
affects both variables.
This patch tries to restore the previous behaviour by changing the
code to:
case SpvBuiltInFragCoord:
nir_var->data.pixel_center_integer = b->pixel_center_integer;
/* fallthrough */
case SpvBuiltInSamplePosition:
nir_var->data.origin_upper_left = b->origin_upper_left;
break;
This change will be important for ARB_gl_spirv which is meant to
support OriginLowerLeft.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 1e5b09f42f "spirv: Tidy some repeated if checks..."
SPIR-V allows to define the shift, offset and count operands for
shift and bitfield opcodes with a bit-size different than 32 bits,
but in NIR the opcodes have that limitation. As agreed in the
mailing list, this patch adds a conversion to 32 bits to fix this.
For more info, see:
https://lists.freedesktop.org/archives/mesa-dev/2018-April/193026.html
v2:
- src_bit_size will have zero value for variable bit-size operands (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Apparently the somewhere between 1.1.70 and 1.1.73 the loader started
depending on this. The loader then creates a 1.0 instance, which gets
into funny situation because we have a 1.1 device.
No idea how to do line wrapping in Mako though, my random guesses
did not work.
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initially, I didn't understand this feature. Turns out that all it
means is that you can switch multisample rates in the middle of a
zero-attachment subpass. We've been able to do this since forever.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously before fb077b0728, the LOD parameter was being used in place of the
sample index, which would only copy the first sample to all samples in the
destination image. After that multisample image copies wouldn't copy anything
from my observations.
This fixes some copy_and_blit CTS tests.
v3.1: - set lod to 0 for nir_txf_ms (Samuel)
v2: - use GLSL_SAMPLER_DIM_MS instead of 2D (Samuel)
- updated commit description (Samuel)
Fix this properly by copying each sample in a separate radv_CmdDraw and using a
pipeline with the correct rasterizationSamples for the destination image.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
First, this was iterating over the 3DSTATE_CONSTANT_* instruction
but trying to process fields of the 3DSTATE_CONSTANT_BODY substructure.
Secondly, the fields have been called Buffer[0] and Read Length[0],
for a while now, and we were not handling the subscripts correctly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
With the new callback, Jason's newer batch decoder infrastructure
should be able to do just as well as the old open coded INTEL_DEBUG=bat
handling, with much less code. If there are any limitations, we'd like
to improve the common code rather than doing one-off hacks here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Given an arbitrary batch, we don't always know what the size of certain
things are, such as how many entries are in a binding table. But it's
easy for the driver to track that information, so with a simple callback
we can calculate this correctly for INTEL_DEBUG=bat.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Making these part of libintel_common allows us to use them in the DRI
driver. The standalone tool binaries already link against the common
library, too, so it's no harder for them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This unfortunately makes it malloc/realloc on every new batch, rather
than once at startup. But it ensures that the shadow buffer's size will
absolutely match the BO size. Otherwise, as we tune BATCH_SZ/STATE_SZ
or bufmgr cache bucket sizes, we may get a BO size that's rounded up,
and fail to allocate the shadow buffer large enough.
This doesn't fix any bugs today, as BATCH_SZ/STATE_SZ are the size of
a cache bucket, but it's better to be safe than sorry.
Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The gen_field_iterator only iterates the fields of a given gen_group.
If we want to iterate the fields of another gen_group contained as
field, we need to do it manually.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Struct fields might span several dwords, but iter_dword is incremented
up to the last dword of the current field before we print out the
struct's fields. We can't use iter_dword for computing the offset into
the pointer of data to decode.
v2: Fix displayed offset number (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
<register> & <struct> elements always have fixed length. The
get_length() method implies that we're dealing with an instruction in
which the length is encoded into the variable data but the field
iterator uses it without checking what kind of gen_group it is dealing
with.
Let's make get_length() report the correct length regardless of the
gen_group (register, struct or instruction).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>
VE1 is it kept as it was before, VE2 additionally contains the new
system value.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This VS system value contains if the draw command used to start the
rendering was an indexed draw command or a non-indexed one
(~0/0 respectively). Useful to calculate the gl_BaseVertex as:
(SYSTEM_VALUE_IS_INDEXED_DRAW & SYSTEM_VALUE_FIRST_VERTEX).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
As the implementation is conservative, we can now enable it
by default. It can be disabled with RADV_DEBUG=nooutoforder.
Don't expect much more than 1% of improvements, but the gain
seems consistent.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Both the internal documentation and the results of testing this in the
CI suggest that this is unnecessary. Add the fixes tag because this
reduces an internal benchmark's startup time by about 17 seconds
(reported by Eero).
Fixes: 710b1d2e66 "i965/tex_image: Flush certain subnormal ASTC channel values"
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is needed for gfx9 and later for all fmask surface index.
(Mentioned by Marek on irc)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Subpixel precision bias, dilation and the post-snap mode are supported on
GM200 and newer. The pre-snap mode is supported for triangle primitives on
GP100.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Although the specs are written against compatibility GL 4.3 and allows core
profile and GLES2+, it is exposed for GL 1.0+ and GLES1 and GLES2+.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
To silence warnings about unhandled switch values.
Untested otherwise.
v2: move the INT/UINT8 cases after the INT/UINT16 cases, per Eric.
Reviewed-by: Eric Anholt <eric@anholt.net>
With this we should have no passes in src/compiler/nir with any
dependencies on headers from core GL Mesa.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
A bo's ref_count was not being initialized when imported from an fd.
Therefore, we would fail to free the resource during VkFreeMemory().
This patch fixes applications like hifi VR in threaded mode, which
perform frequent imports/releases of IPC shared memory.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Similar to the transformation applied to linear_to_ytiled, also align
each readback from the ytiled source to a cacheline (i.e. transfer a
whole cacheline from the source before moving on to the next column).
This will allow us to utilize movntqda (_mm_stream_si128) in a
subsequent patch to obtain near WB readback performance when accessing
the uncached ytiled memory, an order of magnitude improvement.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When mapping a region of the mipmap_tree, record which complementary
method to use to unmap it afterwards. By doing so we can avoid
duplicating the decision tree used when mapping and thereby eliminate
trivial errors that can be introduced if the two if-chains become out of
sync.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If loading 64-bit vec3 values, a 4 component load would be followed
by a 2 component load and the resulting shuffle would fail as it
requires 2 4 components. This just expands the second results
vector out to 4 components.
This fixes 100 CTS tests:
dEQP-VK.spirv_assembly.type.vec3.*64*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We want to flag EXEC_OBJECT_CAPTURE, but we ought to preserve any
existing kflags. Today, there are none (as the program cache doesn't
support 48-bit addressing), but once we start using softpin, we'll
need to preserve EXEC_OBJECT_PINNED.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were trying to mark batch buffers with EXEC_OBJECT_CAPTURE, and
accidentally stomped EXEC_OBJECT_SUPPORTS_48B_ADDRESS in the process.
There's no reason to restrict batch buffers to the lower 4GB.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The previous logic of the supports_48b_addresses wasn't actually
checking if i915.ko was running with full_48bit_ppgtt. The ENOENT
it was checking for was actually coming from the invalid context
id provided in the test execbuffer. There is no path in the
kernel driver where the presence of
EXEC_OBJECT_SUPPORTS_48B_ADDRESS leads to an error.
Instead, check the default context's GTT_SIZE param for a value
greater than 4 GiB
v2 (Ken): Fix in i965 as well.
v3 Check GTT_SIZE instead of HAS_ALIASING_PPGTT (Chris Wilson)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The blit here involves scaling since it's copying from I8 format to R8G8 format.
Half of source will be filtered out with PIPE_TEX_FILTER_NEAREST instruction, it
looks that GPU always uses the second half as source. Currently we use "1" as
the start point of x for R, then causing 1 source pixel of U component shift to
right. So "-1" should be the start point for U component.
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The compiler queue was limited to 3 threads, so shader-db running
on a 16-thread CPU would have a bottleneck on the 3-thread queue.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It will contain more variables.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reduce swizzle constraints to the ALPHA_IS_ON_MSB constraint and the clear
value of 1.
This significantly changes the DCC fast clear code, and fixes fast clear
for RGB formats without alpha.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: require the previous level to be clearable for determining whether
the last unaligned level is clearable
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Use specific avx512 selection mechanism based on avx512er bit instead of
getHostCPUName(). LLVM 6.0.0 has a bug that reports wrong string for KNL
(fixed in 6.0.1).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Sign-extend integer types to 32bit when specifying "%d" and add new %u
which zero-extends to 32bit. Improves printing of sub 32bit integer types
(i1 specifically).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
However only if the file exists in DEBUG_OUTPUT_DIR. The expectation is
that AR rasterizerLauncher will start placing it there when launching
a workload (which is in a subsequent checkin)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
- 64-bit cvt-to-float needs to be explicitly handled
- gathers need the right parameter types to work with doubles
Fixes draw-vertices piglit tests
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Previously there was a special target that blocked for the generation of
anv_entrypoints.h, with meson 0.44 we don't need this, we can use a new
language feature instead. The problem is that previously that blocking
target would hide a race condition for the generation of another header,
anv_extensions.h. Now the build sometimes fails when anv_extensions.h is
not generated in time.
v2: - clarify the race condition in the commit message (Emil)
CC: Mark Janes <mark.a.janes@intel.com>
Fixes: 92550d9b16
("meson: remove workaround for custom target creating .h and .c files")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch enables the Vulkan driver on Ice Lake h/w
with added warning about preliminary support.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros. We need to refactor this to shader record setup at compile time,
anyway.
Fixes ext_framebuffer_multisample-interpolation * centroid-*
This should fix help with intermittent GPU hangs in tests switching
formats while rendering small frames. Unfortunately, it didn't help with
the tests I'm having troubles with.
The next shader gets to start writing the register file during these
slots, so make sure we don't stomp over them.
The only case of hitting this that I could imagine would be dead writes.
GLES's GL_EXT_texture_type_2_10_10_10_REV allows uploading this type to an
unsized internalformat, and it should be non-color-renderable.
fbobject.c's implementation of the check for color-renderable is checks
that the texture has a 2101010 mesa format, so make sure that we have
chosen a 2101010 format so that check can do what it meant to.
Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgb on vc5.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With this patch, _ElementSize is initialized along with the rest
of the vertex array attributes in new_draw_rasterpos_stage().
This fixes a crash in st_pipe_vertex_format() when running
topogun-1.06-orc-84k-resize trace file with VMware svga driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
VASurfaceAttribExternalBuffers.pitches is indexed by
plane. Current implementation only supports single plane layout.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Previous bit-fields assignments are incorrect and will result certain mpeg4
decode failed due to wrong flag values. This patch fixes these assignments.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
num_channels has been introduced since "ac/surface: don't set
the display flag for obviously unsupported cases".
Based on RadeonSI.
Fixes: e29facff31 ("ac/surface: don't set the display flag for obviously unsupported cases (v2)")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
dcc_msaa_allowed is always false on GFX9+ and only true on VI
if RADV_PERFTEST=dccmsaa is set. This means DCC was disabled
in some situations where it should not.
This is likely going to fix a performance regression.
Fixes: 2f63b3dd09 ("radv: enable DCC for MSAA 2x textures on VI under an option")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For all of the OpFOrd* comparisons except OpFOrdNotEqual the hardware
should probably already return false if one of the operands is NaN so
we don’t need to have an explicit check for it. This seems to at least
work on Intel hardware. This should reduce the number of instructions
generated for the most common comparisons.
For what it’s worth, the original code to handle this was added in
e062eb6415. The commit message for that says that it was to fix
some CTS tests for OpFUnord* opcodes. Even if the hardware doesn’t
handle NaNs this patch shouldn’t affect those tests. At any rate they
have since been moved out of the mustpass list. Incidentally those
tests fail on the nvidia proprietary driver so it doesn’t seem like
handling NaNs correctly is a priority.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The driver may have a reference on the separate stencil buffer for some
reason (like an unflushed job using it), so we can't directly free the
resource and should instead just decrement the refcount that we own.
Fixes double-free in KHR-GLES3.packed_depth_stencil.blit.depth32f_stencil8
on vc5.
Fixes: e94eb5e600 ("gallium/util: add u_transfer_helper")
Reviewed-by: Rob Clark <robdclark@gmail.com>
The internal-type-bpp path is for surfaces that get stored in the raw TLB
format. For 4.x, we're storing MSAA as just 2x width/height at the
original format.
Patch enables use of short and unsigned short data for texture uploads,
rendering and reading of framebuffers within the restrictions specified
in GL_EXT_texture_norm16 spec.
Patch also enables those 16bit format layout qualifiers listed in
GL_NV_image_formats that depend on EXT_texture_norm16.
v2: expose extension with dummy_true
fix layout qualifier map changes (Ilia Mirkin)
v3: use _mesa_has_EXT_texture_norm16, other fixes
and cleanup (Ilia Mirkin)
v4: fix rest of the issues found
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The logic was flawed, since mul(x,y) will be <= 0 (exactly 0) when
the sign is the same but both numbers are sufficiently small
(if the product is smaller than 2^-128).
This could apparently lead to emitting a sufficient amount of
additional bogus vertices to overflow the allocated array for them,
hitting an assertion (still safe with release builds since we just
aborted clipping after the assertion in this case - I'm however unsure
if this is now really no longer possible, so that code stays).
Not sure if the additional vertices could cause other grief, I didn't
see anything wrong even when hitting the assertion.
Essentially, both +-0 are treated as positive (the vertex is considered
to be inside the clip volume for this plane), so integrate the logic
determining different sign into the branch there.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Simplifies the logic when to emit null tris (albeit the reasons why we
have to do this remain unclear).
This is strictly just logic simplification, the behavior doesn't change
at all.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some analysis suggests that all short immediates are sign-extended. The
insnCanLoad logic already accounted for this, but we could still pick
the wrong form when emitting actual instructions that support both short
and long immediates (with the long form usually having additional
restrictions that insnCanLoad should be aware of).
This also reverses a bunch of commits that had previously "worked
around" this issue in various emitters:
9c63224540: gm107/ir: make use of ADD32I for all immediates
83a4f28dc2: gm107/ir: make use of LOP32I for all immediates
b84c97587b: gm107/ir: make use of IMUL32I for all immediates
d30768025a: gk110/ir: make use of IMUL32I for all immediates
as well as the original import for UMUL in the nvc0 emitter.
Reported-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
When draw buffers are changed on a bound framebuffer, DrawBufferAllocate()
hook should be called. However, it is missing in update_framebuffer with
window-system framebuffer, in which FB's draw buffer state should match
context state, potentially resulting in a change.
Note: This is needed because gallium delays creating the front buffer,
i965 works fine without this change.
V2 (Timothy Arceri):
- Rebased on merged/simplified DrawBuffer driver function
- Move DrawBuffer call outside fb->ColorDrawBuffer[0] !=
ctx->Color.DrawBuffer[0] check to make piglit pass.
v3 (Timothy Arceri):
- Call new DrawBuffaerAllocate() driver function.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v2)
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99116
Unlike some of the classic drivers the st was only using DrawBuffer()
to allocated some buffers on-demand. Creating a separate function
will allow us to call it from update_framebuffer() in the following
patch without regressing some of the older classic drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because I clearly wasn't thinking and clearly didn't do a good job
testing. Sigh
Fixes: c5a97d658e
("meson: fix builds against LLVM built without rtti")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This option type is nice since it involves less converting strings into
lists, and because it validates the values that are provided.
v2: - Set with_any_vk to true if any vulkan driver is built (Eric)
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Building without rtti is a frought with peril, but it's something that
autotools supports so we need to support it too.
Since we've moved to version 0.44 as a whole we can use the meson
functionality for accessing random llvm-config options we can check for
rtti and add -fno-rtti to all C++ code accordingly.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
meson has gotten pretty smart about tracking C and C++ dependencies
(internal and external), and using the right linker. This wasn't always
the case and we created empty c++ files to force the use of the c++
linker. We don't need that any more.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
meson used to get grumpy if the sources list was empty, even when using
--whole-archive (link_whole). In more recent versions that's not true,
so remove the workaround.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In more modern versions of meson a custom_target returns an index-able
object. This allows us to create accurate dependency models for targets
that rely only on the header and not on the code from anv_entrypoints.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We have already required 0.44 for building clover and swr, so it was
already partially required. This just makes it required across the board
instead of just for clover and swr.
There is a bug in 0.44 which makes it impossible to build mesa in some
configurations, so require 0.44.1 which fixes this.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This one's completely my fault, I didn't do good enough testing after
rebasing and this got missed.
Fixes: d28c246501
("meson: build graw tests")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we have an option to turn test building on and off, we should
honor that.
Fixes: 34cb4d0ebc
("meson: build tests for gallium mesa state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since mesa_classic is build-on-demand the tests will create a demand and
add a bunch of extra compilation.
Fixes: 43a6e84927
("meson: build mesa test.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The paths which sample with the clear color are now using a getter which
performs the sRGB decode needed to enable this fast clear.
This path can be exercised by fast-clearing a texture, then performing
an operation which requires sRGB decoding. Test coverage for this
feature is provided with the following tests:
* Shader texture calls:
- spec@ext_texture_srgb@tex-srgb
* Shader texelfetch calls:
- spec@arb_framebuffer_srgb@fbo-fast-clear
- spec@arb_framebuffer_srgb@msaa-fast-clear
* Blending:
- spec@arb_framebuffer_srgb@arb_framebuffer_srgb-fast-clear-blend
* Blitting:
- spec@arb_framebuffer_srgb@blit texture srgb msaa enabled clear
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It returns both the inline clear color and a clear address which points
to the indirect clear color buffer (or NULL if unused/non-existent).
This getter allows CNL to sample from fast-cleared sRGB textures
correctly by doing the needed sRGB-decode on the clear color (inline)
and making the indirect clear color buffer unused.
v2 (Rafael):
* Have a more detailed commit message.
* Add a comment on the sRGB conversion process.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want to add and use a getter that turns off the indirect path by
returning zero for the clear color bo and offset.
v2: Fix usage of "clear address" in commit message (Jason).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want to add and use a function that accesses the auxiliary buffer's
clear_color_bo and doesn't care if it has an MCS or HiZ buffer
specifically.
v2 (Jason Ekstrand):
* Drop intel_miptree_get_aux_buffer().
* Mention CCS in the aux_buf field.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Make the next patch easier to read by eliminating most of the would-be
duplicate field accesses now.
v2: Update the HiZ comment instead of deleting it (Rafael).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Imad needs to set a read barrier.
With significant big work groups I was getting wrong results for div u32. Turns
out the issue was with the sched opcodes.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::count_reads_remaining(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:764:72: warning: unused parameter ‘be’ [-Wunused-parameter]
vec4_instruction_scheduler::count_reads_remaining(backend_instruction *be)
^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::setup_liveness(cfg_t*)’:
src/intel/compiler/brw_schedule_instructions.cpp:769:51: warning: unused parameter ‘cfg’ [-Wunused-parameter]
vec4_instruction_scheduler::setup_liveness(cfg_t *cfg)
^~~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::update_register_pressure(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:774:75: warning: unused parameter ‘be’ [-Wunused-parameter]
vec4_instruction_scheduler::update_register_pressure(backend_instruction *be)
^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:779:80: warning: unused parameter ‘be’ [-Wunused-parameter]
vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be)
^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::issue_time(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:1550:61: warning: unused parameter ‘inst’ [-Wunused-parameter]
vec4_instruction_scheduler::issue_time(backend_instruction *inst)
^~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since all of the fs_generator::generate_foo methods take a fs_inst * as
the first parameter, just remove the name to quiet the compiler.
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_barrier(fs_inst*, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:743:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src)
^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_discard_jump(fs_inst*)’:
src/intel/compiler/brw_fs_generator.cpp:1326:46: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_discard_jump(fs_inst *inst)
^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_pack_half_2x16_split(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1675:54: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_pack_half_2x16_split(fs_inst *inst,
^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_shader_time_add(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1743:49: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_shader_time_add(fs_inst *inst,
^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_set_simd4x2_header_gen9(brw_codegen*, brw::vec4_instruction*, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1412:52: warning: unused parameter ‘inst’ [-Wunused-parameter]
vec4_instruction *inst,
^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_mov_indirect(brw_codegen*, brw::vec4_instruction*, brw_reg, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1430:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
vec4_instruction *inst,
^~~~
src/intel/compiler/brw_vec4_generator.cpp:1432:63: warning: unused parameter ‘length’ [-Wunused-parameter]
struct brw_reg indirect, struct brw_reg length)
^~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Exposing the visual makes following dEQP tests pass on Android:
dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb
Visual is exposed only when DRI_LOADER_CAP_RGBA_ORDERING is set.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add format definition and required plumbing to create images.
Note that there is no match to drm_fourcc definition, just like
with existing _DRI_IMAGE_FOURCC_SARGB8888.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that hour...).
Also, print out the function passes for opt which correspond to what
was used for jit compilation (and also the opt level for codegen).
Using opt/llc this way should then pretty much mimic what was done
for jit. (When specifying something like -time-passes
-debug-pass=[Structure|Arguments] (for either opt or llc) that also
gives very useful information in which passes all the time was spent,
and which passes are really run along with the order - llvm will add
passes due to dependencies on its own, and of course -O2 for llc
comes with a ~100 pass list.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also actually
seems to cause licm to be able to move things when it previously couldn't,
which causes noticeable compile time increases.
There's more loop passes in llvm, but I'm not sure which ones are helpful,
and I couldn't find anything which would roughly do what the old licm in
llvm 3.3 did, so ditch it.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is actually better.
The only downside I've found is this enables the LICM pass to move some
things out of the main shader loop (in the case I've seen, instanced
vertex fetch (which is constant within the jit shader) plus the derived
instructions in the shader) which it couldn't do before for some reason.
This would actually be desirable but can increase compile time
considerably (licm seems to have considerable cost when it actually can
move things out of loops, due to alias analysis). But blaming early cse
for this seems inappropriate. (Note that the first two sroa / earlycse
passes are similar to what a standard llvm opt -O1/-O2 pipeline would
do, albeit this has some more passes even before but I don't think
they'd do much for us.)
It also in particular helps some crazy shader used for driver
verification (don't ask...) a lot (about factor of 6 faster in compile
time) (due to simplfiying the ir before LICM is run).
While here, also move licm behind simplifycfg. For some shaders there
seems to be very significant compile time gains (we've seen a factor
of 10000 albeit that was a really crazy shader you'd certainly never
see in a real app), beause LICM is quite expensive and there's cases
where running simplifycfg (along with sroa and early-cse) before licm
reduces IR complexity significantly. (I'm not entirely sure if it would
make sense to also run it afterwards.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GLSL 4.6 spec describes hex constant as:
hexadecimal-constant:
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-constant hexadecimal-digit
Right now if you have a shader with the following structure:
#if 0X1 // or any hex number with the 0X prefix
// some code
#endif
the code between #if and #endif gets removed because the checking is performed
only for "0x" prefix which results in strtoll being called with the base 8 and
after encountering the 'X' char the strtoll returns 0. Letting strtoll detect
the base makes this limitation go away and also makes code easier to read.
From the strtoll Linux man page:
"If base is zero or 16, the string may then include a "0x" prefix, and the
number will be read in base 16; otherwise, a zero base is taken as 10 (decimal)
unless the next character is '0', in which case it is taken as 8 (octal)."
This matches the behaviour in the GLSL spec.
This patch also adds a test for uppercase hex prefix.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This follows what radeonsi does.
Ported from radeonsi:
radeonsi: emit PA_SC_RASTER_CONFIG_1 only once
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We can just unreachable here, this aligns with radv code, makes
it easier to move to common code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Missed this on initial radeonsi port, we shouldn't use this value
on gfx9, but also in gfx8 only for when we have a geom shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
They are send messages and this makes size_read() and mlen agree. For
both of these opcodes, the payload is just a dummy so mlen == 1 and this
should decrease register pressure a bit.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable@lists.freedesktop.org
This fixes crashes for the following CTS:
dEQP-VK.glsl.texture_functions.query.texturequerylod.*
Cubemaps are the same as 2D arrays.
Fixes: 625dcbbc45 ("amd/common: pass address components individually to
ac_build_image_intrinsic")
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The combinaison of GPA/MDAPI components expects a particular name &
layout for their pipeline statistics query.
v2: Limit the query GPA/MDAPI statistics to gen7->9 (Lionel)
v3: Add curly braces (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The INTEL_performance_query extension provides a list of queries that
a user can select to monitor a particular workload. Each query reports
different sets of counters (roughly looking at different parts of the
hardware, i.e. caches/fixed functions/etc...).
Each query has an associated configuration that we need to program
into the hardware before using the query. Up to now, we provided
predefined queries. This change allows the user to build its own query
(and associated configuration) externally, and have the i965 driver
use that configuration through a new query named :
Intel_Raw_Hardware_Counters_Set_0_Query
When this query is selected, the i965 driver will report raw counters
deltas (meaning their values need to be interpreted by the user, as
opposed to existing queries that provide human readable values).
This change is also useful for debug purposes for building new
pre-defined queries and verifying the underlying numbers make sense
before writing equations for user readable output.
This change's purpose is also to enable GPA. GPA uses a library called
MDAPI that processes raw counter data. MDAPI expects raw data to have
a certain layout (per generation which is a bit unfortunate...). This
change also embeds the expected data layouts.
v2: Enable raw queries on gen 7->11, v1 had 7->9 (Lionel)
v3: Don't assert on cherryview for gen7... (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add comment breaking down where the frequency values come from (Ken)
v3: More documentation (Ken/Lionel)
Adjust clock ratio multiplier to reflect the divider's behavior (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This register contains the current/previous frequency of the GT, it's
one of the value GPA would like to have as part of their queries.
v2: Don't use this register on baytrail/cherryview (Ken)
Use GET_FIELD() macro (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We would like to reuse a number of the functions and structures in
another file in a future commit.
We also move the previous content of brw_performance_query.h into
brw_performance_query_metrics.h to be included by generated metrics
files.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Originally the "each" variable was just a part of the "drivers"
variable. It's not anymore so it's a bit ambiguous.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
This fixes -Ddri-drivers-path, -Dvdpau-libs-path, etc. with DESTDIR when
those paths are absolute. Currently due to the way python's os.path.join
handles absolute paths these will ignore DESTDIR, which is bad. This
fixes them to be relative to DESTDIR if that is set.
Fixes: 3218056e0e
("meson: Build i965 and dri stack")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
We seem to use progress for two cases:
1) When we lowered some returns.
2) When we remove unreachable code.
If just case 2 happens we assert as state->return_flag has not
been allocated yet, but we are still trying to do insert all
predicates based on it.
This splits the concerns. We only use progress internally for case 1
and then keep track of 2 in a separate variable to indicate progress
in the return value of the pass.
This is slightly better than transforming the assert into
if (!state->return_flag) return, as the solution in this patch avoids
inserting predicates even if some other part of the might need them.
Fixes: 6e22ad6edc "nir: return early when lowering a return at the end of a function"
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106174
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If an EGLSurface is created, made current and destroyed, and then a second
EGLSurface is created. Then the second malloc in driCreateNewDrawable may
return the same pointer address the first surface's drawable had.
Consequently, when dri_make_current later tries to determine if it should
update the texture_stamp it compares the surface's drawable pointer against
the drawable in the last call to dri_make_current and assumes it's the same
surface (which it isn't).
When texture_stamp is left unset, then dri_st_framebuffer_validate thinks
it has already called update_drawable_info for that drawable, leaving it
unvalidated and this is when bad things starts to happen. In my case it
manifested itself by the width and height of the surface being unset.
This is fixed this by setting the pointer to NULL before freeing the
surface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126
Signed-off-by: Johan Klokkhammer Helsing <johan.helsing@qt.io>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
For nv50 we coalesce the srcs and defs into a single node. As such, we
can end up with impossible constraints if the source is referenced
after the tex operation (which, due to the coalescing of values, will
have overwritten it).
This logic already exists for inserting moves for MERGE/UNION sources.
It's the exact same idea here, so leverage that code, which also
includes a few optimizations around not extending live ranges
unnecessarily.
Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If users are running mesa under old version of qemu or have turned off
GL at runtime, virtio gpu driver actually doesn't work. Adds a detection
here so mesa can fall back to software rendering.
v2:
- move detection from loader to virgl (Ilia, Emil)
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The copr repo from che was using LTO and he reported radv broke
recently with it. When testing with lto builds here I noticed
that we weren't seeing any instance extensions reported.
It appears LTO was treating the const without extern as an empty
struct, this is possibly a gcc bug, but we can work around it
just by marking these with extern.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These were getting mapped off into outer space, which would cause nv50
and nvc0 to clip the primitives (as depth_clip was enabled).
These drivers are configured to clip everything outside the [0, 1]
range, even though the hardware supports other view settings.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This helps with the PostRALoadPropagation pass moving long immediates into
FMA/MAD instructions.
changes in shader-db:
total instructions in shared programs : 5894114 -> 5886074 (-0.14%)
total gprs used in shared programs : 666558 -> 666563 (0.00%)
total shared used in shared programs : 520416 -> 520416 (0.00%)
total local used in shared programs : 53524 -> 53524 (0.00%)
total bytes used in shared programs : 54006744 -> 53932472 (-0.14%)
local shared gpr inst bytes
helped 0 0 2 4192 4192
hurt 0 0 7 9 9
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: minor edits to separate nv50 and nvc0+ cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This was done a while ago but never marked on features.txt. Note that
this is only supported on GM200+.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2018-04-21 10:02:55 -04:00
1467 changed files with 95040 additions and 56435 deletions
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105775">Bug 105775</a> - SI reaches the maximum IB size in dwords and fail to submit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105994">Bug 105994</a> - surface state leak when creating and destroying image views with aspectMask depth and stencil</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106074">Bug 106074</a> - radv: si_scissor_from_viewport returns incorrect result when using half-pixel viewport offset</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106126">Bug 106126</a> - eglMakeCurrent does not always ensure dri_drawable->update_drawable_info has been called for a new EGLSurface if another has been created and destroyed first</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>ac/nir: Make the GFX9 buffer size fix apply to image loads/atomics too.</li>
<li>radv: Mark GTT memory as device local for APUs.</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>bin/install_megadrivers: fix DESTDIR and -D*-path</li>
<li>meson: don't build classic mesa tests without dri_drivers</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>intel/compiler: Add scheduler deps for instructions that implicitly read g0</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/fs: Return mlen * 8 for size_read() for INTERPOLATE_AT_*</li>
</ul>
<p>Johan Klokkhammer Helsing (1):</p>
<ul>
<li>st/dri: Fix dangling pointer to a destroyed dri_drawable</li>
</ul>
<p>Juan A. Suarez Romero (4):</p>
<ul>
<li>docs: add sha256 checksums for 18.0.1</li>
<li>travis: radv needs LLVM 4.0</li>
<li>cherry-ignore: add explicit 18.1 only nominations</li>
<li>Update version to 18.0.2</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fix shadow batches to be the same size as the real BO.</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: fix number of planes for depth & stencil</li>
</ul>
<p>Lucas Stach (1):</p>
<ul>
<li>etnaviv: fix texture_format_needs_swiz</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>radeonsi/gfx9: fix a hang with an empty first IB</li>
<li>glsl_to_tgsi: try harder to lower unsupported ir_binop_vector_extract</li>
<li>Revert "st/dri: Fix dangling pointer to a destroyed dri_drawable"</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>radv: fix scissor computation when using half-pixel viewport offset</li>
<li>radv/winsys: allow to submit up to 4 IBs for chips without chaining</li>
</ul>
<p>Thomas Hellstrom (1):</p>
<ul>
<li>svga: Fix incorrect advertizing of EGL_KHR_gl_colorspace</li>
</ul>
<p>Timothy Arceri (1):</p>
<ul>
<li>mesa: free debug messages when destroying the debug state</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106315">Bug 106315</a> - The witness + dxvk suffers flickering garbage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106465">Bug 106465</a> - No test for Image Load/Store on format-incompatible texture buffer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106479">Bug 106479</a> - NDEBUG not defined for libamdgpu_addrlib</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106481">Bug 106481</a> - No test for Image Load/Store on texture buffer sized greater than MAX_TEXTURE_BUFFER_SIZE_ARB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106504">Bug 106504</a> - vulkan SPIR-V parsing failed at ../src/compiler/spirv/vtn_cfg.c:381</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106587">Bug 106587</a> - Dota2 is very dark when using vulkan render on a Intel << AMD prime setup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98281">Bug 98281</a> - 'message's in ctx->Debug.LogMessages[] seem to leak.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99549">Bug 99549</a> - pp: Failed to translate a shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100259">Bug 100259</a> - [EGL] [GBM] undefined reference to `gbm_bo_create_with_modifiers'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101408">Bug 101408</a> - [Gen8+] Xonotic fails to render one of the weapons</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101442">Bug 101442</a> - Piglit shaders@ssa@fs-if-def-else-break fails with sb but passes with R600_DEBUG=nosb</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102542">Bug 102542</a> - mesa-17.2.0/src/gallium/state_trackers/nine/nine_ff.c:1938: bad assignment ?</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102905">Bug 102905</a> - [R600] Miscompilation of TGSI to VLIW causes artifacts in Gallium Nine with Crysis2 bump mapping</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104741">Bug 104741</a> - Graphic corruption for Android apps Telegram and KineMaster</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104762">Bug 104762</a> - Various segfaults/problems in qt/plasma</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104777">Bug 104777</a> - Attaching multiple shader objects for the same stage to a GLSL program triggers a linker error</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104794">Bug 104794</a> - piglit.spec.arb_internalformat_query2.samples and num_sample_counts pname checks</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104803">Bug 104803</a> - SIGSEGV in state_tracker/st_glsl_to_tgsi_temprename.cpp</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104863">Bug 104863</a> - 186 assertions in piglit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104884">Bug 104884</a> - memory leak with intel i965 mesa when running android container in Ubuntu</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104905">Bug 104905</a> - SpvOpFOrdEqual doesn't return correct results for NaNs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104908">Bug 104908</a> - Texture Compression Hint not converted to enum16</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104915">Bug 104915</a> - Indexed SHADING_LANGUAGE_VERSION query not supported</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105013">Bug 105013</a> - [regression] GLX+VA-API+clutter-gst video playback is corrupt with Mesa 17.3 (but is fine with 17.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105026">Bug 105026</a> - glxgears asserts with pp_jimenezmlaa=1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105029">Bug 105029</a> - simdlib_512_avx512.inl:371:57: error: could not convert ‘_mm512_mask_blend_epi32((__mmask16)(ImmT), a, b)’ from ‘__m512i’ {aka ‘__vector(8) long long int’} to ‘SIMDImpl::SIMD512Impl::Float’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105065">Bug 105065</a> - Qt Programs occasionally fail to render with new Mesa (glGetProgramBinary)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105161">Bug 105161</a> - KHR_blend_equation_advanced doesn't work in GLSL 1.10-1.40 shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105183">Bug 105183</a> - Weird assertion in NIR linker</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105211">Bug 105211</a> - build failure after zwp_dmabuf commit if wayland-protocols is not installed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105238">Bug 105238</a> - ast.h:648:16: error: union member 'i' has a non-trivial constructor</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105255">Bug 105255</a> - Waiting for fences without waitAll is not implemented</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105262">Bug 105262</a> - [R600] [BISECTED] ttf fonts are invisible in many programs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105292">Bug 105292</a> - vkGetQueryPoolResults returns incorrect query status for large query buffers (bisected)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105317">Bug 105317</a> - The GPU Vega 56 was hang while try to pass #GraphicsFuzz shader15 test</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105497">Bug 105497</a> - shader-db crashes on 72 core system after ast_type_qualifier bitset change</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105717">Bug 105717</a> - [bisected] Mesa build tests fails: BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105737">Bug 105737</a> - st_tests_common.cpp:140:42: error: no matching function for call to 'tgsi_get_opcode_info'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105738">Bug 105738</a> - commit f7ffa504a065dc2631fd38cc5fe885b277f4e7e7 causes artifacting in radv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105740">Bug 105740</a> - glsl_types.cpp(524): error: a dynamically-initialized local static variable is not allowed inside of a statement expression</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105775">Bug 105775</a> - SI reaches the maximum IB size in dwords and fail to submit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105807">Bug 105807</a> - [Regression, bisected]: 3D Rendering not working correctly in Warhammer 40k: Dawn of War II</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105817">Bug 105817</a> - scons build broken by glSpecializeShaderARB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105820">Bug 105820</a> - [m32] piglit regressions relinking program without shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105942">Bug 105942</a> - Graphical artefacts after update to mesa 18.0.0-2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105952">Bug 105952</a> - radv causes GPU hang on SI</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105960">Bug 105960</a> - [bisected] meson build test fails with: undefined reference to `etna_pm_create_query'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105994">Bug 105994</a> - surface state leak when creating and destroying image views with aspectMask depth and stencil</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106074">Bug 106074</a> - radv: si_scissor_from_viewport returns incorrect result when using half-pixel viewport offset</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106126">Bug 106126</a> - eglMakeCurrent does not always ensure dri_drawable->update_drawable_info has been called for a new EGLSurface if another has been created and destroyed first</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106133">Bug 106133</a> - make check "OSError: [Errno 24] Too many open files"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106147">Bug 106147</a> - SIGBUS in write_reloc() when Sacha Willems' "texture3d" Vulkan demo starts</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106174">Bug 106174</a> - vulkan dota2 broken (segfaulting), found bug commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106180">Bug 106180</a> - [bisected] radv vulkan smoke test black screen (Add support for DRI3 v1.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106243">Bug 106243</a> - [kbl] GPU HANG: 9:0:0x85dffffb, in Cinnamon</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105396">Bug 105396</a> - tc compatible htile sets depth of htiles of discarded fragments to 1.0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105399">Bug 105399</a> - [snb] GPU hang: after geometry shader emits no geometry, the program hangs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106756">Bug 106756</a> - Wine 3.9 crashes with DXVK on Just Cause 3 and Quantum Break on VEGA but works ON POLARIS</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106774">Bug 106774</a> - GLSL IR copy propagates loads of SSBOs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106903">Bug 106903</a> - radv: Fragment shader output goes to wrong attachments when render targets are sparse</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106907">Bug 106907</a> - Correct Transform Feedback Varyings information is expected after using ProgramBinary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106906">Bug 106906</a> - Failed to recongnize keyword “sampler2DRect” and "sampler2DRectShadow"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106928">Bug 106928</a> - When starting a match Rocket League crashes on "Go"</li>
description:'List of dri drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'dri-drivers-path',
@@ -51,9 +55,14 @@ option(
)
option(
'gallium-drivers',
type:'string',
value:'auto',
description:'comma separated list of gallium drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
description:'List of gallium drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'gallium-extra-hud',
@@ -141,9 +150,10 @@ option(
)
option(
'vulkan-drivers',
type:'string',
value:'auto',
description:'comma separated list of vulkan drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
type:'array',
value:['auto'],
choices:['','auto','amd','intel'],
description:'List of vulkan drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'shader-cache',
@@ -214,6 +224,12 @@ option(
value:true,
description:'Build assembly code if possible'
)
option(
'glx-read-only-text',
type:'boolean',
value:false,
description:'Disable writable .text section on x86 (decreases performance)'
)
option(
'llvm',
type:'combo',
@@ -248,12 +264,6 @@ option(
value:false,
description:'Build unit tests. Currently this will build *all* unit tests, which may build more than expected.'
)
option(
'texture-float',
type:'boolean',
value:false,
description:'Enable floating point textures and renderbuffers. This option may be patent encumbered, please read docs/patents.txt and consult with your lawyer before turning this on.'
% if e.name == 'vkCreateInstance' or e.name == 'vkEnumerateInstanceExtensionProperties' or e.name == 'vkEnumerateInstanceLayerProperties':
% if e.name == 'vkCreateInstance' or e.name == 'vkEnumerateInstanceExtensionProperties' or e.name == 'vkEnumerateInstanceLayerProperties' or e.name == 'vkEnumerateInstanceVersion':
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.