X/GLX can't handle them. This removes almost 500 GLX visuals that were
incorrectly exposed.
This is a less invasive version of Marek's .getCapability series.
Note: the patch is not applicable for master, but only for the 17.2
branch.
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f33d8af7aa "st/dri: add 32-bit RGBX/RGBA formats"
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported by valgrind at:
glsl_to_tgsi_visitor::visit(ir_expression*) (st_glsl_to_tgsi.cpp:1560)
When compiling the Deus Ex shaders.
Fixes: 28a5e7104 ("st/glsl_to_tgsi: handle precise modifier")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 06237fc9e1)
GLES/gl.h has historically provided some typedefs that are not
used in the API itself. Restore these typedefs that were lost to
avoid breaking applications.
These seem to be the only typedefs removed in the update.
Fixes: 7fd0817 "Update Khronos-supplied headers"
[Eric: added a big warning to revert this patch when pulling the updated header]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
(cherry picked from commit 3db05ed1d1)
This reverts commit 3008161d28,
which caused a regression for VMWare.
The initial code had some recursion in it, that I removed by accident
trying to add back the recursion broke lots of things, take the high
road and revert for now.
Fixes: 3008161d (st_glsl_to_tgsi: rewrite rename registers to use array fully.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit b8bea9a050)
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*
for a2r10g10b10 formats as destination on SI/CIK hardware.
This adds support to the meta program for emitting 10-bit
outputs, and adds 10-bit support to the fragment shader key.
It also only does the int8/10 on SI/CIK.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit df61a05019)
In some APU situations the reported visible size can be larger than
VRAM size. This properly clamps the value.
Surprisingly both CTS and spec seem to allow a heap type with size 0,
so this seemed like the easiest option to me.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 4ae84efbc5 "radv: Use enum for memory heaps."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
(cherry picked from commit 8229706ad8)
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.
Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 271fa3a684)
The cacheline alignment restriction is on the base address; the pitch
can be anything.
Fixes assertion failures when using primus (say, on glxgears, which
creates a 300x300 linear BGRX surface with a pitch of 1200):
intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 595a47b829)
This implements a wait for glXWaitGL, glXCopySubBuffer, dri flush_front and
creation of fake front until all pending SwapBuffers have been committed to
hardware. Among other things this fixes piglit glx-copy-sub-buffers on dri3.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 185ef06fd2)
The issue here is that the immediate is treated as a 64-bit value,
and fetching it does not work reliably with swizzles that are different
from xy and zw.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit da83687c4b)
This looks like it's supported since llvm 3.9 at least,
so switch over radeonsi and radv to using it, -pro also
uses this. We can now drop creating lds for these operations
as the ds_swizzle operation doesn't actually write to lds at all.
Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit cb6f16dce9)
[Emil Velikov: resolve trivial conflicts]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Conflicts:
src/amd/common/ac_nir_to_llvm.c
This makes it match radeonsi. The LLVM backend itself will emit the
correct instruction, but LLVM might do incorrect optimizations since it
thinks the output is undefined when the input is 0, even though it's not
supposed to be. We really need a new intrinsic, or for the backend to
become smarter and recognize this pattern.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
(cherry picked from commit 6d731c5651)
Otherwise, code generation fails. This has become necessary since some
shaders are wrapped in control flow.
Fixes: 081ac6e5c6 ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 2879a602dd)
The slower convert-and-copy process performs a bad conversion
because it converts the value to signed 64-bit integer, but
bindless uniform handles are considered unsigned 64-bit.
This fixes "Check glUniform*() with mixed texture units/handles"
from arb_bindless_texture-uniform piglit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b38c9c57f2)
Since array splitting for AoA is disabled, we have to retrieve
the type of the first non-array type when an array of images is
declared inside a structure. Otherwise, it will hit an assert
in glsl_type::sampler_index() because it expects either a sampler
or an image type.
This fixes a regression in the following piglit test:
arb_bindless_texture/compiler/images/arrays-of-struct.frag
Fixes: 57165f2ef8 ("glsl: disable array splitting for AoA")
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit f99e9335e2)
On SI this was causing a hang in
dEQP-VK.pipeline.render_to_image.core.2d_array.mipmap.r16g16_sint_s8_uint
This was due to not handling the tile mode index for depth like
I fixed previously for new GPUs.
Fixes: 01d0c5a9 (radv: fix stencil regression since new addrlib import)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 800d162209)
The host doesn't understand this yet, so drop it for now.
Fixes: virgl regressions.
Fixes: af22adee4f (tgsi: add precise flag to tgsi_instruction)
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 554aa09440)
This fixes corruption with bindless textures in Dawn Of War 3.
The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit f4d095cc65)
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.
This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.
Fixes GL45-CTS.geometry_shader.rendering.rendering.*
v2: ensure that the TCS epilog doesn't run for non-existing patches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 081ac6e5c6)
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.
Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
as in the vs/gs shaders
- Use the correct offset to be passed in the core
- consolidate register slot mapping logic into one function, since it's
been calculated in 2 different places (one for calcuating the slot mask,
and one for the register offsets themselves
Also make room for all attibutes in the backend vertex area.
Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
arb-quads-follow-provoking-vertex and primitive-type gl_points
v2:
- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize
Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.
Also note that GS and SO is not fully implemented. This will be addressed
later.
fixes:
- fixes total of 20 piglit tests
CC: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
(cherry picked from commit 194ff5eed1)
This ports 72e46c988 to radv.
radeonsi: apply a TC L1 write corruption workaround for SI
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit e77ff11ffe)
This ports: da7453666a
radeonsi: don't apply the Z export bug workaround to Hainan
to radv.
Just noticed in passing.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit a81e99f50a)
Until we support sync fd, don't report the info.
Fixes CTS dEQP-VK.api.external.semaphore.sync_fd.* from crashing.
Fixes: eaa56eab6 (radv: initial support for shared semaphores (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit 6cbc8cf178)
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
(cherry picked from commit 45383d32d4)
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b4a18f13ce)
If the underlying driver does not support modifiers, dmabuf will still
advertise formats through the 'modifier' event, but send them with an
invalid modifier. Ignore them if this is the case, rather than passing
them through to the driver.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
(cherry picked from commit dd072cf4b1)
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause
race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.
Fixes crash with steam.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit bbc29393d3)
Squashed with:
st/mesa: fix unconditional return in st_framebuffer_iface_remove
Noticed by James Legg @ Feral.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 914f11e75b)
Squashed with:
st/mesa: Fix inversed test in st_api_destroy_drawable
Fixes a drawable leak.
Fixes: bbc29393d3 ("st/mesa: create framebuffer iface hash table per
st manager")
Bugzilla: https://bugs.freedesktop.org/101930
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
(cherry picked from commit 57132d126f)
We'll fail to flag an error if the context flags appear after the
no-error attribute in the context attribute list.
Delay the check to after attribute parsing to fix this.
Fixes: 4909519a66 ("egl: Add EGL_KHR_create_context_no_error support")
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: add fixes/stable tags, commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 39bf7756b9)
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit a0e6b9a2db)
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer by the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).
While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 65fbaab0b7)
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 31f1863ace)
Increase the value, not the pointer to the stack variable.
Caught by Coverity (CID 1415574). Not shipped in a real release.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(cherry picked from commit f6e674fa51)
Fixes: 63a43f4161 ("i965: Refactor miptree to isl converter and adjustment")
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: mesa-stable@lists.freedesktop.org
(cherry picked from commit 93fec49a75)
As Chris commented, it makes more sense to have batch buffer flushes
before the query. Usually applications like frame_retrace do a series
of queries and in that case, with flushes at the end of the queries,
we might still have the first query contained in 2 different batchs.
More generally it would be quite usual to have the query contained in
2 batch buffers because we never now what's the fill rate of the
current batch buffer.
If we move the flushing at the beginning of the queries, it's pretty
much guaranteed that queries will be contained in a single batch
buffer (unless the amount of commands is huge, but then it's only fair
to include reloading request times in the measurements).
Fixes: adafe4b733 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9f439ae120)
Otherwise we'll attemt to generate the header even we don't need to.
In that case the dependencies may not be met, leading to build failure.
Fixes: 166852e "configure.ac: rework wayland-protocols handling"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We need wl_egl_window to be a versioned struct in order to keep track of
ABI changes.
This change makes the first member of wl_egl_window the version number.
An heuristic in the wayland driver is added so that we don't break
backwards compatibility:
- If the first field (version) is an actual pointer, it is an old
implementation of wl_egl_window, and version points to the wl_surface
proxy.
- Else, the first field is the version number, and we have
wl_egl_window::surface pointing to the wl_surface proxy.
Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
mincore() returns 0 on success, and -1 on failure. The last parameter
is a vector of bytes with one entry for each page queried. mincore
returns page residency information in the first bit of each byte in the
vector.
Residency doesn't actually matter when determining whether a pointer is
dereferenceable, so the output vector can be ignored. What matters is
whether mincore succeeds. See:
http://man7.org/linux/man-pages/man2/mincore.2.html
Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
At dist/distcheck time we need to ensure that all the files and their
respective dependencies are handled.
At the moment we'll bail out as the linux-dmabuf rules are guarded in a
conditional. Move them outside of it and drop the sources from
BUILT_SOURCES.
Thus the files will be generated only as needed, which will happen only
after the wayland-protocols dependency is enforced in configure.ac.
v2: add dependency tracking for the header
Cc: Andres Gomez <agomez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
This calculates ps_iter_samples from the minSampleShading input
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is an alternate fix for the buffer export dedicated interaction.
Fixes CTS dEQP-VK.api.external.memory.opaque_fd.dedicated.buffer.info
Fixes: b70829708a (radv: Implement VK_KHR_external_memory)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the layer base was > 0, it wasn't getting passed as the start
instance or getting added in the shaders.
Fixes CTS dEQP-VK.api.image_clearing.core.clear_color_attachment.2d_r8_uint_multiple_layers
Fixes: 7e0382fb (radv: add support for layered clears (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec says we should return VK_ERROR_FEATURE_NOT_PRESENT.
Ported from anv.
Fixes CTS test dEQP-VK.api.device_init.create_device_unsupported_features
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we get an fd, we need to close it before returning.
Fixes CTS test dEQP-VK.api.external.memory.opaque_fd.dedicated.device_only.import_multiple_times
Fixes: b70829708a (radv: Implement VK_KHR_external_memory)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The image is set on Memory allocation already, but the image doesn't
have to have the BindImageMemory called yet. Luckily, we know offset
within a BO has to be 0 for dedicated allocations, so we can just
use the dummy 0 in the address calaculations.
Fixes CTS test dEQP-VK.api.external.memory.opaque_fd.dedicated.image.export_bind_import_bind
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Dave Airlie <airlied@redhat.com>
This just sets them to INVALID COLOR, instead of shifting the
attachments together.
This also fixes a number of cases where we use it first and only
then check if it is VK_ATTACHMENT_UNUSED.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fill the entire array instead of just a quarter. This avoids
crashes with large shaders.
(currently this never causes a problem because shaders larger than 2048/4
instructions are not supported by this driver on any hardware, but it will
cause problems in the future)
Fixes: ec43605189 ("etnaviv: fix shader miscompilation with more than 16 labels")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We already have a helper for doing this in BLORP, this just moves the
logic into ISL where we can share it with other components.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The set of formats which supports CCS_E is actually fairly small on
gen9. However, everything that supports fast-clears on gen8 also
supports fast-clears on gen9+. The one very annoying exception is
that blending is broken for non-0/1 clear colors with sRGB formats.
In order to solve that problem, we do a resolve to get rid of the
clear color. Another option would be to just not fast-clear with
non-0/1 clear colors however non-0/1 + blending + sRGB is uncommon
enough that this shouldn't be a significant performance problem.
This appears to help gl_manhattan31_off by about 2%.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes it much easier to edit the template and doesn't really dirty
the python all that much.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit replaces the generic "flags" parameter with a more explicit
aux usage parameter. This leads to a lot of duplicated code at the
moment but this will all get cleaned up directly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This requires us to start using the partial clear state. It makes
things quite a bit more complicated but it's still a fairly
straightforward exercise in diagram following.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have this field, it's much easier to switch on it than to
walk an if ladder that checks different things.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We also simplify the way we handle stencil since we know a priori that
it will have ISL_AUX_USAGE_NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only real change here is that we now reject clear colors for MCS
with certain formats on gen < 9 because we can't trust that the
reinterpretation will work. This may cause some MCS partial resolves.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Our attempts to do it automatically are problematic at best. In order
to really be precise, we need to know both the desired aux usage and
whether or not clear is supported. The current automatic mechanism
doesn't cover this. This commit itself is not a functional change since
it just reworks everything to be in terms of a silly helper. Later
commits will switch things over to more sensible ways of choosing usage.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We keep the old and possibly broken method of determining aux usage
intact for now. Therefore, the only functional change here is that we
may call finish_render a bit more accurately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Multisample surfaces only have a single miplevel so there's no reason to
be passing the extra parameters around. It only leads to confusion.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit changes layer_range_length to return locical layers and also
changes the way we allocate the aux_state field to not allocate extra
layers for MCS. This will be important as we're about to start doing
significantly more detailed tracking of MCS state.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
intel_miptree_supports_ccs_e should handle the gen >= 9 requirement and
there's no reason why we can't do CCS_E on window system buffers so long
as we resolve.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nothing created through intel_miptree_create_for_renderbuffer will ever
be exposed externally so there's no need to set FOR_SCANOUT.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It turns out that if you have rendering in-flight with CCS_E enabled and
you go to do a depth resolve without flushing, the CCS data may never
hit the memory.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.
v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use the performance warning infrastructure to provide helpful
information when testing applications.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.
v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.
v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used to load and store clear values from surface state
objects.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7. However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering. Let's just play it safe and disallow it for now. If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.
The non-LLC story was a horror show. We uploaded data via pwrite
(drm_intel_bo_subdata), which would stall if the cache BO was in
use (being read) by the GPU. Obviously, we wanted to avoid that.
So, we tried to detect whether the buffer was busy, and if so, we'd
allocate a new BO, map the old one read-only (hopefully not stalling),
copy all shaders compiled since the dawn of time to the new buffer,
upload our new one, toss the old BO, and let the state upload code
know that our program cache BO changed. This was a lot of extra data
copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.
Not only that, but our rudimentary busy tracking consistented of a flag
set at execbuf time, and not cleared until we threw out the program
cache BO. So, the first shader upload after any drawing would hit this
"abandon the cache and start over" copying path.
This is largely unnecessary - it's just ancient and crufty code. We can
use the same persistent mapping paths on all platforms. On non-ancient
kernels, this will use a write combining map, which should be reasonably
fast.
One aspect that is worse: we do occasionally grow the program cache BO,
and copy the old contents to the newer BO. This will suffer from UC
readback performance now. To mitigate this, we use the MOVNTDQA based
streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms). Gen4-5
are unfortunately going to be penalized.
v2: Add MOVNTDQA path, rebase on other map flag changes.
v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Chris Wilson pointed out that this mapping really is persistant.
Shouldn't actually have any effect today, but best to set it anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using a read-only mapping is completely bogus - we use this mapping to
write all new shaders to the cache.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Write-combine mappings give much better performance on writes than
uncached access through the GTT.
Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).
v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
updates, potential for CPU/WC maps failing, and other changes.
v3: (by Ken and Chris Wilson)
(Ken): Rebase on set_domain -> gem_wait
(Chris): Fix up a failed CPU/WC mmaping with a GTT mapping
Not all objects will be mappable for direct access by the CPU
(either using WC/CPU or WC paths), for example, a dmabuf wrapping an
object on a foreign device or an object wrapping access to stolen
memory. Since either the physical pages are not known or even do not
exist, we need to use the mediated, indirect access via the GTT. (If
one day, the kernel does suddenly start providing mediated access
via a regular WB/WC mmapping, we no longer need the fallback.)
v4: Avoid falling back for MAP_RAW (Chris).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the buffer is idle, we I915_GEM_WAIT will return immediately,
so we may as well skip the ioctl altogether. We can't trust the
"idle" flag for external buffers, but for most, it should be fine.
Reviewed-by: Matt Turner <mattst88@gmail.com>
With the advent of asynchronous maps, domain tracking doesn't make a
whole lot of sense. Buffers can be in use on both the CPU and GPU at
the same time. In order to avoid blocking, we stopped using set_domain
for asynchronous mappings, which means that the kernel's tracking has
lies. We can't properly track it in userspace either, as the kernel
can change domains on us spontaneously (for example, when un-swapping).
According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:
1. pins the backing storage (acquiring pages outside of the
struct_mutex)
2. waits either for read/write access, including inter-device waits
3. updates the domain, clflushing as required
4. marks the object as used (for swapping)
5. turns off FBC/PSR/fancy scanout caching
Item (1) is not terribly important. Most BOs are recycled via the
BO cache, so they already have pages. Regardless, we fixed this
via an initial set_domain in the previous patch.
We implement item (2) with I915_GEM_WAIT. This has one downside:
we'll stall unnecessarily if we do a read-only mapping of a buffer
that the GPU is reading. I believe this is pretty uncommon. We
may want to extend the wait ioctl at some point.
Mesa already does item (3) itself. For cache-coherent buffers (most on
LLC systems), we don't need to do any clflushing - the CPU and GPU views
are coherent. For non-coherent buffers (most on non-LLC systems), we
currently only use the CPU for read-only maps, and we explicitly clflush
when necessary.
We don't care about item (4)...swapping has already killed performance.
Plus, with async maps, the kernel's domain tracking is already bogus,
so it can't do this accurately regardless.
Item (5) should be okay because we avoid cached maps of scanout buffers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
From the ARB_uniform_buffer_object spec:
""shared" uniform blocks, the default layout, ..."
This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was added in 2d03f48a65 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When we validate the texture sample count, pass the correct
pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D.
Also add more comments about MSAA.
No piglit regressions with VMware driver.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The NumSamples and FixedSampleLocation fields are set again later at
the end of the function so these earlier assignments aren't needed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which
is not used very frequently. So allocate it only if lit instruction is used.
Tested with mtt piglit and mtt glretrace
v2: As per Charmaine's comment
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch fixes the ordering of the constant indices for texcoord scale
factor and texture buffer size to match the order they were added to the
constant buffer in svga_get_extra_constants_common().
Tested with MTT piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
Sometimes, converting unnormalized coordinates to normalized
coordinates requires an epsilon value to produce the right texels with
nearest filtering. Adding 0.0001 to the coordinates when the min/mag
filter is nearest fixes the issue.
Fixes piglit test fbo-blit-scaled-linear
Tested with mtt-piglit, mtt-glretrace
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows us to override contexts to use no_error functionality
even if the applications themselves do not.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have a very specific row pitch that we want and we don't want ISL to
be changing it on us so just be explicit about it.
Fixes: a40f043034
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
fixes
missrendering in TombRaider
KHR-GL44.gpu_shader5.precise_qualifier
KHR-GL45.gpu_shader5.precise_qualifier
v4: disable opt only for MAD, it's fine for SAD
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
v2: use str_match_no_case to fix _SAT_PRECISE detection
v4: usd is_digit_alpha_underscore to match end of mods
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Only implemented for glsl->tgsi. Other converters just set precise to 0.
v2: remove precise paramter from ureg_tex_insn and ureg_memory_insn
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
all subexpression inside an ir_assignment needs to be tagged as precise.
v2: make precise handling more global inside the visitor
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When I ported from libdrm, I forgot to add the line to reset
the sem, we just need to reset the context.
This fixes a regression in DOOM.
Fixes: 9ac1432a57 ("radv: port to new libdrm API.")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With this patch, the st manager will maintain a hash table for
the active framebuffer interface objects. A destroy_drawable interface
is added to allow the state tracker to notify the st manager to remove
the associated framebuffer interface object from the hash table,
so the associated framebuffer and its resources can be deleted
at framebuffers purge time.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Tested-by: Brad King <brad.king@kitware.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The two generators forked from each other, and they remain basically the
same. This rebases the radv version on the anv version, but with the
radv changes ported over. The result is that we get rid of the "cat |"
madness and gain mako, correct "generated by" attributions, and write
files out directly.
The only differences between the output is whitespace and comments.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
This was only needed for checking gen6 stencil which is already
using isl. One could delete GEN6_HIZ_STENCIL layout altogether
but that will be gone with the rest after a while anyway.
The dim_layout converter is needed even after transition to isl
when setting up surface states - see brw_emit_surface_state().
Hence dropping the unneeded argument separately.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
allowing graceful failure instead of crash on assert later on.
This can be hit, for example, on SNB when trying to allocate
8kx8k CUBE_MAP against isl: x-tiled buffer size becomes
2421161984 exceeding the maximum of 1 << 31 == 2147483648.
Another way to hit this on SNB is with multisampling of over
64-bit formats.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Otherwise init_teximage_fields_ms() (called by
_mesa_init_teximage_fields()) will always assert as it can't
find valid base format.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There is the same constraintg later on as assert in
isl_gen7_choose_image_alignment_el() so catch it earlier in order
to return error instead of crash.
Needed to avoid crashes with piglits on IVB and HSW:
arb_internalformat_query2.image_format_compatibility_type pname checks
arb_internalformat_query2.all internalformat_<x>_type pname checks
arb_internalformat_query2.max dimensions related pname checks
arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
arb_texture_float.multisample-fast-clear gl_arb_texture_float
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.
There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try 16I/32I in addition to GL_RGBA8I.
IvyBridge passed all tests with all sample numbers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.
There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I.
IvyBridge passed all tests with all sample numbers and even
with 128-bit formats.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We had some caller using LLVMAddInstrAttributes, which couldn't be
converted to lp_add_function_attr, because attributes were only handled
for functions in this case, so fix this.
For llvm >= 4.0, this already works correctly.
(radeonsi seems to avoid setting call site attributes prior to llvm 4.0,
the patch then citing it doesn't work when calling intrinsics. But at
least for calling external functions we always used that, albeit only
for actual call attributes, not call parameter attributes, though some
quick test shows llvm seems to handle that as well. The attribute index
is sort of iffy though, since attribute 0 of the call is the actual function,
attribute 1 corresponds to the first parameter of the called function.)
(Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct
attributes are shown for calls, both for llvm 4.0 and 3.3.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We can also use storage images internally for resolves, which don't
require TRANSFER_DST usage on the image, so currently we may not create
the needed descriptors.
Just create these descriptors unconditionally.
Fixes: 0e1886efb9 ("radv: Fix descriptors for cube images with VK_IMAGE_USAGE_STORAGE_BIT")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linux-specific gettid() syscall shouldn't be used in portable code.
Fix does assume a 1:1 thread:LWP architecture, but works for our
current target platforms and can be revisited later if needed.
Fixes unresolved symbol in linux scons builds.
v2: add comment in code about the 1:1 assumption.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This adds support for sharing semaphores using kernel syncobjects.
Syncobj backed semaphores are used for any semaphore which is
created with external flags, and when a semaphore is imported,
otherwise we use the current non-kernel semaphores.
Temporary imports from syncobj fd are also available, these
just override the current user until the next wait, when the
temp syncobj is dropped.
v2: allocate more chunks upfront, fix off by one after
previous refactor of syncobj setup, remove unnecessary null
check.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds syncobj create/destroy/export/import paths into
the winsys interface.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Remove the following duplicates from the formats table:
- R8G8B8A8_UNORM (V_,_T)
- R8G8B8X8_UNORM (_T,_T)
- DXT3_RGBA (_T,_T)
Only the first has an effect because the _T overrides the V_ initializer,
the latter two were harmless duplications of the same.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Add support for ETC2 compressed textures in the etnaviv driver.
One step closer towards GL ES 3 support.
For now, treat SRGB and RGB formats the same. It looks like these are
distinguished using a different bit in sampler state, and not part of
the format, but I have not yet been able to confirm this for sure.
(Only enabled on GC3000+ for now, as the GC2000 ETC2 decoder
implementation is buggy and we don't work around that)
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
It's incorrect to use $(LOCAL_PATH) in makefile recipes since it's
changing. The typical way to handle it is to use private variable.
Fortunately in this case we can just simplify them to $^.
See further:
https://patchwork.freedesktop.org/patch/167718/
Also simplify LOCAL_GENERATED_SOURCES.
Fixes: 2dd4e2ec (spirv: Generate spirv_info.c)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementation of ballotARB() will start by zeroing the flags
register. So, a doing something like
if (gl_SubGroupInvocationARB % 2u == 0u) {
... = ballotARB(true);
[...]
} else {
... = ballotARB(true);
[...]
}
(like fs-ballot-if-else.shader_test does) would generate identical MOVs
to the same destination (the flag register!), and we definitely do not
want to pull that out of the control flow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementations of the ARB_shader_ballot intrinsics will explicitly
read the flag as a source register.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.
Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementations of the ARB_shader_group_vote intrinsics will
explicitly write the flag as the destination register.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I don't expect anyone is going to care about using this in vec4 programs
(vertex/tessellation/geometry on Gen6/7), no one has come up with a good
way to implement it much less test it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Specifically, constant fold intrinsics from ARB_shader_group_vote, but I
suspect it'll be useful for other things in the future.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are intrinsics rather than opcodes, because they operate across
channels.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Within i965, we have many different objects and confusingly when
submitting an execbuf we have lists of both our internal objects and a
list of the kernel's drm_i915_gem_exec_object with very similar names.
Rename the kernel's validation list to avoid the collison as it is only
used for interfacing with the kernel and so a peripheral use of
"object".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit b7153c3e9f.
The point of that commit was to ensure intel_prepare_render() occurred
before color resolves on the current framebuffer. In 0673bbfd9b
(i965: Move surface resolves back to draw/dispatch time), Jason moved
brw_predraw_resolve_framebuffer back to draw time, which is already
after a intel_prepare_render() call. So, this is no longer necessary.
Furthermore, it caused problems. "mpv" would only display a small
corner of movies, and Android started failing camera CTS tests.
This is because intel_prepare_render() ended up handling DRI2 events
which caused the drawable to be resized at an inopportune time, flagging
ctx->NewState |= _NEW_BUFFERS, but at a point where we've already copied
ctx->NewState, and failed to notice the newly set flag.
The lack of _NEW_BUFFERS caused us to skip 3DSTATE_DRAWING_RECTANGLE,
so the drawing ended up being clipped to an outdated framebuffer size.
Just drop the hack and go back to handling this at the proper time.
Thanks to Matti Hämäläinen (ccr), Tomasz Figa (tfiga), and Tapani Palli
for reporting these issues.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101558
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101704
Tested-by: Tapani Pälli <tapani.palli@intel.com>
No need to check if ID is not 0 because _mesa_HashFindFreeKeyBlock()
can't generate this value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
No need to check if ID is not 0 because _mesa_lookup_vao()
already prevents this to happen.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2 (Jason):
- s/separate_stencil_surface/make_separate_stencil_surface/
- drop the check for separate stencil when wrapping an
existing buffer object with miptree. This is dead code as
the first needs_separate_stencil() checks is
MIPTREE_LAYOUT_FOR_BO-flag and says no.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Depth buffers are always Y-tiled. In brw_miptree_choose_tiling()
driver opts to use linear buffers for small and 1D but this does
not apply for depth - GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL_EXT
are considered first.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will significantly reduce chrun when switching remaaining
surface types to isl. After the full transition it will be easier
to calculate on-demand and drop the helper member in miptree.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes intel_mipmap_tree::pitch and isl_surf::row_pitch
semantically equivalent.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2 (Jason):
- Don't trigger miptree re-creation in vain later on with ISL
based. Core GL uses zero to indicate single sampled while
ISL uses one - this would cause intel_miptree_match_image()
to always fail.
- Now that native miptree is already using sample number of
one, there is no need for MAX2() when converting to ISL.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Patch moves "assert(brw->num_samples <= 16)" from
emit_3dstate_multisample2() to upload_multisample_state(). Latter
is the only caller of the former and passes "brw->num_samples"
as argument. Therefore it is clearer to assert in the caller.
Possible bug fix in genX(emit_3dstate_multisample2) which
doesn't have a case for num_samples == 0 in the switch
statement.
It should be noted that intel_miptree_map()/unmap() now checks
additionally for "mt->surf.samples == 1" in order to support gen6
stencil which is already transitioned to ISL. This will go away in
next patch when native miptrees start to use isl_surf::samples as
well.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If we have a compat profile context, it means that GL_QUADS[_STRIP] are
supported so this query makes sense. It's also legal for 3.2 core profile
because of a spec bug.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This bumps the libdrm requirement for amdgpu to the 2.4.82.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just introduces a central semaphore info struct, and passes it around,
and introduces some wrappers that will make porting off libdrm_amdgpu easier.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not built by default. Currently only builds with icc.
v2:
* document knl,skx possibilities for swr_archs
* merge with changed loader lib selection code
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Allow configuration of the SWR architecture depend libraries
we build for with --with-swr-archs. Maintains current behavior
by defaulting to avx,avx2.
Scons changes made to make it still build and work, but
without the changes for configuring which architectures.
v2:
* add missing comma for swr_archs default
* check that at least one architecture is enabled
* modify loader logic to make it clearer how to add archs
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
cpuid.7 requires cx=0 to select the extended feature leaf.
avx512 detection was using the non-indexed cpuid resulting
in random non-detection of avx512.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
> checking for WAYLAND... no
>
> configure: error: Package requirements (wayland-client >= 1.11 wayland-server >= 1.11 wayland-protocols >= 1.8) were not met:
>
> No package 'wayland-protocols' found
>
> Consider adjusting the PKG_CONFIG_PATH environment variable if you
> installed software in a non-standard prefix.
>
> Alternatively, you may set the environment variables WAYLAND_CFLAGS
> and WAYLAND_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.
Also, added extra path to PKG_CONFIG_PATH env variable.
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We incorrectly detected VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT. We looked
for the bit in VkImageCreateInfo::usage, but it's actually in
VkImageCreateInfo::flags.
Found by assertion failures while enabling VK_ANDROID_native_buffer.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The LD_LIBRARY_PATH environment variable could be already defined so
we extend it and restore it rather than just overwriting it.
v2:
- Unset the __old_ld helper variable when we are done with it.
- Corrected test for and escaping of variables (Eric).
v3: Remove unneeded variable (Emil).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The "Perform basic testing" and "Use the release.sh script from xorg
util-modular" sections provide some instructions to do so. We add now
some comments in order to use a recent enough LLVM version to run
dist/distcheck and the automake generated binaries.
v2: Suggested the need to define LLVM_CONFIG also before running the
release.sh script.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Trailing space after the backslash meant the rest of the AM_CFLAGS lines
were no longer included.
This has been silently ignored because of the next line starting with
a `-` dash, instructing make to be silent about that line.
Fixes: 02cc359372 "egl/wayland: Use linux-dmabuf interface for buffers"
Cc: Daniel Stone <daniels@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Simply advertise all supported modifiers, independent of the format.
Special formats, like compressed, which don't support all those modifiers
are already culled from the dmabuf format list, as we don't support
the render target binding for them.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This allows to create buffers with a specific tiling layout, which is primarily
used by GBM to allocate the EGL back buffers with the correct tiling/modifier
for use with the scanout engines.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This allows the state trackers to know the tiling layout of the
resource and pass this through the various userspace protocols.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This implements resource import with modifier, deriving the correct
internal layout from the modifier and constructing a render compatible
base resource if needed.
This removes the special cases for DDX and renderonly scanout allocated
buffers, as the linear modifier is enough to trigger correct handling
of those buffers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
This reworks the logic in etna_update_sampler_source to select the
newest resource view for updating the texture view. This should make
the logic easier to follow and fixes texture updates from imported
dma-bufs.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If we import a dma-buf with a sampler/pixel pipe incompatible modifier,
the imported buffer will end up in an external resource view. As
resource_changed signals the change of the imported resource, we need
to update the external view seqno, instead of the base resource seqno.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This fixes failures to import the scanout buffer with screen resolutions
that don't satisfy the RS alignment restrictions, like 1680x1050.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The minimum RS alignment calculation is needed in various places.
Extract a helper to avoid open-coding the calcuation at every site.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The current way of importing the resource from renderonly after allocation
is opaque and is taking away control from the driver, which it needs in
order to implement more advanced scenarios than the simple linear scanout
with matching stride alignments.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
The following changes need the modifier definitions for the Vivante tiled
formats, which are shipped with libdrm 2.4.82.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Commit 463b7d0332c5("gallium: Enable ARM NEON CPU detection.")
introduced CPU feature detection based Android cpufeatures library.
Unfortunately it also added an assumption that if PIPE_OS_ANDROID is
defined, the library is also available, which is not true for the
standalone build without using Android build system.
Fix it by defining HAS_ANDROID_CPUFEATURES in Android.mk and replacing
respective #ifdefs to use it instead.
v2:
- Add a comment explaining why the separate flag is needed (Emil).
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Earlier commit refactored/split the parsing into separate hunks.
While no functional change was intended, it did not attribute that
different error is set when the attrib. value is incorrect.
Fixes: 3ee2be4113 ("egl: split _eglParseImageAttribList into per
extension functions")
Cc: Michel Dänzer <michel@daenzer.net>
Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The extension should be present (if applicable) in the list returned by
getExtensions(). AFAICT no loader has ever looked for it in
__driDriverExtensions/__driDriverGetExtensions.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.
v2: Rebase
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
The option is only queried from the loader, which has access to the
dri common code in src/mesa/drivers/dri/common/.
One could grant the loader access to brw_config_options but even
then, having the same option in both places is not a good idea.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It removes unused buffer_count variable from dri2_egl_surface.
And it polishes the assert of dri2_drm_get_buffers_with_format().
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is a tiny housekeeping patch which does the following:
* Limit lines to 78 or fewer characters.
According to the mesa coding style guidelines.
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Because the color_buffers have a each unique bo, if the designated buffer is
found, release_buffer() can go out the loop which seaches the buffer.
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Adding linux-dmabuf Wayland protocol files as generated did the right
thing, by prepending $(MKDIR_GEN) so autotools didn't try to write into
a build directory which didn't yet exist.
Unfortunately MKDIR_GEN needs to be defined in every Makefile it's used
in (which we do now), or alternately defined and substituted in
configure.ac (which we don't do), and src/egl/ didn't actually have it
from either method. As unset variables expand to nothing, it was
silently being skipped.
Copy & paste the defintion to make sure drivers/dri2/ exists before we
try to generate files into it.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Nick Sarnie <commendsarnex@gmail.com>
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
The previous implementation of CLAMP() allowed NaN to pass through
unscathed, by failing both comparisons. NaN isn't exactly a value
between MIN and MAX, which can break the assumptions of many callers.
This patch changes CLAMP to convert NaN to MIN, arbitrarily. Callers
that need NaN to be handled in a specific manner should probably open
code something, or use a macro specifically designed to do that.
Section 2.3.4.1 of the OpenGL 4.5 spec says:
"Any representable floating-point value is legal as input to a GL
command that requires floating-point data. The result of providing a
value that is not a floating-point number to such a command is
unspecified, but must not lead to GL interruption or termination.
In IEEE arithmetic, for example, providing a negative zero or a
denormalized number to a GL command yields predictable results,
while providing a NaN or an infinity yields unspecified results."
While CLAMP may apply to more than just GL inputs, it seems reasonable
to follow those rules, and allow MIN as an "unspecified result".
This prevents assertion failures in i965 when running the games
"XCOM: Enemy Unknown" and "XCOM: Enemy Within", which call
glTexEnv(GL_TEXTURE_FILTER_CONTROL_EXT, GL_TEXTURE_LOD_BIAS_EXT,
-nan(0x7ffff3));
presumably unintentionally. i965 clamps the LOD bias to be in range,
and asserts that it's in the proper range when converting to fixed
point. NaN is not, so it crashed. We'd like to at least avoid that.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the source is an indirect register, there is ralloc'd data. Copying
with a direct assignment will copy the pointer, but the data will still
belong to the old instruction's memory context. Since we're lowering
and throwing away instructions, that could free the data by mistake.
Instead, use nir_src_copy, which properly handles this.
This is admittedly not a common case, so I think the bug is real,
but unlikely to be hit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
While it produces functioning code the pass creates worse code
for arrays of arrays. See the comment added in this patch for more
detail.
V2: skip splitting of AoA of matrices too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This can happen if, for instance, you have an array of structs and there
are both direct and wildcard references to the same struct and some
members only have direct or only have indirect.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
This guarantees that the value written in the batch matches the
value recorded in the relocation entry.
(Chris Wilson wrote an identical patch as well.)
The code doesn't get exactly a lot simpler but at least it is in a single
place, and we delete more than we add.
Another good point is that you get rid of struct brw_wm_unit_state
which was a third mechanism for encoding GEN state. We used to have
GENXML, manual packing and these bitfield structs. Now we're down to
just GENXML and some manual packing. (Khristian)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add the code into its own function and atom, since almost nothing is
shared with GEN >= 6.
v2: Split GEN <=5 and GEN >= 6 into separate functions (Ken).
v3: Minor tidying by Ken.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When available, use the zwp_linux_dambuf_v1 interface to create buffers,
which allows multiple planes and buffer modifiers to be used.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Now create_wl_buffer is generic enough, we can use it for the
EGL_WL_create_wayland_buffer_from_image extension.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Remove surface-specific code from create_wl_buffer, so it's now just a
generic translation from DRIimage to wl_buffer.
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This was only used in create_dumb() to blacklist planar formats.
However, the start of the function already whitelists ARGB8888 (cursor)
and XRGB8888 (scanout), and nothing else. So this entire function can be
removed.
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Luckily no-one really used the is_format_supported() call, because it
only supported three formats.
Also, since buffers with alpha can be displayed on planes, stop banning
them from use.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Wayland buffers coming from wl_drm use the WL_DRM_FORMAT_* enums, which
are identical to GBM_FORMAT_*. Similarly, FD imports do not need to
convert between GBM and DRI FourCC, since they are (almost) completely
compatible.
This widens the formats accepted by gbm_bo_import() when importing
wl_buffers; previously, only XRGB8888, ARGB8888, RGB565 and YUYV were
supported.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Current logic calls intel_renderbuffer_set_draw_offset() which in
turn tries to calculate x and y offset against layer/level settings
that are against the original miptree actually having sufficient
levels/layers. This returns correctly x=0 y=0 regardless of the given
layer/level only because one calls intel_miptree_get_image_offset()
which goes and consults miptree offset table which in turn luckily
contains entries for max-mipmap levels, all initialised to zero even
in case of non-mipmapped.
This patch stops consulting the table and simply sets the draw
offsets to zero that are compatible with the single slice miptree
backing the renderbuffer.
This prepares for ISL based miptrees that calculate offsets
on-demand and do not tolerate levels beyond what the miptree has.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will falsely trigger an assert on number of layers once
isl is used for 3D layouts of Gen4 cube maps.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that image surface vertical slice calculator doesn't depend
on total_height, total dimensions are only needed when new buffer
objects are created. Therefore one can safely ignore them when
miptrees are created for already exisiting buffer objects.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This helps to drop dependency to miptree::total_height which is
used in brw_miptree_get_vertical_slice_pitch().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Once the driver moves to ISL both compressed and uncompressed have
the same type. One needs to tell them apart by other means. This
can be done by checking the existence of mcs_buf.
There is a short period of time within intel_miptree_create()
where mcs_buf doesn't exist yet (between calls to
intel_miptree_create_layout() and intel_miptree_alloc_mcs()).
First compute_msaa_layout() makes the decision if compression is
to be used and sets the msaa_layout type. Then based on the type
one sets aux_usage and finally decides if mcs_buf is needed.
This patch duplicates the logic in compute_msaa_layout() and uses
that to make the decision on aux_usage and mcs_buf allocation.
Most of the original logic in compute_msaa_layout() will be gone
in later patch leaving only one version.
Elsewhere only brw_populate_sampler_prog_key_data() needs to know
if compression is used based on the msaa_type. This is now
replaced with consideration for number of samples and existence
of mcs_buf. All other occurrences consider CMS || UMS which can
be represented using single the type of ISL_MSAA_LAYOUT_ARRAY
without any tweaks.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
same as irb::layer_count. In case of copies and blits msaa
surfacas already fall to blorp which natively works with logical
slices.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Checking against zero currently works as single sampling is
represented with zero. Once one moves to isl single sampling
really has sample number of one.
This keeps later patches simpler.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We don't support the general version yet because that requires us to
lower shared variables up-front in SPIR-V -> NIR. This shouldn't be a
whole lot of work but it's not something we support today.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Now that vtn_type has piles of unions, we should assert sanity before
setting fields that may stomp others.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The old table based spirv_*_to_string functions would return NULL for
any values "inside" the table that didn't have entries. The tables also
needed to be updated by hand each time a new spirv.h was imported.
Generate the file instead.
v2: Make this script work more like src/mesa/main/format_fallback.py.
Suggested by Jason. Remove SCons supports. Suggested by Jason and
Emil. Put all the build work in Makefile.nir.am in lieu of adding a new
Makefile.spirv.am. Suggested by Emil. Add support for Android builds
based on code provided by Emil.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This query is not allowed in GL core profile 3.3 and later (since
GL_QUADS and GL_QUAD_STRIP are disallowed). The query was (mistakenly)
supported in GL 3.2. This fixes the glGet error test accordingly.
Reviewed-by: Neha Bhende<bhenden@vmware.com>
This looks like a regression from df30123794 ("radv: use
ac_compute_surface"). Before that, the opt4Space addrlib flag was set
to true unless the image has FMASK (ac_compute_surface will similarly
only set that flag for images without FMASK).
This saves multiple gigabytes of VRAM on one of our games, and brings
its VRAM utilisation on RADV in line with AMDGPU-PRO and NVIDIA.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
"...and stay dead!"
Rafael deleted this file in c2b5a26dc2
(i965: Convert SF_STATE to genxml.) but Marek accidentally brought it
back in commit e7a091936f (mesa: replace
ctx->Polygon._FrontBit with a helper function) when resolving conflicts.
It's not actually even compiled, but it's still here trolling people
into thinking it still exists and needs patching.
Translate the NIR variables directly to LLVM instead of lowering to a
TGSI-style giant array of vec4's and then back to a variable. This
should fix indirect dereferences, make shared variables more tightly
packed, and make LLVM's alias analysis more precise. This should fix an
upcoming Feral title, which has a compute shader that was failing to
compile because the extra padding made us run out of LDS space.
v2: Combine the previous two patches into one, only use this for shared
variables for now until LLVM becomes smarter.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Otherwise, if a client gave us a list of modifiers that contained a
modifier we understand but which is not supported on the hardware, we
might return that one and then fail to create the image.
Reviewed-by: Daniel Stone <daniels@collabora.com>
This commit splits the mapping in half. The modifier_infos table now
only contains the modifier and the since_gen field. The tiling bits
have been moved into a table in tiling_to_modifier as that's the only
place it was ever used. The modifier_is_supported function now takes a
devinfo and does the since_gen check.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Now that we have an actual aux_usage field, we no longer need the
complex logic of is_lossless_compressed in order to figure out if a
miptree is CCS_E compressed. As a side-effect, there is not longer any
need to overload MSAA_LAYOUT_CMS for CCS_E and we can stop doing so.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
HiZ, like MCS and CCS_E, can compress more than just clear colors so we
want it turned on whenever the miptree is being used as a depth
attachment. It's theoretically possible for someone to create a depth
texture, upload data with glTexSubImage2D, and texture from it without
ever binding it as a depth target. If this happens, we would end up
wasting a bit of space by allocating a HiZ surface we never use.
However, this is rather unlikely out side of test cases, so we're better
off just allocating it up-front.
Reviewed-by: Chad Versace <chadversary@chromium.org>
We need this split for the same reason that we need the split for CCS:
intel_miptree_supports_hiz is called *before* we choose the actual
tiling. Adding a tiling_supports_hiz helper lets choose_aux_usage
more accurately decide whether or not to enable hiz. In particular,
this prevents us from enabling HiZ on linear depth buffers.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes piglit test crash when context creation fails.
v2: As suggested by Brian, move the init to st_create_context_priv()
Reviewed-by: Brian Paul <brianp@vmware.com>
Enable the capability if the DRM supports it.
Hook up mechanism to send and receive fence FD from the DRM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Implement the required functions in vmw_fence module.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Return PIPE_CAP_NATIVE_FENCE_FD capability based on what the
winsys reports
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The new interfaces will be used to enable
EGL_ANDROID_native_fence_sync.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The timeout parameter is required to implement
EGL_ANDROID_native_fence_sync.
v2
* Replaced default timeout from 0 to PIPE_TIMEOUT_INFINITE
* Add more documentation to the new timeout parameter
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
From the Vulkan 1.0.53 spec VU for vkCreateImageView:
"image must have been created with a usage value containing at least
one of VK_IMAGE_USAGE_SAMPLED_BIT, VK_IMAGE_USAGE_STORAGE_BIT,
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, or
VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT"
We were missing VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT from out list.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
If the queue is full, util_queue_add_job will wait while bo_fence_lock is
held.
It pb_slab wants to reuse a buffer, it will lock the pb_slab mutex and
try to check BO fence busyness, but it has to wait for bo_fence_lock to get
released. Both bo_fence_lock and pb_slab mutex are locked now.
When the CS thread unreferences and releases a suballocated buffer,
it will try to lock the pb_slab mutex and has to wait. The CS thread
can't finish its job in order to free a queue slot and unblock
util_queue_add_job ==> deadlock.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Consider the following situation:
mtx_lock(mutex);
do_something();
util_queue_add_job(...);
mtx_unlock(mutex);
If the queue is full, util_queue_add_job will wait for a free slot.
If the job which is currently being executed tries to lock the mutex,
it will be stuck forever, because util_queue_add_job is stuck.
The deadlock can be trivially resolved by increasing the queue size
(reallocating the queue) in util_queue_add_job if the queue is full.
Then util_queue_add_job becomes wait-free.
radeonsi will use it.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
During bring-up, this is often 0. Prevent automatic disablement of
ARB_timer_query and demotion of the OpenGL version to 3.2 by setting
a non-zero frequency. Print an error message instead.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.
In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mesa here requires the scaling lists in diagonal scan order, but
VAAPI passes them in raster scan order. Therefore, rearrange the
elements when copying.
v2: Move scan tables to vl_zscan.c.
Fix type in size assertion.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
One can override the deviceID, by setting the INTEL_DEVID_OVERRIDE
variable. A few symbolic names or a numerical value for the actual
device ID is accepted.
At the same time we're using strtod (string to double) to convert the
string to a decimal numeral. A seeming thinko, made by the original
commit that introduces the code in libdrm_intel and got here with the
import.
Fixes: 514db96c11 ("i965: Import libdrm_intel.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There seems to be a rounding difference with F2I vs nearest filtering.
The precise problem in the rounding is unknown.
This fixes an incorrect output with OpenMAX encoding.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Otherwise, ImmutableLevels is 0, which is an illegal value. Later,
_mesa_meta_setup_sampler will use _mesa_texture_parameteriv to set
texObj->MaxLevel = CLAMP(params[0], texObj->BaseLevel,
texObj->ImmutableLevels - 1);
which turns into a completely bogus CLAMP(value, 0, -1)...where the
upper bound is smaller than the lower bound. This ends up being -1
today due to the way CLAMP is implemented, which is a bogus MaxLevel.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Grigori recently added EGL_KHR_create_context_no_error support,
which causes EGL to pass a new __DRI_CTX_FLAG_NO_ERROR flag to
drivers when requesting an appropriate context mode.
driContextSetFlags() will already handle it properly for us, but the
classic drivers all have code to explicitly balk at unknown flags. We
need to let it through or they'll fail to create a no_error context.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
When using DCC some clear values don't require a cmask eliminate
step. This patch adds support for black and black with alpha 1,
there are other values, but I don't have access to a comprehensive list.
This works by setting the cmask eliminate predicate when doing the
fast clear, and later when doing the cmask elimination making sure
the draws are predicated.
This increases the fps on Sascha Willems deferred.
Tonga: 580fps->670fps on a Tonga PRO card.
Polaris 730->850fps
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We can only fast clear 128-bit images if the r/g/b channels
are the same, and we are using DCC.
For DCC we'll bail out on translate if this isn't true,
and we catch cmask clears explicitly.
v2: remove 64-bit block (Bas), add uint32 as well.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch uses addrlib to workout the tile swizzles according
to the surface index. It seems to produce the same values as
amdgpu-pro for the deferred test.
v2: don't apply swizzle to CMASK. the eg docs don't mention
it, and we clearly don't align cmask for that.
v3: disable surf index for dedicated images, as these will
most likely be shared, and I don't think the metadata has
space for this info in it yet.
v4: update for shareable images, rename combined_swizzle
to tile_swizzle
This gets the deferred demo from 730->950fps on my rx480.
(dcc cmask elim predication patches get it further)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some of the Sascha Willems demos pick a D32/S8 format for the depth
buffer, then do a LOAD_OP_CLEAR/LOAD_OP_DONT_CARE on it, which means
we don't get to merge the undefined->depth and clear htile transitions.
This add the stencil aspect to the pending clears if there is a depth
clear pending and the stencil aspect is don't care.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To not confuse apps in thinking it might be faster.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
NV isn't valid for external images anymore.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 6ddc64b93e "radv: Add support for VK_KHR_dedicated_allocation."
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
This effectively reverts commit 43a171878bb4b5aedb36a. Technically,
VK_KHR_get_memory_requirements2 and VK_KHR_dedicated_allocation are
required for the KHR version but this at least restores the removed
functionality. This patch builds but has received zero testing.
Acked-by: Dave Airlie <airlied@redhat.com>
Fished the SparseImage call out of the headers as the spec missed
the definition.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Dave Airlie <airlied@redhat.com>
We always recommend sub-allocation and don't do anything special for
dedicated allocations.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There is one small ANV change here because we used the
VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX enum in the BO cache and that had
to be updated to have the _KHR suffix.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Acked-by: Dave Airlie <airlied@redhat.com>
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This accidentally set __DRI_CTX_FLAG_NO_ERROR whenever any flags were
present. Just needs extra parenthesis.
Fixes: 4909519a66 (egl: Add EGL_KHR_create_context_no_error support)
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Fixes performance regression from f50aa21456 - was forcing internal
code generation to target AVX (no gather, etc).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This only adds the EGL side, needs to be plumbed into Mesa frontend.
v2: Add check for extension availability.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add a new context flag and plumb it through the various layers of the
context creation code to set up dispatch tables for the no-error mode.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This basic extension allows usage of the __DRI_CTX_FLAG_NO_ERROR flag.
This includes support code for classic Mesa drivers to switch on the
no-error mode if the flag is set.
v2: Move to common DRI code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add async marshalling/unmarshalling for all glClearBuffer variants.
These entry points are commonly used in general and Alien Isolation
specifically uses glClearBufferiv. Slightly reduces the number of
thread synchronizations with glthread in that game.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Extract clear buffer helper functions in preparation for adding
marshal/unmarshal functions for the various glClearBuffer variants.
v2: Fix command size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The fps graph for example calculates the fps as double with small
variations based on when query_new_value() is called, which causes
many values to be truncated on the cast to uint64_t.
The HUD internally stores the values as double, so just use double
everywhere instead of fixing this with rounding. Using doubles also
allows the hud to show small variations instead of being clamped to
discrete values.
v2: Don't print decimals in the dump file when not necessary
Signed-off-by: Christoph Haag <haagch+mesadev@frickel.club>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit d8b2ccdb88, which causes priglit regressions on GPUs
with SNORM support. We'll have another try at enabling this feature after
the 17.2 branchpoint.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
A dangling bo object would result in memory corruption while loading a
level in ioquake3_opengl2.
Fixes: 330d0607ed (gallium: remove pipe_index_buffer and set_index_buffer)
Suggested-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
GC3000 has a new LOG instruction, similar to the new SIN and COS instructions.
Generate the new instruction sequence when appropriate; there are
two occasions, as part of LIT and the generator for the LG2
instruction itself.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
If we blit from a rendertarget or a depthstencil buffer there might still
be dirty data in the TS buffer which needs to be flushed out.
Fixes missing shadow tiles in glmark2 shadow.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Before resolving a rendertarget or a depth/stencil resource into a
texture, flush both the color cache and the depth cache together.
It is unclear whether this is necessary for the following stall to
work properly, or whether the depth flush just adds enough time
for the color cache flush to finish before the resolver is started,
but this change removes artifacts that otherwise appear if a texture
is sampled directly after rendering into it.
The test case is a simple QML scene graph with a QtWebEngine based
WebView rendered on top of a blue background:
import QtQuick 2.0
import QtQuick.Window 2.2
import QtWebView 1.1
Window {
Rectangle {
id: background
anchors.fill: parent
color: "blue"
}
WebView {
id: webView
anchors.fill: parent
}
Component.onCompleted: {
webView.url = "<some animated website>"
}
}
If the website is animated, the WebView renders the site contents into
texture tiles and immediately afterwards samples from them to draw the
tiles into the Qt renderbuffer. Without this patch, a small irregular
triangle in the lower right of each browser tile appears solid blue, as
if the texture sampler samples zeroes instead of the website contents,
and the previously rendered blue Rectangle shows through.
Other attempts such as adding a pipeline stall before the color flush or
a TS cache flush afterwards or flushing multiple times, with stalls
before and after each flush, have shown no effect.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Apparently this can happen. Just bail out early in that case, as all the called
functions return NULL in that case.
Fixes weston-terminal for me.
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Use a slightly more explicit version cap for binding wl_drm, so we can
add other interfaces with different versioning schemes later.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
u_vector.h doesn't actually use anything from u_math, but it does mean
everyone has to pull in src/gallium/auxiliary/util includes.
Just remove it, adding a <string.h> include to u_vector.c to cover
memcpy.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the previous commit, forgot to apply v2 suggestions.
Fixes: 28d0c38 (anv/pipeline: use unsigned long long constant to check
enable vertex inputs)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
When initializing the ANV pipeline, one of the tasks is checking which
vertex inputs are enabled. This is done by checking if the enabled bits
in inputs_read.
But the mask to use is computed doing `(1 << (VERT_ATTRIB_GENERIC0 +
desc->location))`. The problem here is that if location is 15 or
greater, the sum is 32 or greater. But C is handling 1 as a 32-bit
integer, which means the displaced bit is out of range and thus the full
value is 0.
Thus, use 1ull, which is an unsigned long long value.
This fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.16_attributes.binding_one_to_one.interleaved
v2: use 1ull instead of BITFIELD64_BIT() (Matt Turner)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
This actually takes advantage of the newly pushed UBO data, avoiding
pull loads.
Improves performance in GLBenchmark Manhattan 3.1 by:
HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%.
(thanks to Eero Tamminen for these numbers)
shader-db results on Skylake, ignoring programs with spill/fill changes:
total instructions in shared programs: 13963994 -> 13651893 (-2.24%)
instructions in affected programs: 4250328 -> 3938227 (-7.34%)
helped: 28527
HURT: 0
total cycles in shared programs: 179808608 -> 172535170 (-4.05%)
cycles in affected programs: 79720410 -> 72446972 (-9.12%)
helped: 26951
HURT: 1248
LOST: 46
GAINED: 21
Many "Deus Ex: Mankind Divided" shaders which already spilled end up
spill a lot more (about 240 programs hurt, 9 helped). The cycle
estimator suggests this is still overall a win (-0.23% in cycle counts)
presumably because we trade pull loads for fills.
v2: Drop "PULL" environment variable left in for initial debugging
(caught by Matt).
Reviewed-by: Matt Turner <mattst88@gmail.com>
With UBOs, the answer of "have we decided to push this uniform" gets
a bit more complicated - for one, we have multiple surfaces. This
patch refactors things so we can add the new code in a single place.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets,
and updates the compiler to know that there's extra payload data, so
things continue working. However, it still issues pull loads for all
data. I wanted to separate the two aspects for greater bisectability.
v2: Update for new intel_bufferobj_buffer parameter.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is an annoyingly big hammer, but it seems less mean than disabling
UBO pushing, and I'm not sure what else to do.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we would re-upload the constant data to the batchbuffer,
then re-emit the packets. We only need to do the last step (causing
the existing data in the batchbuffer to be re-uploaded to the push
constant staging area in the L3).
Now that we've separated the two, it's pretty easy to accomplish.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I hope to upload UBO via 3DSTATE_CONSTANT_XS packets, in addition to
normal uniforms. In order to do that, I'll need to re-emit the packets
when UBOs change. But I don't want to re-copy the regular uniform data
to the batchbuffer every time.
This patch separates out the data uploading from the packet submission.
We're running low on dirty bits, so I made the new atom happen on every
draw call, and added a flag to stage_state indicating that we want the
packet for that stage emitted.
I would have preferred to do this outside the atom system, but it has
to happen between the uploading of push constant data and the binding
table upload.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now, we always upload new push constant data, and immediately
emit 3DSTATE_CONSTANT_* packets. We call intel_upload_space and store
the resulting BO pointer in brw->curbe.curbe_bo. We read that when
emitting the packets. This works today, but is fragile - it depends on
upload and packet emission being interleaved.
If we instead were to upload all the data, then emit all the packets,
then upload BO wrapping will get us into trouble. For example, the VS
constants may land in one upload BO, but the FS constants may not fit
and land in a second upload BO. Uploading FS constants would overwrite
the brw->curbe.curbe_bo pointer, so when we emitted 3DSTATE_CONSTANT_VS,
we'd get the wrong BO.
I intend to separate out this code in a future commit, so I need to fix
this. To fix it, we simply store a per-stage BO pointer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This adds a NIR pass that decides which portions of UBOS we should
upload as push constants, rather than pull constants.
v2: Switch to uint16_t for the UBO block number, because we may
have a lot of them in Vulkan (suggested by Jason). Add more
comments about bitfield trickery (requested by Matt).
v3: Skip vec4 stages for now...I haven't finished wiring up support
in the vec4 backend, and so pushing the data but not using it
will just be wasteful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Soon, we're going to start providing UBO data to shaders as push
constants, rather than requiring them to issue pull loads. The
3DSTATE_CONSTANT_* commands require 32 byte aligned pointers.
So, we need to increase this from 16 to 32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address. This makes it unusable for pushing UBOs. I'd like
to be able to use all four push buffers.
There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer. Setting that gives us full
flexibility.
We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance.
Reviewed-by: Matt Turner <mattst88@gmail.com>
When writing a region of a buffer via glBufferSubData(), we can write
the data asynchronously if the destination doesn't contain any data.
Even if it's busy, the data was undefined, so the new data is fine too.
Removes all stall avoidance blits on BufferSubData calls in
"Total War: WARHAMMER" on my Skylake GT4.
Decreases the number of stall avoidance blits in Manhattan 3.1:
- Skylake GT4: -18.3544% +/- 6.76483% (n=13)
- Apollolake: -12.1095% +/- 5.24458% (n=13)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't do anything yet, but soon we'll want to know whether an
access to a buffer section may write that data, or simply reads it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Merge the code with gen6+ 3DSTATE_GS, and delete brw_gs_state.c,
together with brw_gs_unit_state.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we always call brw_batch_emit anyways, we can hopefully make things
simpler by calling it only once, and then branching inside its body. This
can be helpful when bringing the gen4-5 code into this function.
Additionally, check for GEN_GEN == 6 instead of < 7 in cases that won't apply
to lower gens.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function only emits a particular case of 3DSTATE_GS. Instead, we can do
that inside genX(upload_gs_state), and later reuse part of that code for
emitting gen4-5 state.
There's the additional benefit of allowing us to remove gen6_gs_state.c, which
was only left because of this function.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use set_blend_entry_bits and set_depth_stencil_bits to fill most of the
color calc struct, and then manually update the rest.
v2:
- Always check for depth_irb (Ken)
- Always set Backface Stencil Ref (Ken)
- Always set alpha reference value (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen6+ uses _mesa_base_format_has_channel() to check for the alpha
channel, while gen4-5 use ctx->DrawBuffer->Visual.alphaBits. By using
_mesa_base_format_has_channel() here we keep the same behavior accross
all gen.
While initially both ways of checking the alpha channel seemed correct
to me, this change also seems to fix fbo-blending-formats piglit test on
gen4.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a helper function to reuse code that fills blend entry related
state, and make genX(upload_blend_state) use it. This function can later
be used by gen4-5 color calc state to set the blend related bits.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen4-5 basically glue DEPTH_STENCIL_STATE, COLOR_CALC_STATE, and
BLEND_STATE together into a single COLOR_CALC_STATE structure.
By making a helper function, we'll be able to reuse it when filling
out Gen4-5 COLOR_CALC_STATE without replicating any actual logic.
We use generation-defined typedef to handle the polymorphism.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
If we allow the size to be more than 2^32, then we should compute it
in 64bit arithmetic otherwise we might run into overflow issues.
CID: 1412892, 1412891
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The compact flag doesn't make sense on local variables, since the
packing on them is up to the driver. This fixes nir_validate assertions
in some cases, particularly when lower_io_to_temporaries is used on
per-vertex inputs/outputs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While normally we give variables whose name field is NULL a temporary
name when called from nir_print_shader(), when we were calling from
nir_print_instr() we never bothered, meaning that we just segfaulted
when trying to print out instructions with such a variable. Since
nir_print_instr() is meant to be called while debugging, we don't need
to bother too much about giving a consistent name, but we don't want to
crash in the middle of debugging.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's a bit rare, but blorp can trigger a urb reconfiguration. When
that happens, we need to re-upload the URB config. Previoulsy blorp
would set BRW_NEW_URB_SIZE, but this is a pretty big hammer as it
would cause back-to-black blorp operations to reconfigure both times.
Using BRW_NEW_BLORP is a small, more accurate hammer.
v2 (idr): Sort BRW_NEW_ tokens to match brw_recalculate_urb_fence and
gen6_urb.
v3 (idr): Don't whack BRW_NEW_URB_SIZE in blorp. Suggested by Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add support for 32-bit RGBX/RGBA formats which are required for Android.
The original patch (commit ccdcf91104) was reverted (commit
c0c6ca40a2) in mesa as it broke GLX resulting in swapped colors. Based
on further investigation by Chad Versace, moving the RGBX/RGBA configs
to the end is enough to prevent breaking GLX.
The handling of RGBA/RGBX in dri_fill_st_visual is a fix from Marek
Olšák.
Cc: Eric Anholt <eric@anholt.net>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Previous check-ins without testing with USE_SIMD16_FRONTEND have
introduced regressions. This fixes the build, not the regressions.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Core will ensure hot tiles are loaded for read and write render targets,
and will skip all output merger for read-only render targets.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Forwarding from the ES prolog to the ES just barely exceeds the current
maximum array size when 16 vertex attributes are used. Give it a decent
bump to account for merged shaders having up to 32 user SGPRs.
Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the application hasn't done any drawing since the last call, we
would reuse the same back buffer which was used for the previous swap,
which may not have completed yet. This could result in various issues
such as tearing or application hangs.
In the normal case, the behaviour is unchanged.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97957
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101683
Cc: mesa-stable@lists.freedesktop.org
[Michel Dänzer: Make Thomas' fix from bugzilla actually work as
intended, write commit log]
Any form of CCS on gen9+ only works on Y-tiled images. The only caller
of create_for_bo which uses Y-tiled BOs is create_for_dri_image.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We want to start using create_for_dri_image for all miptrees created
from __DRIimage, including those which come from a window system. In
order to allow for fast clears to still work on window system buffers,
we need to allow for creating aux surfaces.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The __DRI_FORMAT enums are all UNORM but we will frequently want sRGB
when creating miptrees for renderbuffers. This lets us specify.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Due to the wonders of autogeneration, this new version covers a few
formats that the old version was missing:
MESA_FORMAT_SRGB8_ALPHA8_ASTC_3x3x3
MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x3x3
MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4x3
MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4x4
MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x4x4
MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5x4
MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5x5
MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x5x5
MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6x5
MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6x6
Reviewed-by: Chad Versace <chadversary@chromium.org>
Later commits require intel_update_image_buffer() to have control over
the miptree creation. However, intel_update_winsys_renderbuffer_miptree()
currently creates it based on the given buffer object. This patch moves
the creation to the caller side.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
There is nothing particularly useful to do currently if the update
fails, but there is no point carrying on either. As a result, this has a
behavior change.
v2: Make the return type a bool (Topi)
v3: Don't leak the bo if update_winsys_renderbuffer fails. (Jason)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This does make a tiny functional change in that we now also test for
whether or not the format supports texturing and not just rendering.
However, this should have no practical effect as all renderbuffers use
texturable formats.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is what we do in intel_image_target_renderbuffer_storage and it
makes more sense than stomping them. Because the image gets created as
a 2D image with one miplevel, they should already be equal to the
provided width/height. Adding the tile offset makes some sense
depending on how you interpret the fields.
The only place these fields are used for in state setup is to set up the
image parameters we pass into shaders. There may be issues here if you
try to use image_load_store on something pulled in from EGL but that's
probably broken already. This just makes it consistently broken.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is mostly a direct port. The only bit of refactoring that was done
was to make creating a planar miptree be an early return from the
non-planar case. Alternatively, we could have three functions: two
helpers and a main function to just call the right helper. Making the
planar case an early return seemed cleaner.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We were using the "cp" union fields, which are only valid for compute
shaders. The threads calculation affects the available GPRs, so just
pick a small number for other shader types to avoid limiting available
registers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The comments are correct - we get -1 and 0. However by adding 1, we
convert this into 0,1. This mostly works for conditionals, but when
negated, this will yield the wrong result. Instead just negate the
values (as they are backwards -- -1 means back instead of front).
Fixes tests/shaders/glsl-fs-frontfacing-not.shader_test and
dEQP-GLES3.functional.shaders.builtin_variable.frontfacing on A530.
The latter also tested on A306 by Rob Clark.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If a cube image has VK_IMAGE_USAGE_STORAGE_BIT set, the type in an image
view's descriptor was set to a 2D array (and a few other fields adjusted
accordingly). This is correct when the image view is actually bound as a
storage image, but not when bound as a sampled image. In that case the
type should be set as a cube.
Fix by generating 2 sets of descriptors at view creation time for both
storage and non-storage usage, and then choose between them based on
descriptor type when writing descriptor sets.
v2: Generate storage descriptors for images with TRANSFER_DST, since
those may be used as storage images internally.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This free was left in after dynamic descriptors were changed to not be
allocated separately from the descriptor set, and can cause a crash.
Fixes: 39644fa40a ("radv: Don't allocate dynamic descriptors separately")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If size of client memory copy is too large, don't copy. The draw will
access user-buffer directly and then block. This is faster and more
efficient than queuing many large client draws.
Applications that still use large client arrays benefit from this. VMD
is an example.
The threshold for this path defaults to 32KB. This value can be
overridden by setting environment variable SWR_CLIENT_COPY_LIMIT.
v2: Use #define for default value, rather than hard-coded constant.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Moved reading of environment config options out of
swr_create_screen_internal, into a separate swr_validate_env_options.
This is to keep from cluttering create_screen.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Removed the hard-coded constant in favor of a #define. Also removed
TODO comment. The constant value doesn't need an environment
configurable option.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Since commit 7f80a9ff13 ("vc4: Introduce XML-based packet header
generation like Intel's."), the vc4 build on Android is broken:
out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found
external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found
The path of the generated header needs to be fixed since we build out of
tree.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
Commit a5e733c6b5 fixes the dangling
framebuffer object by unreferencing the window system draw/read buffers
when context is released. However this can prematurely destroy the
resources associated with these window system buffers. The problem is
reproducible with Turbine Demo running with VMware driver. In this case,
the depth buffer content was lost when the context is rebound to a
drawable.
To prevent premature destroy of the resources associated with
window system buffers, this patch maintains a list of these buffers in
the context, making sure the reference counts of these buffers will not
reach zero until the associated framebuffer interface objects no
longer exist. This also helps to avoid unnecessary destruction and
re-construction of the resources associated with the framebuffer.
Fixes VMware bug 1909807.
Reviewed-by: Brian Paul <brianp@vmware.com>
The locking was supposed to go away in commit 314647c4c2
(i965: Drop global bufmgr lock from brw_bo_map_* functions.), but
this lone unlock remains.
I'm guessing I messed this up when splitting up Chris's patch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear. This swaps SHARED &&
!LINEAR buffers over to being tiled.
This is an expected win for all GL compositors during rendering (a full
copy of each shared texture per draw call), allows X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well. It also halves the memory usage of
shared buffers that get textured from. The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.
This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.
v2: Added modifier support to resource creation/import, and
advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
comment touchups (by anholt).
Reviewed-by: Daniel Stone <daniels@collabora.com>
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.
v2: Rebase on VC4_DEBUG=surf change
Reviewed-by: Daniel Stone <daniels@collabora.com>
Needing to get our uapi header from libdrm has only complicated things.
Follow intel's lead and drop our requirement for it.
Generated from the same commit mentioned in the README.
v2: Update Android.mk as well, move vc4_drm.h reference for distcheck.
Reviewed-by: Daniel Stone <daniels@collabora.com>
I want to remove vc4's dependency on headers from libdrm as well, but
storing multiple copies of drm_fourcc.h in our tree would be silly.
v2: Update Android.mk as well, move distcheck drm*.h references to
top-level noinst_HEADERS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Rob Herring <robh@kernel.org>
This fixes 32-bit builds of the driver. Commit 08413a81b9
changed things so that we now put struct anv_states in the u_vector for
binding tables. On 64-bit builds, sizeof(struct anv_state) is a power
of two but it isn't on 32-bit builds.
Fixes: 08413a81b9
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
With ealier commit we relaxed the requirement from C++14 to C++11.
Update the build script so that it
Cc: Tim Rowley <timothy.o.rowley@intel.com
Fixes: 0b80b02502 ("swr: relax c++ requirement from c++14 to c++11")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
DRI_IMAGE's createImageFromTexture is used to implement the extension,
so we should check for it prior to advertising.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Drop the (duplicate) top-level check in dri2_create_image_khr() and add
the respective checks in dri2_create_image_khr_{texture,renderbuffer}
v2: use unreachable instead of assert in dri2_create_image_khr_texture
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
If the respective extension is not supported, one should return
EGL_BAD_PARAMETER as mentioned in earlier commits.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Although not listed amongst the initial EGL_LINUX_DRM_FOURCC_EXT and
friends list, the spec reads
... Required attributes and their values are as
follows:
* EGL_WIDTH & EGL_HEIGHT: The logical dimensions of the buffer in pixels
* EGL_LINUX_DRM_FOURCC_EXT: The pixel format of the buffer, as specified
by drm_fourcc.h and used as the pixel_format parameter of the
drm_mode_fb_cmd2 ioctl.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Will allow us to simplify existing code and make further improvements
short and simple.
No functional change intended.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
As per EGL_KHR_image_base:
If an attribute specified in <attrib_list> is not one of the
attributes listed in Table bbb, the error EGL_BAD_PARAMETER is
generated.
We should set the error as opposed to simply log it.
Currently we have a partial solution, whereby only some of the callers
call _eglError().
Since that has proven to be less robust, simply set the error by the
function itself and change the return type to EGLBoolean, updating the
callers.
So now the code is slightly simpler. Plus the follow-up fixes will be
easier to manage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Don't bother allocating any memory until we're finished parsing and
sanitising all the attributes.
As a nice side effect we now consistently set eglError when any of
the attrib/values are not correct.
Strangely enough the spec does not mention _anything_ about what error
should be set where, even if the implementation already sets the odd
one.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Commit bfe1e7737a changed how texture swizzles are set up.
This exposed a latent bug in the VMware driver: we were ignoring
the texture instruction's writemask when applying the 0 and 1
swizzle terms.
This wasn't caught by the Piglit texture swizzle test because it
only exercises fixed function (no write masking).
Fixes issues seen with ETQW apitrace.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Valgrind doesn't actually implement VALGRIND_FREELIKE_BLOCK as the
exact inverse of VALGRIND_MALLOCLIKE_BLOCK. It makes the block
inaccessible, but still leaves it defined in its allocation tracker i.e.
it will report the mmap as lost despite the call to FREELIKE!
Instead of treating the mmap as an allocation, treat it as changing the
access bits upon the memory, i.e. that it becomes defined (because of
the buffer objects always contain valid content from the user's
perspective) upon mmap and inaccessible upon munmap. This makes memcheck
happy without leaving it thinking there is a very large leak.
Finally for consistency, we treat all the mmap/munmap paths the same
even though valgrind can intercept the regular mmap used for GTT. We
could move this in the drm_mmap/drm_munmap macros, but that quickly
looks ugly given the desire for those to support different OSes, but I
didn't try that hard!
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using a read-only CPU mapping, we may encounter stale buffer
contents. For example, the Piglit primitive-restart test offers the
following scenario:
1. Read data via a CPU map.
2. Destroy that buffer.
3. Create a new buffer - obtaining the same one via the BO cache.
4. Call BufferSubData, which does a GTT map with MAP_WRITE | MAP_ASYNC.
(We avoid set_domain for async mappings, so no flushing occurs.)
5. Read data via a CPU map.
(Without explicit clflushing, this will contain data from step 1!)
Otherwise, everything ought to work, keeping in mind that we never use
CPU maps for writing - just read-only CPU maps.
This restores the performance gains after Matt's revert in commit
71651b3139.
v2: Do the invalidate later, and even when asking for a brand new map.
v3: Add more comments from Chris.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Just map the buffer and memcpy. This will do a CPU mmap, which should
be reasonably efficient, and doing this gives us full control over the
domains and caching instead of leaving it to the kernel.
This prevents regressions on Braswell in the next commit. Specifically
GL45-CTS.shader_atomic_counters.basic-buffer-operations. Because async
maps start skipping set-domain, the pread thought everything was nicely
still in the CPU domain, and returned stale data.
v2: Use _mesa_error_no_memory() if the map fails instead of crashing.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.
Changing to having one instance of the driver and just building
architecture specific versions of the rasterizer gives a large reduction
in disk space.
libGL.so 6464 Kb -> 7000 Kb
libswrAVX.so 10068 Kb -> 5432 Kb
libswrAVX2.so 9828 Kb -> 5200 Kb
Total 26360 Kb -> 17632 Kb
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use the SWR rasterizer API through the table returned from
SwrGetInterface rather than referencing the functions directly.
This will allow us to move to a model of having the driver dynamically
load the appropriate swr architecture library.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We could have used a single integer to store that value, but
Cannonlake has different number of subslices per slice depending on
the GT.
v2: Add CFL subslice numbers (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reduces IOCTL calls by 1, and provides a centralized place to override
such configurations if we have a need to do so.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From KHR_fence_sync:
When the condition of the sync object is satisfied by the fence
command, the sync is signaled by the associated client API context,
causing any eglClientWaitSyncKHR commands (see below) blocking on
<sync> to unblock. The only condition currently supported is
EGL_SYNC_PRIOR_COMMANDS_COMPLETE_KHR, which is satisfied by
completion of the fence command corresponding to the sync object,
and all preceding commands in the associated client API context's
command stream. The sync object will not be signaled until all
effects from these commands on the client API's internal and
framebuffer state are fully realized. No other state is affected by
execution of the fence command.
If clients are passing the fence fd (from EGL_ANDROID_native_fence_sync)
to a compositor, that fence must only be signaled once the framebuffer
is resolved and not before as is currently the case.
v2: fixup assert to use GL_SYNC_GPU_COMMANDS_COMPLETE (Chad)
Reported-by: Sergi Granell <xerpi.g.12@gmail.com>
Fixes: c636284ee8 ("i965/sync: Implement DRI2_Fence extension")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sergi Granell <xerpi.g.12@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Chad Versace <chadversary@chromium.org>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100242">Bug 100242</a> - radeon buffer allocation failure during startup of Factorio</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101657">Bug 101657</a> - strtod.c:32:10: fatal error: xlocale.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101666">Bug 101666</a> - bitfieldExtract is marked as a built-in function on OpenGL ES 3.0, but was added in OpenGL ES 3.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101703">Bug 101703</a> - No stencil buffer allocated when requested by GLUT</li>
</ul>
<h2>Changes</h2>
<p>Aaron Watry (1):</p>
<ul>
<li>radeon/winsys: Limit max allocation size to 70% of VRAM</li>
</ul>
<p>Aleksander Morgado (2):</p>
<ul>
<li>etnaviv: fix refcnt initialization in etna_screen</li>
<li>etnaviv: don't dereference etna_resource pointer if allocation fails</li>
</ul>
<p>Alex Smith (2):</p>
<ul>
<li>ac/nir: Use correct LLVM intrinsics for atomic ops on imageBuffers</li>
<li>ac/nir: Fix ordering of parameters for image atomic cmpswap intrinsics</li>
</ul>
<p>Andres Gomez (3):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.4</li>
<li>cherry-ignore: i965: Fix anisotropic filtering for mag filter</li>
<li>Update version to 17.1.5</li>
</ul>
<p>Anuj Phogat (2):</p>
<ul>
<li>intel/isl: Use uint64_t to store total surface size</li>
<li>intel/isl: Add the maximum surface size limit</li>
</ul>
<p>Brian Paul (3):</p>
<ul>
<li>draw: check for line_width != 1.0f in validate_pipeline()</li>
<li>svga: clamp device line width to at least 1 to fix HWv8 line stippling</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.