This ensures that future calls to eglSwapBuffers and eglMakeCurrent emit
an error.
This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
That is, call ANativeWindow::cancelBuffer in droid_destroy_surface().
This should prevent application deadlock when the app destroys the
EGLSurface after EGL has acquired a buffer from SurfaceFlinger
(ANativeWindow::dequeueBuffer) but before EGL has released it
(ANativeWindow::enqueueBuffer).
This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Add a new bool, _EGLSurface::Lost, and check it in eglMakeCurrent and
eglSwapBuffers. The EGL 1.5 spec says that those functions emit errors
when the native surface is no longer valid.
This patch just updates core EGL. No driver sets _EGLSurface::Lost yet.
I discovered that Mesa failed to detect lost surfaces while debugging an
Android CTS camera test,
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface.
This patch doesn't fix the test though, though, because the test expects
EGL_BAD_SURFACE when the surface becomes lost, and this patch actually
complies with the EGL spec. If I interpreted the EGL spec correctly,
EGL_BAD_NATIVE_WINDOW or EGL_BAD_CURRENT_SURFACE is the correct error.
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
This implementation allocates a 4k BO for each semaphore that can be
exported using OPAQUE_FD and uses the kernel's already-existing
synchronization mechanism on BOs.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This just stubs things out. Real external semaphore support will come
with VK_KHX_external_semaphore_fd.
Reviewed-by: Chad Versace <chadversary@chromium.org>
After successful drmGetDevices2() call, drmFreeDevices() needs to be called.
Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.
Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2 (Jason Ekstrand):
- Take a view_mask rather than a whole subpass
- Build the view mask into the VS shader key
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to insert more lowering code that may insert system values and
we need to gather info after that lowering.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Shader hashing is very closely related to shader compilation. Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing. The only
real change (other than the order of hashing) is that we now hash in the
shader stage.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We always use only single element.
v2: Change single element arrays to variables
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The regioning parameters are now properly set by convert_to_hw_regs()
and we don't need to fix them in the generator. That latter fix
previously done in the generator was strictly speaking wrong for any
non-identity regions.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On gen7, the swizzles used in DF align16 instructions works for element
size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that
in the rest of the code and prepare the instructions for this (scalarize_df()),
we need to set it to two again.
However, for DF align1 instructions, a width of 2 is wrong as we are not
reading the data we want. For example, an uniform would have a region of
<0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access
to the first 4.
This patch sets the default one to 4 and then modifies the width of
align16 instruction's DF sources when we translate the logical swizzle
to the physical one.
v2:
- Remove conditional (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
From IVB PRM, vol4, part3, "General Restrictions on Regioning
Parameters":
"If ExecSize = Width and HorzStride ≠ 0, VertStride must
be set to Width * HorzStride."
In next patch, we are going to modify the region parameter for
uniforms and vgrf. For uniforms that are the source of
DF align1 instructions, they will have <0, 4, 1> regioning and
the execsize for those instructions will be 4, so they will break
the regioning rule. This will be the same for VGRF sources where
we use the vstride == 0 exploit.
As we know we are not going to cross the GRF boundary with that
execsize and parameters (not even with the exploit), we just fix
the vstride here.
v2:
- Move is_align1_df() (Curro)
- Refactor exec_size == width calculation (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This fixes:
dEQP-VK.glsl.builtin.precision.min.*
dEQP-VK.glsl.builtin.precision.max.*
dEQP-VK.glsl.builtin.precision.clamp.*
The problem is the hw doesn't compare denorms properly,
so we have to flush them, even though the spec says
flushing is optional, if you don't flush the results
should be correct.
The -pro driver changes the shader float mode,
it would be nice if llvm could grow that perhaps.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
SPIR-V defines the f32->f16 operation as flushing denormals to 0,
this compares the class using amd class opcode.
Thanks to Matt Arsenault for figuring it out.
This fix is VI+ only, add a TODO for SI/CIK.
This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Having it in the winsys didn't work when multiple devices use
the same winsys, as we then have multiple contexts per queue,
and each context counts separately.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 7b9963a28f "radv: Enable userspace fence checking."
Loop unroll asserts if it hits a sub, we don't really want
to lower subs as llvm handles these things, but do this for
now, until we can fix loop unroll to work with subs.
Fixes: 14ae0bfa5 (radv: Add NIR loop unrolling)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
All of the dynamic states apply to rasterization & fragment processing,
so we don't need to set them if we don't rasterize.
We don't clear the dirty flags for them though, so we don't miss any
updates for the next pipeline with rasterization.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
This still doesn't give us complete pWaitDstStageMask support,
but it should provide enough to be correct if not as efficent as
possible.
If we have wait semaphores we must flush between submits and
flush the shaders as well.
This fixes the remaining fails in:
dEQP-VK.synchronization.op.single_queue.semaphore.*ssbo*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I don't see any reasons why vector_elements is 1 for images and
0 for samplers. This increases consistency and allows to clean
up some code a bit.
This will also help for ARB_bindless_texture.
No piglit regressions with RadeonSI.
This time the Intel CI system doesn't report any failures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow Raspbian's ARMv6 builds to take advantage of the new NEON
code, and could prevent problems if vc4 ends up getting used on a v7 CPU
without NEON.
v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
Android.mk was setting the flag across the entire driver, so we didn't
have non-NEON versions getting built. This was going to be a problem with
the next commit, when I start auto-detecting NEON support and use the
non-NEON version when appropriate.
Reviewed-by: Rob Herring <robh@kernel.org>
I wrote this code with reference to pixman, though I've only decided to
cover Linux (what I'm testing) and Android (seems obvious enough). Linux
has getauxval() as a cleaner interface to the /proc entry, but it's more
glibc-specific and I didn't want to add detection for that.
This will be used to enable NEON at runtime on ARMv6 builds of vc4.
v2: Actually initialize the temp vars in the Android path (noticed by
daniels)
v3: Actually pull in the cpufeatures library (change by robher).
Use O_CLOEXEC. Break out of the loop when we find our feature.
v4: Drop VFP code, which was confused about what it was detecting and not
actually used yet.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
If we are clearing stencil only, we still need to provide a
a valid Z output from the vertex shader, we can't rely
on the depth clear value having any meaning, as we use this
for the position output, and it could get clipped, so we
don't end up clearing anything.
Fixes:
dEQP-VK.renderpass.simple.stencil
since I added S8 support.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The renderonly_scanout holds a reference on its prime pipe resource,
which should be released when it is destroyed. If it was created by
renderonly_create_kms_dumb_buffer_for_resource, the dumb BO also has
to be destroyed.
Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This ports
0fcb92c17d
anv: wsi: report presentation error per image request
This fixes:
dEQP-VK.wsi.xlib.incremental_present.scale_none.*
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We would be storing this info twice per image, no need to,
remove it from the surface struct.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When we were assembling the unsigned 64-bit query return from its
two signed 32-bit component parts, the lower half was getting
sign-extended into the top half. Be more explicit about what we want to
do.
Fixes gbm_bo_get_modifier() returning ((1 << 64) - 1) rather than
((1 << 56) - 1), i.e. DRM_FORMAT_MOD_INVALID.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
NIR now validates that SSA references use the same number of channels as
are in the SSA value.
v2: Reword commit message, since the commit didn't land before the
validation change did.
Fixes: 370d68babc ("nir/validate: Validate that bit sizes and components always match")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Cc: <mesa-stable@lists.freedesktop.org>
Set the bit in the same stage as the timestamp, instead always at top of pipe.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
The Android native fence in i965 has two fds: _EGLSync::SyncFd and
brw_fence::sync_fd.
The semantics of __DRI2fenceExtensionRec::create_fence_fd are unclear on
whether the DRI driver takes ownership of the incoming fd (which is the
same incoming fd from eglCreateSync). i965 did take ownership, but all
other Mesa drivers do not; instead, they dup the incoming fd. As
a result, _EGLSync::SyncFd and brw_fence::sync_fd were the same fd, and
both egl_dri2 and i965 believed they owned it. On eglDestroySync, that
led to a double-close.
Fix the double-close by making brw_dri_create_fence_fd dup the incoming
fd, just like the other drivers do.
Signed-off-by: Randy Xu <randy.xu@intel.com>
Test: Run Vulkan and GLES stress test and no crash.
Fixes: 6403e37651 ("i965/sync: Implement fences based on Linux sync_file")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
[chadv: Polish the commit message]
Cc: mesa-stable@lists.freedesktop.org
NEON is sufficiently different on arm64 that we can't just reuse this
code. Disable it on arm64 for now.
v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build
for a v8 CPU.
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
Samplers are encoded into the instruction word, so there's no need to
make space in the uniform file.
Previously matrix_columns and vector_elements were set to 0, making this
else case a no-op. Commit 75a31a20af changed that, causing malloc
corruption in thousands of tests on i965.
Fixes: 75a31a20af ("glsl: set vector_elements to 1 for samplers")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100871
Considering we cannot make dummy_thread a constant we might as well,
initialise by the same function that handles the actual thread info.
This way we don't need to worry about mismatch between the initialiser
and initialising function.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Annotate the array as static const and use C99 initialiser to populate
it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
val_bool and val_int are in a union. val_bool gets the first byte, which
happens to work on LE when setting via the int, but breaks on BE. By
setting the value properly, we are able to use DRI3 on BE architectures.
Tested by running glxgears with a NV34 in a G5 PPC.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: squash the vmwgfx hunk]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Add a page that has information which release is expected when and
associated information.
Reference to it from the "Releasing process" and "Release notes" pages.
v2:
- Add Andres for 17.0.5
- Rework table format to include the branch (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The instance should have 2 cores, yet bumping the jobs to 4 should give
us a minor speed improvement.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Split into OpenCL and others, since the former is quite time consuming.
v2:
- explicitly enable/disable components
- build libvdpau 1.1 requirement
- enable st/vdpau
- build libva 1.6.2 (API 0.38) requirement
v3: Drop ubuntu-toolchain-r-test from sources (Andres)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Split the target to allow faster builds for each run.
The overall build time will be more, yet Travis runs multiple builds in
parallel so we're limited by the slowest one.
Things are split roughly as:
- DRI loaders, classic DRI drivers, classic OSMesa, make check
- All Gallium drivers (minus the SWR) alongside st/dri (mesa)
- The Vulkan drivers - ANV and RADV, make check (anv)
v2:
- rework RUN_CHECK to MAKE_CHECK_COMMAND
- explicitly disable DRI loaders
- generate linux/memfd.h locally and enable ANV
- add libedit-dev
v3: Use printf to create the header (Andres).
v4: Really add the libedit + printf hunks.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
The former does not require any LLVM, while the latter uses LLVM 3.3.
This way we'll quickly catch any LLVM 3.3+ functionality that gets
introduced where it shouldn't.
Add the full list of addons for each build permutation.
v2: Keep libedit-dev, rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4:
- Remove llvm-toolchain-trusty-3.3 source (Andres)
- Keep check target as-is (Andres)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
With next commits we'll add a couple of more options.
v2: Rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4: Keep check target as-is, will rework with later patch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Split the "if test" blocks so that we get more sensible output in case
of a failure.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
We effectively override libdrm-dev and libxcb-dri2-0-dev since we build
and install the package locally.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
According to the manual
"If you are using ccache, use:
language: c # or other C/C++ variants
cache: ccache
to cache $HOME/.ccache and automatically add /usr/lib/ccache to your
$PATH."
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Provides a small, but consistent improvement.
Example numbers of the jobs added later in the series.
"make loaders/classic DRI" - 1s
"scons SWR" - 6s
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
The txc-dxtn library implements the patented S3 Texture Compression
algorithm.
By default it won't be used but we add the possibility of setting the
USE_TXC_DXTN variable to yes in the travis web UI so it will be
installed and used for the scons tests.
Cc: Eric Anholt <eric@anholt.net>
Cc: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov: keep the LIB prefix, drop the LD_LIBRARY_PATH, fold URL]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Trusty's LLVM toochain repository was whitelisted some time ago. See:
479067c5e7
Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov]
- set sudo to false
- reference the Trusty change (Rhys)
- keep libedit-dev
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Some of the libraries may be dlopened, which may not always work due to
the non-standard prefix that we're using.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
We do not need to restrict WGL_BIND_TO_TEXTURE_RGB_ARB to
RGB visuals only. It can be supported with RGBA visuals as well.
This fixes the early exit of cinebench-r15-test trace.
Tested with cinebench-r15, piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
If texture is imported and templ format is sRGB, use compatible sRGB format
to the imported texture format while creating surface view.
tested with MTT piglit, glretrace, viewperf and conform
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This function will return compatible svga srgb format for corresponding
linear format
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch will allow driver to choose srgb capable FBconfig
if GLX_FRAMEBUFFER_SRGB_CAPABLE_ARB attribute is 1
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
dri3 is a bit sloppy about its format compatibility requirements, so add
a possibility to import xrgb surfaces as argb textures and vice versa.
At the same time, make the svga_texture_from_handle() function a bit more
readable and fix the error path where we leaked a winsys surface.
v2: Addressed review comments by Brian.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Small fetch performance optimization - use gather instruction
for odd format fetch instead of slow emulated code.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Misplaced #endif preventing depth and stencil hot tile pointers
from incrementing in SIMD16 8x2 configuration of BackendPixelRate.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Frontend - reduce simdvertex/simd16vertex stack usage for VS output in
ProcessDraw, fixes stack overflow in some of the deeper call stacks under
SIMD16.
1. Move the vertex store out of PA_FACTORY, and off the stack
2. Allocate the vertex store out of the aligned heap (pointer is
temporarily stored in TLS, but will be migrated to thread pool
along with other frontend temporary buffers).
3. Grow the vertex store as necessary for the number of verts per
primitive, in chunks of 8/4 simdvertex/simd16vertex
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Disabling buffer overrun warning for Assemble(uint32_t slot,
simdvector *verts) due to what looks like a MSVC compiler bug
when compiling the SIMD16 FE.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ability to allocate space for an arbitrary number (at compile time)
of positions in the vertex layout.
Removes KNOB_NUM_ATTRIBUTES from knobs.h, replaces the VTX slot
number #defines with the SWR_VTX_SLOTS enum (which contains
replacement for NUM_ATTRIBUTES: SWR_VTX_NUM_SLOTS)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We already have BRW_NEW_BATCH, which completely covers all the cases
that BRW_NEW_CONTEXT would handle. Drop it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Gen4-5 and Gen8+ already set this, but Gen6-7.5 did not. We ought to
be consistent - the answer depends on the API, not the hardware generation.
The Sandybridge PRM says about RASTRULE_UPPER_RIGHT:
"To match OpenGL point rasterization rules (round to +infinity, where
this is the upper right direction wrt OpenGL screen origin of lower
left).
So this is likely the one we should use.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We set this unconditionally on every other platform. Zero (Manhattan)
isn't even listed as an option in the Sandybridge docs - only "true".
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The original Broadwater and Crestline platforms computed antialiased
line distances using "manhattan" distance, aka a + b = c. Eaglelake
and Cantiga added "true" distance, which apparently does something
like max(a, b) + min(a, b) / 4. Not exactly "true", but at least
more accurate.
The G45 documentation indicates that the old manhattan distance setting
is "only for debug purposes" and should never be used. The Ironlake
documentation no longer mentions AALINEDISTANCE_MANHATTAN, though it
does still contain the narrative about the feature.
At any rate, we should use the more accurate mode.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Basically, don't load GRID_SIZE or BLOCK_SIZE if they are unused, determine
whether to load BLOCK_ID for each component separately, and set the number
of THREAD_ID VGPRs to load. Now we should get the maximum CS launch wave
rate in most cases.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
LLVM 5.0 removes s_barrier instructions if the max-work-group-size
attribute is not set. What a surprise.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This removes s_load_dword latency for tess rings.
We need just 1 SGPR for the address if we use 64K alignment. The final asm
for recreating the descriptor is:
// s2 is (address >> 16)
s_mov_b32 s3, 0
s_lshl_b64 s[4:5], s[2:3], 16
s_mov_b32 s6, -1
s_mov_b32 s7, 0x27fac
v2: bitcast the descriptor type from v2i64 to v4i32
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The use of PrimID in the pixel shader is too rare to deserve such
a sizable support code.
The initial idea of the VS epilog was to move the clipping code there and
remove it based on states, but optimized variants are now used to do that
and are easier to support, so the VS epilog has turned out to be not so
useful.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tentatively enable it, expecting the scratch buffer support to be done before
the next Mesa release.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The 2nd shader of merged shaders should take a reference of the 1st shader.
The next commit will do that.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We could also remove index_bounds_valid and use max_index != ~0 instead.
Opinions on that are welcome.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This reverts commit 24011ead71.
This causes lots of ES 3.1 CTS tests to fail to compile a bit of code
like:
layout(binding = 0) buffer InOut
{
highp uint inputValues[384];
highp uint outputValues[384];
coherent highp uint groupValues[64]; <-----
} sb_inout;
error: memory qualifiers may only be applied to images
The VMware driver has a limited set of integer texture formats. We
often have to fall back to 4-component formats when 1- or 2-component
formats are missing.
This fixes about 8 integer texture Piglit tests with the VMware driver
on Linux. We've had this code in-house for a long time but I guess it
was never up-streamed to Mesa master.
This shouldn't regress any other drivers since we're either choosing
an earlier format in the list, or failing anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Return as soon as we find an existing color channel that's enabled for
writing. Typically, this allows us to return true on the first loop
iteration intead of doing four iterations.
No piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The
functions are used by Wayland WSI too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
stfb->iface is always non-NULL for an st_framebuffer. These checks
were incorrect, relying on out-of-bounds memory access in the
surface-less case of EGL_KHR_surfaceless_context.
v2: remove redundant stread check (Marek)
Reviewed-by: Marek Olšák <marek@olsak@amd.com> (v2)
The incomplete framebuffer is set for a surfaceless context. This leads to
the following error in piglit spec@egl_khr_surfaceless_context@viewport:
==26703==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7f6886e43240 at pc 0x7f68854db0fd bp 0x7ffca404b3b0 sp 0x7ffca404b3a0
READ of size 8 at 0x7f6886e43240 thread T0
#0 0x7f68854db0fc in st_viewport ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57
#1 0x556840176cdb in main tests/egl/spec/egl_khr_surfaceless_context/viewport.c:101
#2 0x7f688edcf3f0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x203f0)
#3 0x556840176e19 in _start (/home/nha/amd/piglit/bin/egl-surfaceless-context-viewport+0xe19)
0x7f6886e43240 is located 32 bytes to the left of global variable 'DummyRenderbuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:69:31' (0x7f6886e43260) of size 112
0x7f6886e43240 is located 8 bytes to the right of global variable 'IncompleteFramebuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:73:30' (0x7f6886e42de0) of size 1112
SUMMARY: AddressSanitizer: global-buffer-overflow ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57 in st_viewport
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek@olsak@amd.com>
These operations are currently implemented as IR expressions. However,
they cannot be transformed and moved in the way that other IR
expressions can because they have non-trivial interactions with
control-flow.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes the following ARB_shader_image_load_store tests:
format-layout-with-non-image-type.frag
memory-qualifier-with-non-image-type.frag
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The more I think about it the more this seems like a bad idea.
When we were deleting old cache dirs this wasn't so bad as it
was unlikely we would ever hit the actual limit before things
were cleaned up. Now that we only start cleaning up old cache
items once the limit is reached the a percentage based max
cache limit is more risky.
For the inital release of shader cache I think its better to
stick to a more conservative cache limit, at least until we
have some way of cleaning up the cache more aggressively.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This commit just exposes the memory handle type. There's interesting we
need to do here for images. So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.
v2 (chadv):
- Rebase.
- Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
handleType == 0.
- Move handleType-independency comments out of handleType-switch, in
vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in
future dma_buf patches.
Co-authored-with: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle. We'll need this in order to support multiple-import of
memory objects and semaphores.
v2 (Jason Ekstrand):
- Reject BO imports if the size doesn't match the prime fd size as
reported by lseek().
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is the trivial implementation that just exposes the extension
string but exposes zero external handle types.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is a complete but trivial implementation. It's trivial becasue We
support no external memory capabilities yet. Most of the real work in
this commit is in reworking the UUIDs advertised by the driver.
v2 (chadv):
- Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
input structs, not the chain of output structs.
- In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
input chain and the output chain separately. Reduces diff in future
dma_buf patches.
Co-authored-with: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
I don't see any reasons why vector_elements is 1 for images and
0 for samplers. This increases consistency and allows to clean
up some code a bit.
This will also help for ARB_bindless_texture.
No piglit regressions with RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The majority of cache files are less than 1kb this resulted in us
greatly miscalculating the amount of disk space used by the cache.
Using the number of blocks allocated to the file is more
conservative and less likely to cause issues.
This change will result in cache sizes being miscalculated further
until old items added with the previous calculation have all been
removed. However I don't see anyway around that, the previous
patch should help limit that problem.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Modern disks are extremely large and are only going to get bigger.
Usage has shown frequent Mesa upgrades can result in the cache
growing very fast i.e. wasting a lot of disk space unnecessarily.
5% seems like a more reasonable default.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
This assert wasn't in the original radeonsi code but I added
it without totally understanding the original code, it caused
some regressions in variable-indexing tessellation shaders.
Fixes: e2659176 radeonsi/ac: move vertex export remove to common code.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from radeonsi, and I can see at least one
Talos shader drops an export due to this, and saves some
VGPR usage.
v2: use shared code.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since the host pool changes,
Fixes:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Fixes: 126d5ad "radv: Use host memory pool for non-freeable descriptors."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Varying types have already been validated in
apply_type_qualifier_to_variable() by this point.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Here get_scalar_type() was just being use to remove the array
after that we converted it back to base_type anyway so just
use the without_array() helper.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When we ran Viewperf11's Maya-03 test 3 we saw warnings about flushing
the command buffer with mapped buffers. This happened when transitioning
from hardware rendering to a 'draw' fallback path.
The problem is the util_set_vertex_buffers_count() function doesn't do
exactly what we want in svga_hwtnl_vertex_buffers(). In a case such as
dst_count=2, dst={bufA, bufB}, count=1 and src={bufC}, when the function
returns we'll have dst_count=2 and dst={bufC, bufB}. What we really want
is dst_count=1 and dst={bufC, NULL}. As it was, we were telling the svga
device that there were two vertex buffers when in fact we really only
needed one for the subsequent drawing command.
In this particular case, we first did hardware drawing with {bufA, bufB}
then we transitioned to the 'draw' module, consuming vertex data from
bufA and bufB and writing the new vertex data to bufC. bufA and bufB are
mapped for reading when we flush the command buffer but should not be
referenced by the command buffer. The above change fixes that.
No Piglit regressions. Also tested with Viewperf, Google Earth, Heaven,
etc.
VMware bug 1842059
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We only need to construct the debug message if the mapped_sync flag is set.
This should make the function faster since the flag is usually false.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead of directly sending the InvalidateGBSurface command,
this patch uses the invalidate_surface interface.
Fixes Linux VM piglit failures including
ext_texture_array-gen-mipmap, fbo-generatemipmap-array S3TC_DXT1
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch revises the fix in commit 606f13afa31c9f041a68eb22cc32112ce813f944
to properly translate the surface format for screen target.
Instead of changing the svga format for PIPE_FORMAT_B5G6R5_UNORM
to SVGA3D_R5G6B5 for all texture surfaces, this patch only restricts
SVGA3D_R5G6B5 for screen target surfaces. This avoids rendering
failures when specify a non-vgpu10 format in a vgpu10 context with
software renderer.
Fixes piglit failures spec@!opengl 1.1@draw-pixels,
spec@!opengl 1.1@teximage-colors gl_r3_g3_b2
spec@!opengl 1.1@texwrap formats
Tested Xorg with 16bits depth.
Also tested with MTT piglit, MTT glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
CinebenchR15 not only binds the same texture for rendering and sampling,
it actually changes the framebuffer buffer attachment very often, causing
a lot of backed surface view to be created and a lot of surface copies
to be done. This patch caches the backed surface handle
in the texture resource and allows the backed surface view to
reuse the backed surface handle. With this patch, the number of
backed surface view reduces from 1312 to 3. Unfortunately, this
does not eliminate all the surface copies. There are still surface
copies involved when we switch from original to backed surface handle
for rendering.
Tested with CinebenchR15, NobelClinicianViewer, Turbine, Lightsmark2008,
MTT glretrace, MTT piglit.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch adds a timestamp in svga_surface structure to keep track
of when the backing surface is last sync with the original resource.
This helps to avoid unnecessary surface copy from the original
resource to the backing surface if the original resource has not
since been modified.
This reduces the amount of surface copy with CinebenchR15.
Tested with CinebenchR15, mtt glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
For VGPU10, we will render to a backed surface view when
the same resource is used for rendering and sampling.
In this case, we will mark the dirty bit for the backed surface view.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch moves the rendertarget view related fields from
svga_hw_draw_state to svga_hw_clear_state where all the hw
framebuffer related state resides.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of setting the rendered_to flags at set time, this patch
moves the setting of the flags to framebuffer emit time.
Reviewed-by: Brian Paul <brianp@vmware.com>
The debug output in svga_create_sampler_state() was controlled by
DEBUG_VIEWS but that's not consistent with the other debug output for
sampler views. Create/use a new debug flag just for this.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Tested by verifying 3D acceleration works with HWv8 but not earlier.
For HWv7 and older we get the GDI Generic renderer.
Reviewed-by: Neha Bhende<bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If for some reason kernel is not able to create surface,
when no buffer was provided the function
vmw_svga_winsys_surface_create should return NULL.
This patch fixes the issue where the code was not following the
clean up path in case of error, which used to cause SIGSEGV.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
shader-db results on GK106 (Thanks Karol):
total instructions in shared programs : 3931608 -> 3929463 (-0.05%)
total gprs used in shared programs : 481255 -> 479014 (-0.47%)
total local used in shared programs : 27481 -> 27381 (-0.36%)
total bytes used in shared programs : 36031256 -> 36011120 (-0.06%)
local gpr inst bytes
helped 14 1471 1309 1309
hurt 1 88 384 384
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
The main goal of this pass to merge temporary registers in order
to reduce the total number of registers and also to produce
optimal TGSI code.
In fact, compilers seem to be confused when temporary variables
are already merged, maybe because it's done too early in the
process.
Skipping the pass, reduce both the register pressure and the code
size, at least for Nouveau and RadeonSI because they have a real
backend compiler.
Found by luck while fixing an issue in the TGSI dead code elimination
pass which affects tex instructions with bindless samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because the buffer is new, it can't be referenced by any CS.
This can save few CPU cycles by skipping the whole
PIPE_TRANSFER_UNSYNCHRONIZED if in amdgpu_bo_map().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are 2 major hw changes:
- The address must always point to the address of level 0. GFX9 tiling
modes don't allow binding to a non-0 level.
- 3D must always be bound as 3D, because 2D and 3D use entirely different
tiling modes, and the texture target determines which set of modes is
used.
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Shared context support for VAOs was dropped in 0b2750620b.
From the ARB_vertex_array_object spec:
"This extension differs from GL_APPLE_vertex_array_object
in that client memory cannot be accessed through a
non-zero vertex array object. It also differs in that
vertex array objects are explicitly not sharable between
contexts."
Nobody should be using this extension over
ARB_vertex_array_object anymore so just drop it rather than
adding locking back just for VAOs created from these
functions.
For reference the Nvidia blob doesn't expose this extension.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: - Added some error handling.
- memset the buffer to 0.
v3: Added assert for buffer size.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In order to cleanly eliminate exports rewrite the
code first to mirror how radeonsi works for now.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These need to be ordered as per shader enum ordering, I'll
rewrite this soon, but this is a bug fix.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
due to the lack of pipe_resource wrapping, we can get this call from inside
of driver calls, which would try to lock an already-locked mutex.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If r300g is the only radeon driver built, the Android build fails to
build:
ninja: error:
'out/target/product/linaro_x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_radeon_intermediates/export_includes',
needed by
'out/target/product/linaro_x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/import_includes',
missing and no known rule to make it
This is because the path to build libmesa_pipe_radeon was only getting
added for r600g and radeonsi, but the library dependency was added for
all radeon drivers. As libmesa_pipe_radeon is not needed for r300g, drop
the library dependency.
Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:
"Objects which contain references to other objects include
framebuffer, program pipeline, query, transform feedback,
and vertex array objects. Such objects are called container
objects and are not shared"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Just return earlier in that case. Also set prefix to an empty string, so
we don't get to use it undefined.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.
For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.
With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.
v2:
- Use designated initializers on blorp and remove 0 from
initialization (Jason)
- Default entries to disabled on Vulkan (Jason)
- Rebase code.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the 'dwords' dict is empty, max(dwords.keys()) throws an exception.
This case could happen when we have an instruction that is only an array
of other structs, with variable length.
v2:
- Add another clause for empty dwords and make it work with python 3
(Dylan)
- Set the length to 0 if dwords is empty, and do not declare dw
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before this commit, when a group with count="0" is found, only one field
is added to the struct representing the instruction. This causes only
one entry to be printed by aubinator, for variable length groups.
With this commit we "detect" that there's a variable length group
(count="0") and store the offset of the last entry added to the struct
when reading the xml. When finally reading the aubdump file, we check
the size of the group and whether we have variable number of elements,
and in that case, reuse the last field to add the remaining elements.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The section of the PRM mentioned in the code comment above this table
says that this format supports the render target write message. Internal
documentation says that this format also supports alpha blending. As a
side effect, this allows CCS_D buffers to be created for images with
this format.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.
Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%. In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fixes following building errors due to missing include paths:
external/mesa/src/amd/common/ac_shader_info.c:23:10: fatal error: 'nir/nir.h' file not found
^
external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
^
Fixes: 224cf29 "radv/ac: add initial pre-pass for shader info gathering"
Acked-by: Dave Airlie <Airlied@redhat.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This just updates this to use the same flags as radeonsi
for consistency.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
IVB is running into some spilling issues in piglit with the
loop removed. However those tests are not really reflective
of a real world use case, also fp64 is brand new to IVB
so we leave the spilling issues to be resolved at a later
time.
Run time for shader-db on my machine goes from ~795 seconds to
~665 seconds.
shader-db results BDW:
total instructions in shared programs: 12969459 -> 12968891 (-0.00%)
instructions in affected programs: 1463154 -> 1462586 (-0.04%)
helped: 3622
HURT: 3326
total cycles in shared programs: 246453572 -> 246504318 (0.02%)
cycles in affected programs: 208842622 -> 208893368 (0.02%)
helped: 24029
HURT: 35407
total loops in shared programs: 2931 -> 2931 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 14560 -> 14498 (-0.43%)
spills in affected programs: 2270 -> 2208 (-2.73%)
helped: 17
HURT: 2
total fills in shared programs: 19671 -> 19632 (-0.20%)
fills in affected programs: 2060 -> 2021 (-1.89%)
helped: 17
HURT: 2
LOST: 17
GAINED: 40
Most of the hurt shaders are 1-2 instructions, with what looks like a max of 7.
I've looked at the worst cycles regressions and as far as I can tell its just
a scheduling difference.
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If packing doesn't cross locations we can easily make use of
ARB_enhanced_layouts to do packing rather than using the GLSL IR
lowering pass lower_packed_varyings().
Shader-db Broadwell results:
total instructions in shared programs: 12977822 -> 12977819 (-0.00%)
instructions in affected programs: 1871 -> 1868 (-0.16%)
helped: 4
HURT: 3
total cycles in shared programs: 246567288 -> 246567668 (0.00%)
cycles in affected programs: 1370386 -> 1370766 (0.03%)
helped: 592
HURT: 733
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the NIR backends depend on GLSL IR copy propagation to
fix up the interpolateAt* function params after varying packing
changes the shader input to a global. It's possible copy propagation
might not always do what we need it too, and we also shouldn't
depend on optimisations to do this type of thing for us.
I'm not sure if the same is true for TGSI, but the following
commit should re-enable packing for most cases in a safer way,
so we just disable it everywhere.
No change in shader-db for i965 (BDW)
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These should be lowered away in GLSL IR but if we don't get dead
code to clean them up it causes issues in glsl_to_nir.
We wan't to drop as many GLSL IR opts in future as we can so this
makes glsl_to_nir just ignore the vars if it sees them.
In future we will want to just use the nir lowering pass that
Vulkan currently uses.
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shuffles constants down in the reverse of what the previous
patch does and applies some simpilifications that may be made
possible from doing so.
Shader-db results BDW:
total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
instructions in affected programs: 281889 -> 278897 (-1.06%)
helped: 1231
HURT: 128
total cycles in shared programs: 246562852 -> 246567288 (0.00%)
cycles in affected programs: 11271524 -> 11275960 (0.04%)
helped: 1630
HURT: 1378
V2: mark float opts as inexact
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V2: mark float opts as inexact
If one of the inputs to an mul/add is the result of another
mul/add there is a chance that we can reuse the result of that
mul/add in other calls if we do the multiplication in the right
order.
Also by attempting to move all constants to the top we increase
the chance of constant folding.
For example it is a fairly common pattern for shaders to do something
similar to this:
const float a = 0.5;
in vec4 b;
in float c;
...
b.x = b.x * c;
b.y = b.y * c;
...
b.x = b.x * a + a;
b.y = b.y * a + a;
So by simply detecting that constant a is part of the multiplication
in ffma and switching it with previous fmul that updates b we end up
with:
...
c = a * c;
...
b.x = b.x * c + a;
b.y = b.y * c + a;
Shader-db results BDW:
total instructions in shared programs: 13011050 -> 12967888 (-0.33%)
instructions in affected programs: 4118366 -> 4075204 (-1.05%)
helped: 17739
HURT: 1343
total cycles in shared programs: 246717952 -> 246410716 (-0.12%)
cycles in affected programs: 166870802 -> 166563566 (-0.18%)
helped: 18493
HURT: 7965
total spills in shared programs: 14937 -> 14560 (-2.52%)
spills in affected programs: 9331 -> 8954 (-4.04%)
helped: 284
HURT: 33
total fills in shared programs: 20211 -> 19671 (-2.67%)
fills in affected programs: 12586 -> 12046 (-4.29%)
helped: 286
HURT: 33
LOST: 39
GAINED: 33
Some of the hurt will go away when we shuffle things back down to the
bottom in the following patch. It's also noteworthy that almost all of the
spill changes are in Deus Ex both hurt and helped.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Didn't turn out as useful as I'd hoped, but it will help alot more on
i965 by reducing regressions when we drop brw_do_channel_expressions()
and brw_do_vector_splitting().
I'm not sure how much sense 'is_not_used_by_conditional' makes on
platforms other than i965 but since this is a new opt it at least
won't do any harm.
shader-db BDW:
total instructions in shared programs: 13029581 -> 13029415 (-0.00%)
instructions in affected programs: 15268 -> 15102 (-1.09%)
helped: 86
HURT: 0
total cycles in shared programs: 247038346 -> 247036198 (-0.00%)
cycles in affected programs: 692634 -> 690486 (-0.31%)
helped: 183
HURT: 27
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blob won't render to this format, and sampling from it it uses the same
fmt value for r8g8b8_snorm and r8g8b8a8_snorm. But this is what is what
blocks us from jumping from gl30/gles20 to gl31/gles30. So a hack it
is!
Signed-off-by: Rob Clark <robdclark@gmail.com>
Support supertiled textures on hardware that has the appropriate
feature flag SUPERTILED_TEXTURE.
Most of the scaffolding was already in place in etna_layout_multiple:
case ETNA_LAYOUT_SUPER_TILED:
*paddingX = 64;
*paddingY = 64;
*halign = TEXTURE_HALIGN_SUPER_TILED;
So this is just a matter of allowing it.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It is always the draw ring. Except for a5xx queries like time-elapsed,
where we will eventually want to emit cmds into both binning and draw
rings.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some queries on a4xx and all queries on a5xx can do result accumulation
on CP so we don't need to track per-tile samples. We do still need to
handle pausing/resuming while switching batches (in case the query is
active over multiple draws which are executed out of order).
So introduce new accumulated-query helpers for these sorts of queries,
since it doesn't really fit in cleanly with the original query infra-
structure.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move a bit more of the logic shared by all query types (active tracking,
etc) into common code. This avoids introducing a 3rd copy of that logic
for a5xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For a5xx (and actually some queries on a4xx) we can accumulate results
in the cmdstream, so we don't need this elaborate mechanism of tracking
per-tile query results. So make it into vfuncs so generation specific
backend can use it when it makes sense.
Signed-off-by: Rob Clark <robdclark@gmail.com>
opt_register_coalesce() was optimizing sequences such as:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
mov(8) m4.zw:F, vgrf5.xxxy:F
into:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D
This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy. But the MACH instruction expects
.z to contain attr18.x * attr19.x. Bogus results ensue.
No change in shader-db on Haswell. Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:
"Objects which contain references to other objects include
framebuffer, program pipeline, query, transform feedback,
and vertex array objects. Such objects are called container
objects and are not shared"
For we leave locking in place for framebuffer objects because
the EXT fbo extension allowed sharing.
We could maybe just replace the hash with an ordinary hash table
but for now this should remove most of the unnecessary locking.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This pattern was only useful when we used mutex locks, which the previous
commit removed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:
"Objects which contain references to other objects include
framebuffer, program pipeline, query, transform feedback,
and vertex array objects. Such objects are called container
objects and are not shared"
For we leave locking in place for framebuffer objects because
the EXT fbo extension allowed sharing.
V2: (Timothy Arceri)
- rebased and dropped changes to framebuffer objects
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We should never get here if this is 0 unless there is a
bug. Replace the check with an assert.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
As mentioned in the manual - comparing pthread_t handles via the C
comparison operator is incorrect and pthread_equal() should be used
instead.
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: d8d81fbc31 ("mesa: Add infrastructure for a worker thread to process GL commands.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As pointed out by compiler
./llvm/codegen.hpp:52:22: error: ‘<::’ cannot begin a template-argument list [-fpermissive]
./llvm/codegen.hpp:52:22: note: ‘<:’ is an alternate spelling for ‘[’. Insert whitespace between ‘<’ and ‘::’
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
This function is actually a wrapper for component_slots()
and it always returns 1 (or N) for samplers. Since
component_slots() now return 1 for samplers, it can go.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It looks inconsistent to return 1 for image types and 0 for
sampler types. Especially because component_slots() is mostly
used by values_for_type() which always returns 1 for samplers.
For bindless, this value will be bumped to 2 because the
ARB_bindless_texture states that bindless samplers/images
should consume two components.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The OpenGL extension KHR_no_error is exposed since commit
d42d150ad2 by Timothy Arceri. Therefore it
should be marked as "started" in the features.txt
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This reverts commit 4d4558411d.
This was a wrong call, while it fixed issue with 3DMark it
actually introduced regression elsewhere.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
This enables support on GM200+ for:
- GL_AMD_vertex_shader_layer
- GL_AMD_vertex_shader_layer_viewport_index
- GL_ARB_shader_viewport_layer_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[lyude: add relnotes/TES cap]
Signed-off-by: Lyude <lyude@redhat.com>
[imirkin: move relnotes to right place, add features.txt]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
EMIT only applies to geometry shaders. For everything else, we want to
export the viewport normally.
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This breaks the guts of MI_MATH (the instruction part) out into its own
structure with proper named values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed by: Iago Toral Quiroga <itoral@igalia.com>
Looking at some Talos shaders vs radeonsi, I noticed they use
tex_lz in a few places, so we should be able to as well.
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is nicer on caches, and the next commit will need to access
the structure from a different place.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
min/max_index are typically hints for the u_vbuf module, not the driver.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Most drivers don't need it and shouldn't need it because it can't be used
in some cases (indirect draws, primitive restart, count from streamout).
Reviewed-by: Brian Paul <brianp@vmware.com>
It helps to find the envvar option you are looking for.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Show the commit hash and the title in a way that it is easier to copy
and paste in the bin/.cherry-ignore-extra file if we want to ignore
those commits for the future.
v2:
- Use printf instead echo (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Both scripts does not use a file with the commits to ignore. So if we
have handled one of the suggested commits and decided we won't pick it,
the scripts will continue suggesting them.
v2:
- Mark the candidates in bin/get-extra-pick-list.sh (Juan A. Suarez)
- Use bin/.cherry-ignore to store rejected patches (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gives me approximately a 2% perf increase in bot dota2 & talos.
Having descriptors (both sets and vertex buffers) prefetched
didn't help so I didn't include that.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
With this we don't have any operations on a pool with non-freeable
descriptors left that have O(#descriptors) complexity.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
v2: Handle out of pool memory error.
v3: Actually use VK_ERROR_OUT_OF_POOL_MEMORY_KHR for the error condition.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
If VDPAU is installed in the non-default location, we'll fail to find
the headers and error at build time.
../../src/gallium/include/state_tracker/vdpau_dmabuf.h:37:25: fatal error: vdpau/vdpau.h: No such file or directory
#include <vdpau/vdpau.h>
^
Fixes: faba96bc60 ("st/vdpau: add new interop interface")
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Resolves build issues like the following:
src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:31: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
^
src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:62: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
^
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The macro is introduced with pkgconfig v0.28 which isn't universally
available. Thus it will error at configure stage.
Reported-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
These checks do not generate any errors. Move them so we can add
KHR_no_error support and still make sure we do these checks.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We also move _mesa_update_array_format() into the caller.
This gets these functions ready for KHR_no_error support.
V2: Updated function comment as suggested by Brian.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When any count[i] is negative, we must skip all draws.
Moving to vbo makes the subsequent change easier.
v2:
- provide the function in all contexts, including GLES
- adjust validation accordingly to include the xfb check
v3:
- fix mix-up of pre- and post-xfb prim count (Nils Wallménius)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the build after:
commit 399ebd2a84
Author: Dave Airlie <airlied@redhat.com>
Date: Wed Apr 19 06:18:23 2017 +1000
radv/meta: add common shader vertex generation function
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the build after:
commit 224cf2906a
Author: Dave Airlie <airlied@redhat.com>
Date: Mon Apr 17 13:01:52 2017 +1000
radv/ac: add initial pre-pass for shader info gathering
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The vs vertex generate and fs noop shaders are used in a few places,
so refactor them out.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For depth clears we have to pass the depth in the 2nd
component, we can use push constants for some of this
later to drop the vertex buffer completely
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This removes the vertex buffer, and just generates the values
in the shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of passing in the same 1.0, -1.0 combinations via
vertex buffers, we can just use vertex id to have the vertex
shader build them. This function introduces the generator
code needed, later patches will use this.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some of the shaders could just generate the vertex data in the
shader, so add helpers to allow us to move to doing that.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This bumps the limit to the number of sets to 32, now that
we have proper support for it. It also uses 1u in a few places
to make things a bit safer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to expose more descriptor sets to the applications,
but currently we have a 1:1 mapping between shader descriptor
sets and 2 user sgprs, limiting us to 4 per stage. This commit
check if we don't have enough user sgprs for the number of
bound sets for this shader, we can ask for them to be indirected.
Two sgprs are then used to point to a buffer or 64-bit pointers
to the number of allocated descriptor sets. All shaders point
to the same buffer.
We can use some user sgprs to inline one or two descriptor sets
in future, but until we have a workload that needs this I don't
think we should spend too much time on it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds an initial implementation to allocate the user
sgprs and make sure we don't run out if we try to bind
a bunch of descriptor sets.
This can be enhanced further in the future if we add
support for inlining push constants.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
mostly documenting things, since with modern llvm we always have the
spill enabled.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In practice this will probably just drop draw id in a few places.
v2: just do draw_id for now. (Bas)
it might be possible to do something more if we need it in the
future. (nha)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There is some radv specific info we need to gather from shaders
before we get into converting nir->llvm, so we can make
better decisions especially around user sgpr allocation.
This is just an initial placeholder to gather if sample positions
are required in the frag shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In particular, move per-shader-stage info out to a seperate array of
enum's indexed by shader stage. This will make it easier to add more
shader stages as well as new per-stage state (like SSBOs).
Signed-off-by: Rob Clark <robdclark@gmail.com>
a3xx/a4xx use the generic u_blitter path, which will make state dirty
bits be set appropriately thanks to the automagic of generic code
setting generic state in the driver. And a5xx has a blit/dma engine
(actually, two) so it doesn't need these extra dirty bits set.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This makes it easier to deal with adding additional stages which have
their own driver-params. The duplicated code this introduces can be
refactored out after a later patch moves to per-shader-stage dirty
flags.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Note that this involves juggling around a bit when we emit and clear
texture state. So split out from the patch that adds the helper to set
all state dirty.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Make this an array indexed by shader stage, as is done elsewhere for
other per-shader-stage state. This will simplify things as more shader
stages are eventually added.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Each of the ir3 users has *basically* the same logic for comparing the
previous and current shader key, to see which, if any, shader state
needs to be marked dirty due to shader variant change.
The difference between gen's was just that some lowering flags never get
set on certain generations. But it doesn't really hurt to include the
extra checks (because both keys would have false).
Signed-off-by: Rob Clark <robdclark@gmail.com>
So atexit() is horrible and 4aea8fe7 is probably not a good idea. But
add an extra layer of duct-tape to the problem. Otherwise we hit a
situation where app using an atexit() handler that runs later than ours
doesn't hang when trying to tear down a context.
(gdb) bt
#0 util_queue_killall_and_wait (queue=queue@entry=0x52bc80) at ../../../src/util/u_queue.c:264
#1 0x0000007fb6c380c0 in atexit_handler () at ../../../src/util/u_queue.c:51
#2 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
#3 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
#4 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
#5 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
#6 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
#7 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
#8 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
#9 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
(gdb) c
Continuing.
[Thread 0x7fb67580c0 (LWP 3471) exited]
^C
Thread 1 "drawbuffer-mode" received signal SIGINT, Interrupt.
0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000007fb6c38304 in cnd_wait (mtx=0x5bdc90, cond=0x5bdcc0) at ../../../include/c11/threads_posix.h:159
#2 util_queue_fence_wait (fence=0x5bdc90) at ../../../src/util/u_queue.c:106
#3 0x0000007fb6daac70 in fd_batch_sync (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:233
#4 batch_reset (batch=batch@entry=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:183
#5 0x0000007fb6daa5e0 in batch_flush (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:290
#6 fd_batch_flush (batch=0x5bdc70, sync=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:308
#7 0x0000007fb6daba2c in fd_bc_flush (cache=0x461220, ctx=0x52b920) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch_cache.c:141
#8 0x0000007fb6dac954 in fd_context_flush (pctx=0x52b920, fence=0x0, flags=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_context.c:54
#9 0x0000007fb6b43294 in st_glFlush (ctx=<optimized out>) at ../../../src/mesa/state_tracker/st_cb_flush.c:121
#10 0x0000007fb69a84e8 in _mesa_make_current (newCtx=newCtx@entry=0x0, drawBuffer=drawBuffer@entry=0x0, readBuffer=readBuffer@entry=0x0) at ../../../src/mesa/main/context.c:1654
#11 0x0000007fb6b7ca58 in st_api_make_current (stapi=<optimized out>, stctxi=0x0, stdrawi=0x0, streadi=0x0) at ../../../src/mesa/state_tracker/st_manager.c:827
#12 0x0000007fb6cc87e8 in dri_unbind_context (cPriv=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:217
#13 0x0000007fb6cc80b0 in driUnbindContext (pcp=0x5271e0) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:591
#14 0x0000007fb7d1da08 in MakeContextCurrent (dpy=0x433380, draw=0, read=0, gc_user=0x0) at ../../../src/glx/glxcurrent.c:214
#15 0x0000007fb7a8d5e0 in glx_platform_make_current () from /lib64/libwaffle-1.so.0
#16 0x0000007fb7a894e4 in waffle_make_current () from /lib64/libwaffle-1.so.0
#17 0x0000007fb7ef8c60 in piglit_wfl_framework_teardown (wfl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_wfl_framework.c:628
#18 0x0000007fb7ef939c in piglit_winsys_framework_teardown (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:238
#19 0x0000007fb7ef9c30 in destroy (gl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:212
#20 0x0000007fb7edb7c4 in destroy () at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:184
#21 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
#22 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
#23 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
#24 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
#25 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
#26 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
#27 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
#28 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
(gdb) r
Fixes: 4aea8fe7 ("gallium/u_queue: fix random crashes when the app calls exit()")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Similar to st_convert_image(), will be useful for bindless. While
we are at it, rename convert_sampler() to convert_sampler_from_unit()
and make 'st' a const argument.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The description under RENDER_SURFACE_STATE::RedClearColor says,
For Sampling Engine Multisampled Surfaces and Render Targets:
Specifies the clear value for the red channel.
For Other Surfaces:
This field is ignored.
This means that the sampler on BDW doesn't support CCS.
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Commit bfee9866 "radv: Use RELEASE_MEM packet for MEC timestamp query."
added WriteTimestamp handling for compute queues but forgot to flip
the flag.
Tested with DOOM (by me) and CTS (by Bas), but without verification
that these tests actually use timestamps on compute queues.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For compute shaders, we need to be able to allocate some "high"
registers (r48.x to r55.w). (Possibly these are global to all threads
in a warp?) Add a new register class to handle this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The layout of CP_LOAD_STATE packet is slightly different on a4xx+.
Switch to the a4xx+ specific CP_LOAD_STATE4 to get the correct encoding.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The warning should be printed only when one explicitly uses the
deprecated configure toggle.
Fixes: 7748c3f5eb ("configure.ac: deprecate --with-egl-platforms over
--with-platforms")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Currently the former controls more than just EGL. With follow-up commits
we'll unwind and fix things so that one can build the different drivers
with said platform support.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The configure option is used by more than just EGL and with next commit
we'll rename it accordingly. Thus having the check will (and is atm)
incorrect.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We are not using either of these. The respecive xcb packages are used
instead.
v2: Rebase, reword commit message.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new interface mostly just sits on top of the existing library.
The only change to the existing EGL code is to split the client
extension string into platform extensions and everything else. On
non-glvnd builds, eglQueryString will just concatenate the two strings.
The EGL dispatch stubs are all generated. The script is based on the one
used to generate entrypoints in libglvnd itself.
v2: [Kyle]
- Rebased against master.
- Reworked the EGL makefile to use separate libraries
- Made the EGL code generation scripts work with Python 2 and 3.
- Change gen_egl_dispatch.py to use argparse for the command line arguments.
- Assorted formatting and style cleanup in the Python scripts.
v3: [Emil Velikov]
- Rebase
- Remove separate glvnd glx/egl configure toggles
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We should check the presence in order to determine if we should
[implicitly] set the CFLAGS/LIBS
v2: Drop spurious OMX hunk (Eric)
Cc: Eric Anholt <eric@anholt.net>
Reported-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Earlier commit bumped the requirement for the SWR driver.
v2: Fold the note with the LLVM 3.9 one (Tim)
Fixes: 3c52a7316a ("swr: [configure.ac/scons] require c++14")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
I accidentally moved the bo->bufmgr dereference above the NULL check
when cleaning up this code.
While passing NULL to free() is a common pattern...passing NULL to
unmap seems pretty bad. You really ought to know whether you have
a buffer or not. We don't want to paper over bugs like that. So,
just drop the NULL check altogether.
CID: 1405006
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Starting positions >= 32 are not part of the header, rather than >.
Caught by Coverity, which found that "bits <<= field->start" may shift
by 32, which has undefined behavior.
CID: 1404968
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If ret is 0, we return. If ret is not 0, we return. This is dead.
CID: 1405013 (Structurally dead code (UNREACHABLE))
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This hides the overhead of everything in the driver after the CS flush and
before returning from pipe_context::flush.
Only microbenchmarks will benefit.
+2% FPS for glxgears.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Not possible with GL and it will make future gallium rework easier.
(also it's something I wouldn't like to support)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This moves the hashing of shader source for the cache lookup to before
the preprocessor. In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.
Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.
Also fixes the leaking of state.
V2: fix indentation
v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.
Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.
To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.
Improves cold cache start-up times on Deus Ex by ~20 seconds.
When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.
V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This prevents a user from using a cache created on one hardware
generation on a different one. Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
This adds native fence fd support to etnaviv, similarly to commit
0b98e84e9b ("freedreno: native fence fd"), enabled for kernel
driver version 1.1 or later.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
v2 (Andreas Boll):
- Mark GL 4.1 as supported by i965/gen7+
- Mark GL_ARB_shader_precision as supported by i965/gen7+
- Update release notes
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This patch adds support for the SINGLE_BUFFER feature on GC3000
GPUs, which allows rendering to a single buffer using multiple pixel
pipes.
This feature is always used when it is available, which means that
multi-tiled formats are no longer being used in that case, and all
buffers will be normal (super)tiled. This mimics the behavior of the
blob on GC3000.
- Because the same format can be used to render to and texture from,
this avoids an extra resolve pass when rendering to texture.
- i.MX6qp includes a PRE which can scan-out directly from tiled formats,
avoiding untiling overhead.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update to etna_viv commit 8486a97.
austriancoder: changed patch to include isa redefinition fix.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Request chipMinorFeatures bitfields 4 and 5 from the
drm driver.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When passing render buffers from EGL clients to a wayland compositor,
the resource tile status must be resolved because otherwise the tile
status is lost in the transfer and cleared parts of the buffer will
contain old contents.
The same applies when sampling directly from a renderable resource.
lst: Add seqno tracking, to skip flush when not needed.
Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Before resolving a resource into its scanout prime buffer, check that
the prime resource is actually older. If it is not, the resolve is an
expensive no-op, and we better skip it.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Add polygon stipple functionality to the fragment shader.
Explicitly turn off polygon stipple for lines and points, since we
do them using tris.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.
See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:
cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.
Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
simplicity. Pass argument by const reference. Clarify commit
message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Otherwise for a pack_double_2x32_split opcode, we emit:
vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted };
mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted };
mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
The intention was to emit mov(4)s for the instructions that have ERROR
annotations.
See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.
v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to
static stand-alone function. Don't write unused components in the
result. Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Take into account offset values less than a full register (32 bytes)
when getting the var from register.
This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).
v2:
- Take in account this offset < 32 in liveness analysis too (Curro)
v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4
== 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.
Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitely in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just do the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.
v2:
- Use stride and don't need to fix dst (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Keep the original type when dealing with null registers. Especially
because we do no want to introduce an implicit conversion between
types that could affect the conditional flags.
This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.
v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).
v2:
- Rename helper and make it static inline function (Matt).
- Fix indention and add braces (Matt).
v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)
v4:
- Add get_exec_type() function and use it to calculate the execution
size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take
destination type as execution type where there is no valid source.
Assert-fail if the deduced execution type is byte. Clarify comment
in get_lowered_simd_width(). Move SIMD width workaround outside of
'if (...inst->size_written > REG_SIZE)' conditional block, since the
problem should be independent of whether the amount of data written
by the instruction is greater or lower than a GRF. Drop redundant
is_ivb_df definition. Drop bogus inst->exec_size < 8 check.
Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the IVB and HSW PRMs:
"2.When the destination requires two registers and the sources are
indirect, the sources must use 1x1 regioning mode."
So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.
This patch limits the SIMD width to 4 in this case.
v2:
- Fix typo (Matt).
- Fix condition (Curro)
v3:
- Add spec quote (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When converting a DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.
But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2, the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.
v2:
- Fix typo (Matt)
v3:
- Fix stride in the destination's brw_reg, don't modity IR (Curro)
v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit 7dccd38b40.
d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.
Then, 7dccd38b40 is not needed anymore.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Generalize it to lower any unsupported narrower conversion.
v2 (Curro):
- Add supports_type_conversion()
- Reuse existing intruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.
v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
when fixing the conversion.
- Add get_exec_type() function.
v4 (Curro):
- Use get_exec_type() function to get sources' type.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.
v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).
v3:
- Added comment explaining the reason (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.
So when we have something like:
mov(8) g2<1>DF g3<4,4,1>DF
We are not actually moving 8 doubles (our intention), but 4 doubles.
We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.
v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)
v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).
v4:
- Fix comment (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The execution data size is the biggest type size of any instruction
operand.
We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.
v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).
v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
calculate its size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced
execution type for integer vector types. Take destination type as
execution type where there is no valid source. Assert-fail if the
deduced execution type is byte. Move into brw_ir_fs.h header for
consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.
v2 (Sam):
- Add comments.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
4-wide DF operations where NibCtrl applies require and execsize of 8
in IvyBridge/BayTrail.
v2:
- Refactor NibCtrl printing (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fix the accounting for memory usage of userptr buffers, which has been wrong
forever (or at least for a long time).
Also initialize flags. Without this initialization, the sparse buffer flag
might end up being set, which leads to staging buffers being used unnecessarily
(and incorrectly) in transfers to or from userptr buffers.
This works around VM faults that occur with the radeon kernel module when
running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto
Fixes: e077c5fe65 ("gallium/radeon: transfers and invalidation for sparse buffers")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use push descriptors instead of temp descriptor sets.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows meta to use push descriptors without disturbing user
push descriptors.
radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
in that partial updates are not supported; all descriptors used in
subsequent draw commands must be pushed at the same time.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description. The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should
fix all of them.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Enable code sanitizers by adding -fsanitize=$foo flags for the compiler
and linker.
In addition, this also disables checking for undefined symbols: running
the address sanitizer requires additional symbols which should be provided
by a preloaded libasan.so (preloaded for hooking into malloc & friends
globally), and the undefined symbols check gets tripped up by that.
Running the tests works normally via `make check`, but shows additional
failures with the address sanitizer due to memory leaks that seem to be
mostly leaks in the tests themselves. I believe those failures should
really be fixed. In the mean-time, you can set
export ASAN_OPTIONS=detect_leaks=0
to only check for more serious error types.
v2:
- fail reasonably when an unsupported sanitize flag is given (Eric Engestrom)
Reviewed-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This makes it much easier to throw together a bit of dynamic state. It
also automatically handles flushing so you don't accidentally forget.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This patch enables multisample antialiasing in the OpenSWR software renderer.
MSAA is a proof-of-concept/work-in-progress with bug fixes and performance
on the way. We wanted to get the changes out now to allow several customers
to begin experimenting with MSAA in a software renderer. So as not to
impact current customers, MSAA is turned off by default - previous
functionality and performance remain intact. It is easily enabled via
environment variables, as described below.
It has only been tested with the glx-lib winsys. The intention is to
enable other state-trackers, both Windows and Linux and more fully support
FBOs.
There are 2 environment variables that affect behavior:
* SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed
for MSAA... Beware, results will vary. This is mainly for testing.
* SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of
samples (1,2,4,8,16), or 0 to disable MSAA altogether.
(The default is currently 0.)
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Removed unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED
flags in favor of check on single PIPE_BIND_DISPLAY_TARGET flag.
Reference llvmpipe change <bee4c7718a3bd57e3d99f0913d9081cd13fe5fd>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The context now contains SIMD vectors which must be aligned (specifically
samplePositions in the rastState in the derived state). Failure to align
can result in segv crash on unaligned memory access in vector
instructions.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The kernel returns frequency in kHz, so to convert to nanosecond
interval that Vulkan uses the dividend should be 1000000.0 and not
100000.0.
This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Probably should have flipped the switch a long time ago, since it
doesn't seem to cause any problems and is a nice perf boost in a number
of cases.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Want to move one of these under ir3_block, so that gives a reason to
migrate the remaining malloc/realloc to ralloc.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When publishing this spec on the OpenGL ES registry, Jon Leech noticed
that it didn't actually mention what the ES dependencies and
interactions were. I looked at extensions_table.h and noted that we
expose it in ES 3.0 contexts, and he added the obvious spec texts.
The updated copy also contains our official extension number.
https://github.com/KhronosGroup/OpenGL-Registry/issues/3
Acked-by: Matt Turner <mattst88@gmail.com>
Needed if we want to allow them taking more than 64 KiB. The calculations
of these already used 32 bits.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This bumps it to the same level as amdgpu-pro, it also
moves a bunch of dEQP-VK.geometry.instanced.* from
NotSupported to Pass.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the
way they're implemented, the VGT always generates indices starting at 0,
and the VS prolog adds the start index.
There's a VGT_INDX_OFFSET register which causes the VGT to start at a
driver-defined index. However, this register cannot be written from
indirect draws.
So fix this unlikely case by setting a bit to tell the VS whether the
draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly
when used.
Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Especially with subsequent changes, this makes it easier to see the
sequence of state emits at the higher level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Most trace points happen after an operation, so add a trace point
at the start of the command buffer.
Furthermore, add one after a CmdUpdateBuffer using CP_DMA as that
didn't emit one yet.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
timestamp and pipeline_statistics only do something on begin & end,
so they don't need any action.
Occlusion queries only do something to enable/disable and that
register is set nowhere else so that doesn't need extra support either.
(We technically should fix it to update the reg with the number of
samples, but that hasn't happened yet, so we only change it to
enable/disable counting)
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is only relevant with 0 attachments. In that case we do nothing
on subpass switch already, and the pipeline is the authoritative
source of the number of samples, so this shouldn't change anything.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Flush the HUD value streams to the dump files after every newline.
v2: check that fopen succeeded (Julien)
Reviewed-and-Tested-by: Julien Isorce <jisorce@oblong.com>
The addrlib import meant we'd return after we attempted
to setup the no stencil bits for an S8_UINT, now we break
and use the stencil level info when creating stencil DB
info.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from radeonsi, and avoids the bug in the
addrlib code. This should probably be something addrlib
does for us, but for now this fixes the regression without
changing addrlib and aligns us with radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
state_tracker/st_atom_framebuffer.c:208:27: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
if (framebuffer->width == UINT_MAX)
~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~
state_tracker/st_atom_framebuffer.c:210:28: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
if (framebuffer->height == UINT_MAX)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~
2 warnings generated.
Fixes: eb0fd0e5f8 ("gallium: decrease the size of pipe_framebuffer_state - 96 -> 80 bytes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
In file included from radeon_debug.c:32:
./radeon_common_context.h:500:19: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
extern const char const *radeonVendorString;
v2: - do not remove the duplicate 'const' qualifier, fix it
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fixes the following Clang warning.
vmw_screen_dri.c:130:1: warning: unused function 'vmw_dri1_intersect_src_bbox' [-Wunused-function]
vmw_dri1_intersect_src_bbox(struct drm_clip_rect *dst,
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warnings.
lp_setup_tri.c:55:1: warning: unused function 'subpixel_snap' [-Wunused-function]
subpixel_snap(float a)
^
lp_setup_tri.c:61:1: warning: unused function 'fixed_to_float' [-Wunused-function]
fixed_to_float(int a)
^
v2: - do not remove subpixel_snap() (use !PIPE_ARCH_SSE instead)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fixes the following Clang warning.
sp_fs_exec.c:56:1: warning: unused function 'sp_exec_fragment_shader' [-Wunused-function]
sp_exec_fragment_shader(const struct sp_fragment_shader_variant *var)
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
sp_quad_fs.c:60:1: warning: unused function 'quad_shade_stage' [-Wunused-function]
quad_shade_stage(struct quad_stage *qs)
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
sp_tex_sample.c:802:1: warning: unused function 'get_texel_quad_2d' [-Wunused-function]
get_texel_quad_2d(const struct sp_sampler_view *sp_sview,
^
CC sp_tile_cache.lo
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warnings.
main/pack.c:470:1: warning: unused function 'clamp_float_to_uint' [-Wunused-function]
clamp_float_to_uint(GLfloat f)
^
main/pack.c:477:1: warning: unused function 'clamp_half_to_uint' [-Wunused-function]
clamp_half_to_uint(GLhalfARB h)
^
2 warnings generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
virgl_screen.c:60:12: warning: enumeration value 'PIPE_CAP_DOUBLES' not handled in switch [-Wswitch]
switch (param) {
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This removes one level of indentation and will improve readability
for bindless images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
readInvocationARB() and readFirstInvocationARB() need SHFL.IDX
instruction which is introduced in Kepler.
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Check if each channel is masked in TGSI_OPCODE_BALLOT (Ilia Mirkin)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Implementation of readFirstInvocationARB() on nvidia hardware needs a
ballotARB(true) used to decide the first active thread. This expressed
in gm107 asm as (supposing output is $r0):
vote any $r0 0x1 0x1
To model the always true input, which corresponds to the second 0x1
above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
"not 0x1" in the src field respectively.
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
v3: (Ilia Mirkin)
Make the handling more symmetric with predicate version in gm107
Use i->getSrc(s)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
v3: Check the range of immediate in OP_SHFL (Ilia Mirkin)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: (Samuel Pitoiset)
Add an assertion to check if the target is Kepler
Make sure that asImm() is not NULL
v3: (Ilia Mirkin)
Check the range of immediate value of OP_SHFL
Use the new setPDSTL API
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
GF100's ISA encoding has a weird form of predicate destination where its
3 bits are split across whole the instruction. Use a dedicated setPDSTL
function instead of original defId which is incorrect in this case.
v2: (Ilia Mirkin)
Change API of setPDSTL() to handle cases of no output
Fix setting of the highest bit in setPDSTL()
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's
lowering (Samuel Pitoiset)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
clang::LangAS::Offset is gone, the behaviour is as if it was 0.
v2: Introduce and use clover::llvm::compat::lang_as_offset (Francisco
Jerez)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
A few functions related to FBOs/renderbuffers should only be used with
window-system buffers, not user-created FBOs. Assert for that.
Add additional comments. No piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We only do on-demand renderbuffer allocation for window-system FBOs,
not user-created FBOs. So put the loop inside a conditional.
Plus, add some comments. No piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
According to the Vulkan spec, VkPipelineInputAssemblyStateCreateInfo's
primitiveRestartEnable flag should only apply to indexed draws, however
it was being enabled regardless of the type of draw. This could cause
problems for non-indexed draws with >=65535 vertices if the previous
indexed draw used 16-bit indices.
Fixes corruption of the credits text in Mad Max.
v2: Reset primitive restart state after executing a secondary command
buffer.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Makes more sense when we hash the layout for the pipeline cache.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Context: _mesa_add_parameter is sometimes[0] called with a
NULL name as a mean of an unnamed parameter.
Allowing NULL pointer as a name means that it must be NULL checked
each access. So far it isn't always[1] true.
Parameter name is only used for debug purpose (printf) and
to lookup the index/location of the program by the application.
Conclusion, there is no valid reason to use a NULL pointer instead of
an empty string. So it was decided to use an empty string which avoid all
issues related to NULL pointer
[0]: texture gather offsets glsl opcode and st_init_atifs_prog
[1]: at least shader cache, st_nir_lookup_parameter_index and some printfs
Issue found by piglit 'texturegatheroffsets' tests on Nouveau
v4: new patch based on Nicolai/Timothy/ilia discussion
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
unsigned long is a terrible type for a bitfield - if you need fewer
than 32 bits, it wastes 4 bytes. If you need more, things break on
32-bit builds. Just use unsigned.
Even that's a bit ridiculous as we only have one flag today.
Still, it's at least somewhat better.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The drm_i915_gem_create ioctl structure uses a __u64 for the size,
so we should probably use uint64_t to match. In theory, we could
probably have a BO larger than 4GB, using a 48-bit PPGTT - it just
wouldn't be mappable in the CPU's 32-bit address space.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Theoretically, with a 48-bit address space, we could have buffers
with an alignment of >= 4GB. It's a bit silly, but the exec_object
structs (drm_i915_gem_exec_object2) use a __u64 for this, so we may
as well use the same type as the kernel API.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
struct drm_i915_gem_set_tiling's stride field is a __u32.
intel_mipmap_tree::stride is a uint32_t. Using unsigned long just
doesn't make sense. Switching also lets us drop many pointless
locals that only existed to deal with the type mismatch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The ioctl structs contain __u64 offset and size fields, so make them
uint64_t rather than unsigned long.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
For some reason we passed tiling by pointer, through several layers,
even though the functions only read the initial value, and never
actually change it. We even had a do-while loop that executed until
the tiling mode matched - except it always did, so it only ran once.
We then had bogus error handling in case it changed the tiling mode
to something nonsensical...which it never did.
Drop all this nonsense.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
These calls look like leftover from fallback texture support first
being added to the st in 8f6d9e12be and then later being added
to core mesa in 00e203fe17.
The piglit test fp-incomplete-tex continues to work with this
change.
Reviewed-by: Brian Paul <brianp@vmware.com>
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block. Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.
Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e. Should have a comparable effect on other platforms. No
significant regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We check these bitfields when computing the Haswell max GL version.
We need to set them ahead of time, or they won't exist, and all our
checks will fail. That sets the max core profile GL version to 4.2.
This introduces the bizarre situation where asking for a GL context
with version 4.3+ fails, but asking for a GL core profile context
with version <= 4.2 actually promotes you a 4.5 context.
GLX_MESA_query_renderer also reported the bogus 4.2 value.
Now it shows 4.5.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
Autodisable seems to cause missed rendering in some cases, but
otherwise TS seems to work properly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Fixes a performance issue with imported winsys buffers as those are
marked with binding sampler view.
This might require a TS flush on single pipe chips that directly
sample from the rendered buffer, but otherwise seems to work fine.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The TS surface gets cleared by a tiled RS fill. If the chip has
more than 1 pixel pipe the size of the TS surface needs to be
aligned so that each pipe address matches a tile start, otherwise
the RS will hang.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The TS is only valid after it has been initialized by a fast
clear, so it should not be taken into account when blitting
resources that haven't been cleared. Also the blit itself
invalidates the destination TS, as it's not updated and will
retain data from the previous rendering after the blit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The devil is in the shader again, otherwise this is
fairly straightforward.
The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For using them with both occlusion and pipeline statistics queries.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use the new occlusion query copy shader.
We don't use the shader for the waiting as a polling loop ineracts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.
If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.
This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Adds a shader for writing occlusion query results to a buffer, as the
CP packet isn't support on SI or secondary buffers, and doesn't handle
the availability bit (or partial results) nor truncation to 32-bit.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If the buffer is being used, we should wait for those uses to be
complete before returning the map.
Fixes: GL45-CTS.direct_state_access.buffers_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
We currently don't pass the low byte of the address via the surface
info, so in order to work with images, these have to implicitly be
aligned to 256. The proprietary driver also doesn't go out of its way to
provide lower alignment.
Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This restores the performance warnings removed in:
i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.
but adds them for nearly all BO mapping, and also for wait_rendering.
Because we add this to the core bufmgr, we automatically get stall
warnings in all callers, unlike before where only a few callsites used
the wrappers that gave stall warnings.
We also do it a bit differently: we simply measure how long set_domain
takes (the part that stalls), and complain if it's more than 0.01 ms.
We don't bother calling brw_bo_busy(), and we don't measure the mmap
time (which doesn't stall). This should be more accurate.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In theory gcc is free to re-load them, and if a concurrent
execbuf races and updates bo->offset64 then we have a problem:
execbuffer api requires that the ->presumed_offset and the one
we used for the reloc matches. It does not require that the value
is sensible, which means no locks needed, just a consistent load.
Ken said his next series will nuke this, so just hand-roll the
kernel's READ_ONCE idea inline.
FIXME: Most callers of brw_emit_reloc recompute the relocation
themselves, which means this doesn't really fix the race. But the long
term plan is to move to per-context relocation handling, which will
fix this all properly. So leave this for now as just a reminder.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was done because the kernel has 1 global address space, shared
with all render clients, for gtt mmap offsets, and that address space
was only 32bit on 32bit kernels.
This was fixed in
commit 440fd5283a87345cdd4237bdf45fb01130ea0056
Author: Thierry Reding <treding@nvidia.com>
Date: Fri Jan 23 09:05:06 2015 +0100
drm/mm: Support 4 GiB and larger ranges
which shipped in 4.0. Of course you still want to limit the bo cache
to a reasonable size on 32bit apps to avoid ENOMEM, but that's better
solved by tuning the cache a bit. On 64bit, this was never an issue.
On top, mesa never set this, so it's all dead code. Collect an trash it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
is_reusable was needed by uxa because it couldn't keep track of its
scanout buffers and used this as a proxy. Disabling reuse is a silly
idea, we set this once at start. Remove both.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iirc this was used by uxa for persistent mmpas of the frontbuffer. For
mesa all the set_domain stuff needed before a synchronized mmap is handled
within the bufmgr, so no reason ever to call this.
Inline the implementation into its only internal user.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Entirely unused, and really shouldn't be used. The alloc functions already
take care of this. And even in a future where we're not going to
h/v-align tiled buffers in the bufmgr, but only in isl, I think we
still want to adjust the tiling mode in the bufmgr, since that ties in
closely to mmaps and stuff like that.
get_tiling is still needed for the import paths (until we have modifiers
everywhere).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
indent -i3 -nut -br -brs -npcs -ce --no-tabs -Tuint32_t -Tuint64_t
plus some manual fixes because those aren't quite the right settings.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The bacon is all gone.
This renames both the class and the related functions. We're about to
run indent on the bufmgr code, so no need to worry about fixing bad
indentation.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The stupid reason for eliminating these functions is that I'm about
to rename drm_bacon_bo_map() to brw_bo_map(), which makes the real
function have the short name, rather than the wrapper.
I'm also planning on reworking our mapping code soon, so we use WC
mappings and proper unsynchronized mappings on non-LLC platforms.
It will be easier to do that without thinking about the stall
warnings and wrappers.
My eventual hope is to put the performance warnings in the BO map
function itself, so all callers gain the warning.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
drm_bacon_context is a malloc'd struct containing a uint32_t context ID
and a pointer back to the bufmgr. The bufmgr pointer is pretty useless,
as everybody already has brw->bufmgr. At that point...we may as well
just use the ctx_id handle directly. A number of places already had to
call drm_bacon_gem_context_get_id() to extract the ID anyway. Now they
just have it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We're going to get rid of drm_bacon_context shortly, so we'd have to
change the interface slightly. It's basically just an ioctl wrapper
that isn't terribly bufmgr-related, so We may as well just combine it
with the code in brw_reset.c that actually uses it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The only difference is that it takes an explicit bufmgr rather than
using bo->bufmgr, but there is only one bufmgr per screen so they
should be identical anyway.
Chris says this was added primarly to avoid bo/bo_gem casting,
which was inconvenient.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The separate class gives us a bit of extra encapsulation, but I don't
know that it's really worth the boilerplate. I think we can reasonably
expect the rest of the driver to be responsible.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
These fields are the same value. In the bad old days, bo->handle could
have been an identifier from the pre-GEM fake bufmgr, but that's long
gone. Keep the "gem_handle" name for clarity.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The execbuf2 kernel API requires us to construct two kinds of lists.
First is a "validation list" (struct drm_i915_gem_exec_object2[])
containing each BO referenced by the batch. (The batch buffer itself
must be the last entry in this list.) Each validation list entry
contains a pointer to the second kind of list: a relocation list.
The relocation list contains information about pointers to BOs that
the kernel may need to patch up if it relocates objects within the VMA.
This is a very general mechanism, allowing every BO to contain pointers
to other BOs. libdrm_intel models this by giving each drm_intel_bo a
list of relocations to other BOs. Together, these form "reloc trees".
Processing relocations involves a depth-first-search of the relocation
trees, starting from the batch buffer. Care has to be taken not to
double-visit buffers. Creating the validation list has to be deferred
until the last minute, after all relocations are emitted, so we have the
full tree present. Calculating the amount of aperture space required to
pin those BOs also involves tree walking, which is expensive, so libdrm
has hacks to try and perform less expensive estimates.
For some reason, it also stored the validation list in the global
(per-screen) bufmgr structure, rather than as an local variable in the
execbuffer function, requiring locking for no good reason.
It also assumed that the batch would probably contain a relocation
every 2 DWords - which is absurdly high - and simply aborted if there
were more relocations than the max. This meant the first relocation
from a BO would allocate 180kB of data structures!
This is way too complicated for our needs. i965 only emits relocations
from the batchbuffer - all GPU commands and state such as SURFACE_STATE
live in the batch BO. No other buffer uses relocations. This means we
can have a single relocation list for the batchbuffer. We can add a BO
to the validation list (set) the first time we emit a relocation to it.
We can easily keep a running tally of the aperture space required for
that list by adding the BO size when we add it to the validation list.
This patch overhauls the relocation system to do exactly that. There
are many nice benefits:
- We have a flat relocation list instead of trees.
- We can produce the validation list up front.
- We can allocate smaller arrays and dynamically grow them.
- Aperture space checks are now (a + b <= c) instead of a tree walk.
- brw_batch_references() is a trivial validation list walk.
It should be straightforward to make it O(1) in the future.
- We don't need to bloat each drm_bacon_bo with 32B of reloc data.
- We don't need to lock in execbuffer, as the data structures are
context-local, and not per-screen.
- Significantly less code and a better match for what we're doing.
- The simpler system should make it easier to take advantage of
I915_EXEC_NO_RELOC in a future patch.
Improves performance in Synmark 7.0's OglBatch7:
- Skylake GT4e: 12.1499% +/- 2.29531% (n=130)
- Apollolake: 3.89245% +/- 0.598945% (n=35)
Improves performance in GFXBench4's gl_driver2 test:
- Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
- Apollolake: 4.1776% +/- 0.240847% (n=120)
v2: Feedback from Chris Wilson:
- Omit explicit zero initializers for garbage execbuf fields.
- Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
- Drop unnecessary fencing assertions.
- Only use _WR variant of execbuf ioctl when necessary.
- Shrink the arrays to be smaller by default.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'm about to rewrite how relocation handling works, at which point
drm_bacon_bo_emit_reloc() and drm_bacon_bo_mrb_exec() won't exist
anymore. This code is already largely not using the batchbuffer
infrastructure, so just go all the way and handle relocations, the
validation list, and execbuffer ourselves. That way, we don't have
to think the weird case where we only have a screen, and no context,
when redesigning the relocation handling.
v2: Write reloc.presumed_offset + reloc.delta into the batch, rather
than duplicating the comment, so it's obvious that they match
(suggested by Chris). Also add a comment about why we don't do
any error checking.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is the threshold after which drm_intel_bufmgr_check_aperture_space
returns -ENOSPC, signalling that it thinks an execbuf is likely to fail
and we need to roll back and flush the batch.
We'll need this when we rewrite aperture space checking, shortly.
In the meantime, we can also use it in GLX_MESA_query_renderer.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
I'm about to make brw_emit_reloc do actual work, so everybody needs
to start using it and not the raw drm_bacon function.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This renames intel_batchbuffer_reloc to brw_emit_reloc and changes the
parameter naming and ordering to match drm_intel_bo_emit_reloc().
For now, it's a trivial wrapper that accesses batch->bo. When we
rework relocations, it will start doing actual work.
target_offset should be expanded to a uint64_t to match the kernel,
but for now we leave it as its original 32-bit type.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is only useful when doing an incoherent CPU mapping of the current
scanout buffer. That's a terrible plan, so we never do it. We always
use an uncached GTT map.
So, this is useless. Drop the code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This functionality was added by libdrm commit
743af59669386cb6e063fa4bd85f0a0b2da86295 (intel: make bufmgr_gem
shareable from different API) in an attempt to solve libva/mesa buffer
sharing problems. Specifically, this was working around an issue hit
by Chromium, which used the same drm_fd for multiple APIs, and shared
buffers between them.
This code attempted to work around that issue by using the same bufmgr
for both libva and Mesa. It worked because libdrm_intel was loaded by
both libraries. However, now that Mesa has forked, we don't have a
common library, and this code cannot work.
The correct solution is to have each API open its own file descriptor
(and get a corresponding buffer manager), and then use PRIME export
and import to share BOs across those APIs. Then the kernel can manage
those shared resources. According to Chris, the kernel will pass back
the same handle for a prime FD if the lookup is from the same device FD.
We believe Chromium has since moved to this model.
In Mesa, there is already only one screen per FD, and so there will
only be one bufmgr per FD. We don't need any of this code.
v2: Add a big warning comment written by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
No performance data has been gathered about this choice. I just don't
want that many hash tables. Chris points out that this is not
performance critical - we should not be recreating that many handles
from scratch. In the past we used a linear list, which became
unreasonable in stress tests that used hundreds of thousands of BOs.
In real usage, it shouldn't matter that much.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Mesa doesn't use this yet. We'll almost certainly want to, but we can
add the functionality back after we clean up the messy drm code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We may want this eventually, but simplify for now. We can add it back
later when we actually intend to use it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We'll want userptr support for GL_AMD_pinned_memory support someday,
and possibly some other upload optimizations. Chris says "not in this
form" though. Drop it and simplify for now - we can add it back later
when we're ready to hook it up fully.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is basically handholding to prevent a bogus caller from trying to
execbuffer on a bogus engine. i965 already does this correctly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the PCI ID detection to intel_screen.c and makes
drm_bacon_bufmgr_gem_init() take a devinfo pointer.
We also drop the HAS_LLC query stuff - devinfo has that info already,
without kernel queries, and it makes no sense to have two has_llc flags
set by different mechanisms.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The wait-ioctl was introduced in kernel v3.6 (20120930) and that is our
current minimum requirement for screen creation.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The distinction was required when the bufmgr was virtualised, now there
is only one class, we no longer need the distraction of pretending it is
a subclass.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This moves us one step closer to killing off intel_bufmgr_priv.h.
We might want to nuke it altogether, since it's basically just a
uint32_t handle, but for now, let's focus on removing files.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
libdrm_bacon used to have a GEM-based bufmgr and a legacy fake bufmgr,
but that's long since dead (and we never imported it to i965). So,
drop the extra layer of function pointers.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Eliminates some API around this, and more importantly, the last
field in one bufmgr class.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Replace the duplicated macros imported from libdrm:
ARRAY_SIZE, MAX2, ALIGN, STATIC_ASSERT
and remove unused ROUND_UP_TO and ROUND_UP_TO_MB.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
ROUND_UP_TO handles a NPOT alignment, but all the alignments we use
are power of two anyway, so there's no need.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Since gen4, we do not use fence registers for any GPU access and so
never have to account for the fence during batch construction. All the
related fence functions are unused.
Based on Kristian Høgsberg's patch; commit message by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Mesa doesn't use these functions or macros, so we can delete them,
and save work refactoring and cleaning them up. We'll delete a lot
more later, too.
Based on a patch by Kristian Høgsberg.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Drop xf86atomic.h in favor of Mesa's util/u_atomic.h. We replace the
atomic_t wrapper struct with a bare integer, switch to the 'p_atomic'
naming conventions, and move over the one extra helper.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Using drm_intel_* as a prefix is hazardous - we don't want to conflict
with the actual libdrm_intel symbols. In particular, I think we could
get into trouble during the final megadrivers linking.
So, rename everything to an different yet arbitrary prefix. bacon and
intel are the same number of characters, so we don't have to reindent
the world. It's also an homage to Ian's "Bacon Trail" platform.
I was going to use "drm_relic" to poke fun at libdrm being ancient,
and so we could explain the name with a "historical reasons" pun,
but it sounds too much like ralloc.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
i965 doesn't use drm_intel_get_aperture_sizes(), so we can delete
support for it. This avoids a build dependency on libpciaccess.
Chris also notes:
"There's a really old bug that hopefully has been closed already
(although as far as I can tell, it has never been fixed) about
how using libpciaccess from libdrm_intel breaks the world (since
libpciaccess uses a singleton that is torn down at the first request
rather than upon the last user)."
This bug should go away in two commits when we switch over to our
internal copy of libdrm_intel.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84325
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
typeof doesn't seem to exist, so this won't compile (but we don't yet
try). Define it to __typeof__. This code is going to die soon anyway.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We never imported any of this code, so drop the prototypes, unused
enums, and defines.
Based on patches by Emil Velikov.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This imports commit 19c4cfc54918d361f2535aec16650e9f0be667cd of
libdrm/intel/*.[ch], minus a few files that we're never going to use
(and would immediately delete), plus a few necessary dependencies.
We rename intel_bufmgr.h to brw_bufmgr.h to avoid #include conflicts.
We also fix UTF-8 symbol problems in intel_bufmgr_gem.c comments
because vim keeps trying to fix that every time I edit the file,
and we may as well fix it right away.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Using an incoherent CPU map on the active scanout buffer is really
sketchy - we may need extra flushing via GEM_SW_FINISH, or using
drmModeDirtyFB() and kernel commit a6a7cc4b7db6d (4.10+).
Chris suggests "never ever do that", which seems like a wise plan!
intel_miptree_map_raw() uses CPU maps on linear buffers.
Having a linear scanout buffer should be really rare, and mapping the
front buffer should be similarly rare. Together, it should basically
never happen. But, in case it does somehow...make sure that mapping
the scanout buffer always goes through an uncached GTT map.
v2: Add a giant comment written by Chris Wilson.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
On modern systems with 4GB apertures, the size in bytes is 4294967296,
or (1ull << 32). The kernel gives us the aperture size as a __u64,
which works out great.
Unfortunately, libdrm "helpfully" returns the data as a size_t, which
on 32-bit systems means it truncates the aperture size to 0 bytes.
We've happily reported this value as 0 MB of video memory via
GLX_MESA_query_renderer since it was originally exposed.
This patch bypasses libdrm and calls the ioctl ourselves so we can
use a proper uint64_t, avoiding the 32-bit integer overflow. We now
report a proper video memory size on 32-bit systems.
Chris points out that the aperture size (CPU mappable size limit)
isn't really the right thing to be checking. But libdrm_intel uses
it to fail execbuffer, so it is an actual limit for now. Once that's
fixed we can probably move to something else. In the meantime, fix
the obvious typecasting bug.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only the Radeon kernel driver exposed the GPU temperature and
the shader/memory clocks, this implements the same functionality
for the AMDGPU kernel driver.
These queries will return 0 if the DRM version is less than 3.10,
I don't explicitely check the version here because the query
codepath is already a bit messy.
v2: - rebase on top of master
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The idea is taken from radeonsi. The code mostly was already checking for null
pixel shader, so little checks had to be added.
Interestingly, acc. to testing with GTAⅣ, though binding of null shader happens
a lot at the start (then just stops), but draw_vbo() never actually sees null
ps.
v2: added a check I missed because of a macros using a prefix to choose
a shader.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Taken from radeonsi, required to remove dummy pixel shader in the next patch
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.
Some statistics for GTAⅣ:
Average tesselation bind skip per frame: ≈350
Average geometric shaders bind skip per frame: ≈260
Skip of binding vertex ones occurs rarely enough to not get into per-frame
counter at all, so I just gonna say: it happens.
v2: I've occasionally removed an empty line, don't do this.
v3: return a check for null tes and gs back, while I haven't figured out
the way to move stride assignment to r600_update_derived_state() (as it
is in radeonsi).
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We're about to replace blorp's emit code with ISL and it emits them in
the other order. This makes diffing the aubs easier.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Since the inclusion in 7f160efcde
the header used x_biased, while the implementation used y_biased.
This changes the header to macth the implementation since the
uses of the function seems to expect y_biased.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was change from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will be used to take ownership of freashly created renderbuffers,
avoiding the need to call the reference function which requires
locking.
V2: dereference any existing fb attachments and actually attach the
new rb.
v3: split out validation and attachment type/complete setting into
a shared static function.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com>
The nv50 ir is scalar. Perhaps this was from some early attempts to
integrate the simd aspects of nv30. However at this point it's entirely
unused.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The API/entry point in mesa already checks the correct behavior,
however, it's possible to be handled by another implementation and those
implementations should not be able to abuse a weird combination of count
and pointer.
This fixes CID 1403193
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Along the way, add missing GL_ONE source support and drop non-existing
GL_ZERO and GL_ONE operand support.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Instead of computing it once again using _mesa_tex_target_to_index.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Change it into filter_fp_input_mask transform function that instead of
returning a mask, transforms input.
Also, simplify the case of vertex program handling by assuming that
fp_inputs is always a combination of VARYING_BIT_COL* and VARYING_BIT_TEX*.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Its only usage is easily replaced by nr_enabled_units. As for cache key
part, unit[i].enabled should be enough.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Since fixed-function shaders are restricted to MAX_TEXTURE_COORD_UNITS
texture units, use this constant instead of MAX_TEXTURE_UNITS. This
reduces the array size from 32 to 8.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This gets rid of one piece of ugliness with the way ISL handles surface
emitting surface states. I've never liked that hand-rolled table but it
was the best we had at the time.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The helper block is extremely general. It takes an string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop. This way we can easily generalize it to emit more different
types of getter functions.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure. While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier. This commit
adds a nice little helper struct for getting rid of some of that
complexity.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We only need to update it if something changes. Also
_mesa_bind_vertex_buffer() will update the mask when binding to a
NULL or default buffer so no need to do that update here.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
radeonsi added stricter checking for correct swizzles in debug builds.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 4cf2942777 ("radeonsi: support 64-bit system values")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In preparation for enabling MSAA in OpenSWR, the state trackers need to
be aware of multisample pixel formats for software renderers. This patch
allows glx-xlib to query the renderer for support of pixel
formats with multisample, and create multisample resources.
This change is benign to softpipe and llvmpipe, as is_format_supported
returns FALSE for any sample_count > 1. OpenSWR does the same at the
moment, but that will change soon.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With this patch, we will specify the current context
when we invalidate the surface before the surface is
put back to the recycled surface pool. This allows the
winsys layer to use the specified context to do the
invalidation rather than using the last context that
referenced the surface. This prevents race condition if
the last referenced context is now made current in another thread.
Tested with MTT glretrace, NobelClinicianViewer.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
If two contexts wanted to access the same buffer at the same time, it would
end up on two validation lists simultaneously, which might cause a
PIPE_ERROR_RETRY when trying to validate it from one context while the other
context already had it validated but not yet fenced.
In that situation we could spin until the error goes away, or apply various
more or less expensive locking schemes to save cpu.
Here we use a scheme that briefly locks after fencing but avoids locking on
validation in the non-contended case.
v2:
Make sure we broadcast not only on releasing buffers after fencing, but also
after releasing buffers in the pb_validate_validate error path.
v3:
Don't broadcast on PIPE_ERROR_RETRY because that would increase the chance
of starvation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
3D wasn't officially supported before virtual HW version 8 so we can
remove this old code.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Currently, surface propagation for colliding render target resource is
done at framebuffer emit time for vgpu10. This patch
adds the surface propagation for non-vgpu10 path to emit_fb_vgpu9()
and removes the redundant surface copy at set time.
Tested with MTT glretrace, piglit, NobelClinicianViewer, Turbine, Cinebench.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
The zslice index to svga_texture_copy_handle_resource() is not adjusted
and should be a signed integer.
This patch fixes piglit tests for non-vgpu10 including
spec@arb_framebuffer_object@fbo-generatemipmap-3d
spec@glsl-1.20@execution@tex-miplevel-selection gl2:texture* 3d
Tested with MTT piglit and glretrace
This implementation is based on querying the time just before swap/present
and doing a Sleep() if needed. There is no sync to vblank or actual
coordination with the GPU. This isn't perfect, but basically works.
We've had some request for this functionality, and it sounds like there
are some Windows GL apps that refuse to start if the driver doesn't
advertise this extension.
Note: NVIDIA's Windows OpenGL driver advertises the WGL_EXT_swap_control
string both with wglGetExtensionsStringEXT() and with
glGetString(GL_EXTENSIONS). We're only advertising it with the former at
this time.
Tested with asst. Mesa demos, Google Earth, Lightsmark, etc.
VMware bug 1591534.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When a backing surface is reused, it is possible that
the original surface has been changed. So before the backing surface
is bound again, we need to sync up the surface.
This patch creates a new helper function svga_texture_copy_handle_resource()
to sync up the backing surface resource.
This patch, together with the backing surface dirty bit fix, fixes
the rendering corruption in NobelClinicianViewer when rotating the model.
Also tested with MTT glretrace, piglit, Cinebench, Turbine.
Reviewed-by: Brian Paul <brianp@vmware.com>
The reset flag specifies if the dirty bit needs to be reset
after the surface is propagated to the texture. This is used
to make sure that the dirty bit is not reset and stay unset
before the surface is unbound.
Reviewed-by: Brian Paul <brianp@vmware.com>
The new has_backed_views flag specifies if any of the render target
views or depth stencil view is a backing surface view.
The flag is used in svga_propagate_rendertargets() so it can return early
if there is no surface to propagate.
Reviewed-by: Brian Paul <brianp@vmware.com>
A texture can be destroyed from a different context from which it is
created, but destroying the render target view from a different context
will cause svga device errors. Similar to shader resource view,
this patch skips destroying render target view or depth stencil view
from a non-parent context.
Fixes driver errors running NobelClinician Viewer application.
Tested with NobelClinician Viewer, MTT piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, rasterization will be disabled if the
rasterizer_discard flag is set or the fragment shader
is undefined due to missing position output from the
vertex/geometry shader.
Tested with piglit test glsl-1.50-geometry-primitive-id-restart.
Also tested with full MTT glretrace and piglit.
v2: As suggested by Roland, to properly disable rasterization, besides
setting FS to NULL, we will also need to disable depth and stencil test.
v3: As suggested by Brian, set SVGA_NEW_DEPTH_STENCIL_ALPHA dirty bit
in svga_bind_rasterizer_state() if the rasterizer_discard flag is
changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Emulating wide points in geometry shader when doing transform feedback
is problematic. This patch disables the emulation.
Tested with piglit test ext_transform_feedback-points.
Also tested with MTT glretrace, mesa demos pointblast and spriteblast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms. For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced. More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again. Getting the clflushing both correct
and efficient is very subtle and painful. Instead, let's side-step the
problem by just snooping.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
To work with addr2line.sh we also need the relative offset within the
DSO. And addr2line.sh gets confused by the leading stackframe number.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We don't need to call _mesa_reference_renderbuffer() for the first
assignment as refCount starts at 1. For swrast we work around the
fact we will indirectly call _mesa_reference_renderbuffer() by
resetting refCount to 0.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because HW could not see fast-clears and works
on the render cache as if there was regular non-fast-clear surface.
Fixes 16 tests on BDW:
dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This appears to be a leftover from an earlier version of this function.
Nothing is emitted into the CS.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
All offsets and strides are precomputed by
radv_CreateDescriptorUpdateTemplateKHR and stored in the template.
v2: Move the new struct declarations from radv_descriptor_set.h
to radv_private.h (Bas)
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Replace the !binding_layout->immutable_samplers assertion in
radv_update_descriptor_sets with a conditional.
The Vulkan specification does not say that it is illegal to update
a sampler descriptor when it is immutable; only that pImageInfo is
ignored.
This change is also needed for push descriptors, because valid
descriptors must be pushed for all bindings accessed by shaders,
including immutable sampler descriptors.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Move the implementation into a separate function that takes a
cmd_buffer and a dstSetOverride parameter.
When cmd_buffer is not NULL, radv_update_descriptor_sets calls
cs_add_buffer directly instead of updating the buffer list.
This will be used to implement VK_KHR_push_descriptor.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Commit f938354362 recently increased the
alignment on vertex buffer data from 32 to 64. This caused us to
consume a bit more batch than we were before and we now go over the
estimate by a small amount on certain blits on gen8+. This commit bumps
then gen8 batch estimate by a bit to compensate. Haswell and older
still seems to be well within the limit.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100582
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Decoding with aubinator encountered a command of 0xffffffff. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.
Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.
v2:
* Update aubinator_error_decode
* Actually convert the length variable returned into a *signed* integer
in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The system value only has an X component, and radeonsi started
checking that in debug builds.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 4cf2942777 ("radeonsi: support 64-bit system values")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO. Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers. In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.
This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
mako is already a mesa build requirement, extra copy not needed.
Tested building against mesa build baseline (mako-0.8.0).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This avoids validation and looking up the buffer target for a second time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows internal users to pass buffer objects directly and
allows for KHR_no_error support to be more easily added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Possibly more efficient, either way it makes the code easier to
follow.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
42aaa548 changed the renderbuffer initialisation of RefCount from
1 to 0.
This is inconsitent with how we use RefCount elsewhere. Also every
driver implementation of NewRenderbuffer() calls
_mesa_init_renderbuffer() so its safe to set it there.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 658568941d.
With the help of shader variants we can render to rb-swapped
formats now. Fixes about 60 piglits.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If we render to rb swapped format we will create a shader variant doing
the involved swizzing in the pixel shader.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If shader-db run, create a standard variant immediately
(as otherwise nothing will trigger the shader to be
actually compiled).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
In the long run the compiler needs to know the specifc variant
'key' in order to compile appropriate assembly. With this commit
the variant knows its shader and we are able pass the preallocated
variant into etna_compile_shader(..). This saves us from passing
extra ptrs everywhere.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
This commit adds some basic infrastructure to handle shader
variants. We are still creating exactly one shader variant
for each shader.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.
v2: Introduce all the missing stubs at once.
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order the same identical to the anv_gem.c.
This way we can easily grep & diff in the future ;-)
Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into
putting the lane select of llvm.amdgcn.readlane into a VGPR and then
fails to continue to compile.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Notably, llvm.amdgcn.readfirstlane and llvm.amdgcn.icmp may be hoisted
out of loops or if/else branches in cases like
if (cond) {
v = readFirstInvocationARB(x);
... use v ...
} else {
v = readFirstInvocationARB(x);
... use v ...
}
===>
v = readFirstInvocationARB(x);
if (cond) {
... use v ...
} else {
... use v ...
}
The optimization barrier is a heavy hammer to stop that until LLVM
is taught the semantics of the intrinsic properly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LLVM will lift inline assembly out of if-else-blocks if both paths have
the same inline assembly. Prevent this by adding an irrelevant unique
text to the assembly.
This requires the LLVM assembly parser to be initialized.
Furthermore, allow forcing subsequent computations to happen after the
optimization barrier by defining a data dependency.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For simplicitly, always store system values as 32-bit values or arrays
of 32-bit values. 64-bit values are unpacked and packed accordingly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2 (Nicolai):
- BALLOT isn't per-channel
- expand the documentation (also for VOTE_*)
v3:
- only BALLOT returns a 64-bit lanemask (Boyan)
- relax the requirement on READ_INVOC: the invocation number to read
from must be uniform within a sub-group. This matches the
GL_ARB_shader_ballot spect (and the v_readlane instruction of AMD
GCN)
v4:
- hopefully really fix the doc of VOTE_* returns (Ilia)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
VM faults cannot be disabled for SDMA on <= VI.
We could still use SDMA by asking the winsys about which parts of the
buffers are committed. This is left as a potential future improvement.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We never add fences to backing buffers during submit. When we free a
backing buffer, it must inherit the sparse buffer's fences, so that it
doesn't get re-used prematurely via the cache.
v2:
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
... and implement the corresponding fence handling.
v2:
- add missing bit in amdgpu_bo_is_referenced_by_cs_with_usage
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the bulk of the buffer allocation logic. It is fairly simple and
stupid. We'll probably want to use e.g. interval trees at some point to
keep track of commitments, but Mesa doesn't have an implementation of those
yet.
v2:
- remove pipe_mutex_*
- fix total_backing_pages accounting
- simplify by using the new VA_OP_CLEAR/REPLACE kernel interface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This probably has only minor performance effects, but it simplifies some
subsequent code slightly.
Ideally, it could also be used to simplify the handling of slab buffers
in the same way, but unfortunately that's not possible as long as we need
indices for relocations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of just advertising the aperture size, we do something more
intelligent. On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU. In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware. Thanks to certain limitations, not all
objects can be placed above the 32-bit boundary. In particular, general
and state base address need to live within 32 bits. (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set. We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects. While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible. In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueQaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering. In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's possible that the device could have been lost while we were
waiting. We should let the user know if this has happened.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the shader does not set one of these values, they are supposed to
get a default value of 0. We have hardware bits in 3DSTATE_CLIP for
this but haven't been setting them. This fixes the intermittent failure
of dEQP-VK.geometry.layered.3d.render_to_default_layer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages. However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations. This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
This patch makes glCopyImageSubData require mipmap completeness when the
texture object's built-in sampler object has a mipmapping MinFilter.
Fixes (on i965):
dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fix linking error.
CXXLD libGL.la
../../../../src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `debug_backtrace_capture':
src/gallium/auxiliary/util/u_debug_stack.c:59: undefined reference to `_Ux86_64_getcontext'
src/gallium/auxiliary/util/u_debug_stack.c:60: undefined reference to `_ULx86_64_init_local'
src/gallium/auxiliary/util/u_debug_stack.c:62: undefined reference to `_ULx86_64_step'
src/gallium/auxiliary/util/u_debug_stack.c:71: undefined reference to `_ULx86_64_get_proc_info'
src/gallium/auxiliary/util/u_debug_stack.c:73: undefined reference to `_ULx86_64_get_proc_name'
src/gallium/auxiliary/util/u_debug_stack.c:65: undefined reference to `_ULx86_64_step'
Fixes: 70c272004f ("gallium/util: libunwind support")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100562
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Adding the actual table from the docs makes it clearer exactly what the
restrictions are. In particular, it becomes clear that compressed
textures ignore the alignment parameters in RENDER_SURFACE_STATE.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This fixes the stripes of garbage rendered on the floor of the vehicle
assembly building among other rendering issues. The reason for the
misrendering seems to be that some of the GLSL shaders used by the
application use variables before initializing them, incorrectly
assuming that they will be implicitly set to zero by the
implementation.
Acked-by: Matt Turner <mattst88@gmail.com>
This is pretty much the same tool as what i-g-t has, only with a more
fancy decoding of the instructions/registers. It also doesn't support
anything before gen4.
v2 (from Matt): Drop authors
Remove undefined automake variable
v3: Fix incorrect offsets for dword > 1 (Jordan)
v4: Fix decompression error with large blobs (Jordan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Specifically, non-line primitives skipped, and defaulting to reset on
each packet.
The skip of non-line primitives saves ≈110 resetting of
PA_SC_LINE_STIPPLE register per frame in Kane&Lynch2.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Also change gs_output_prim type: unsigned → pipe_prim_type. The idea of
the code is mostly taken from radeonsi. The new code operating on
prev/curr rast_primitives saves ≈15 reloads of PA_SC_LINE_STIPPLE per
frame in Kane&Lynch2
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Note: si_shader.h has also "type" variable that should be changed to
"enum pipe_prim_type", however it triggers a bunch of warnings about
unhandled switches, so due not knowing the correct way to handle them, I
decided to leave it as is.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
udivmod64 appears in src/compiler/glsl/builtin_int64.h and src/compiler/glsl/udivmod.h
The second file seems unused.
Fix commit 6b03b345eb
This change doesn't affect shader-db.
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Highlights:
- Display needs tiled pitch alignment to be at least 32 pixels
- Implement Addr2ComputeDccAddrFromCoord().
- Macro-pixel packed formats don't support Z swizzle modes
- Pad pitch and base alignment of PRT + TEX1D to 64KB.
- Fix support for multimedia formats
- Fix a case "PRT" entries are not selected on SI.
- Fix wrong upper bits in equations for 3D resource.
- We can't support 2d array slice rotation in gfx8 swizzle pattern
- Set base alignment for PRT + non-xor swizzle mode resource to 64KB.
- Bug workaround for Z16 4x/8x and Z32 2x/4x/8x MSAA depth texture
- Add stereo support
- Optimize swizzle mode selection
- Report pitch and height in pixels for each mip
- Adjust bpp/expandX for format ADDR_FMT_GB_GR/ADDR_FMT_BG_RG
- Correct tcCompatible flag output for mipmap surface
- Other fixes and cleanups
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Recent changes in Makefile.sources merged the aubinator files in
a unique list of generated files and genxml/genX_xml.h is now needed
to avoid the following building error:
ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h',
missing and no known rule to make it
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
Fixes: 0f83c05 "intel: genxml: compress all gen files into one"
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We supported more generally. Decreased the dynamic buffers though, as
we only support 16 for uniform+storage.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Address sanitizer reports lot of misaligned access:
SUMMARY: AddressSanitizer: undefined-behavior main/marshal.c:276:31 in
main/marshal.c:276:31: runtime error: load of misaligned address 0x631000104866 for type
'const GLuint' (aka 'const unsigned int'), which requires 4 byte alignment
0x631000104866: note: pointer points here
92 88 00 00 00 00 00 00 4a 03 0c 00 93 88 00 00 00 00 00 00 02 01 0c 00 40 8d 00 00 00 00 00 00
^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28725:12 in
main/marshal_generated.c:28726:12: runtime error: member access within misaligned address 0x6310003fc874 for type
'struct marshal_cmd_VertexAttribPointer', which requires 8 byte alignment
0x6310003fc874: note: pointer points here
01 00 00 00 7a 02 20 00 00 00 00 00 be be be be be be be be be be be be be be be be be be be be
^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28726:12 in
main/marshal_generated.c:28726:12: runtime error: store to misaligned address 0x6310003fc87c for type
'GLint' (aka 'int'), which requires 8 byte alignment
0x6310003fc87c: note: pointer points here
00 00 00 00 be be be be be be be be be be be be be be be be be be be be be be be be be be be be
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We want the guardband_x/y to be the largerst scalars such that each
viewport scaled by that amount is still a subrange of [-32767, 32767].
The old code has a couple of issues:
1) It used scissor instead of viewport_scissor, potentially taking into
account a viewport that is too small and therefore selecting a scale
that is too large.
2) Merging the viewports isn't ideal, as for example viewports with
boundaries [0,1] and [1000, 1001] would allow a guardband scale of ~30k,
while their union [0, 1001] only allows a scale of ~32.
The new code just determines the guardband per viewport and takes the minimum.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Just enabling the driver-independent implementation that Jason did.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Interp at sample needs to use the center, since the sample
positions it retrieves are relative to the center.
This fixes a bunch of CTS tests with multisample_interpolation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current code was broken, and I decided to redesign it instead.
This puts the sample positions for all samples into the queue
constant descriptor buffer after all the spill/ring descriptors.
It then uses a single offset register to point how far into the
samples the samples for num_samples are. This saves one user sgpr
and means we only generate the sample position data in the rare
single case where we need it currently.
This doesn't fix the failing CTS tests without the followup
fix.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE,
3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in
dword0, we probably want to print those.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The ordering NIR gives us is correct for the hw, this fixes:
dEQP-VK.glsl.texture_functions.texturegrad.* (mainly trigged
on isampler/usampler 3d textures.).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
dEQP-VK.glsl.texture_functions.texture.samplercubearray*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To silence
C:\Users\Brian\projects\mesa\src\util/u_vector.h(41) : warning C4146: unary
minus operator applied to unsigned type, result still unsigned
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Otherwise, we were getting the definition for 'inline' by chance from
some other preceeding #include.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
There are still some distributions trying to support unfortunate people
with old or exotic CPUs that don't have 64bit atomic operations. When
compiling for such a machine, gcc conveniently inserts a library call to
a helper, but it's implementation is missing and we get a linker error.
This allows us to provide our own implementation, which is marked weak
to prefer a better implementation, should one exist.
v2: changed copyright, some style adjustments
v3: [mattst88] Print results with AC_MSG_CHECKING/AC_MSG_RESULT
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's kinda sad that (a) we don't have debug_backtrace support on !X86
and that (b) we re-invent our own crude backtrace support in the first
place. If available, use libunwind instead. The backtrace format is
based on what xserver and weston use, since it is nice not to have to
figure out a different format.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Prep work for next patch.
Ideally 'struct debug_stack_frame' would be opaque, but it is embedded
in a bunch of places. But at least we can treat it opaquely.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Should be used by the state tracker when glGetImageHandleARB()
is called in order to create a pipe_image_view template.
v3: - move the comment to *.c
v2: - make 'st' const
- describe the function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Call it directly when batch queue is empty. This avoids costly thread
synchronisation. This commit improves performance of games that have
previously regressed with mesa_glthread=true.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We could re-enable it also but I haven't tested that yet, and I'm
not sure we care much anyway.
V2: don't disable it from with the call itself. We need a custom
marshalling function or we get stuck waiting for thread to
finish.
V3: tidy up redundant code copied from generated version.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the -Wunused-but-set-variable ones.
Found a way to do it with a oneliner.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
si_state.c: In function ‘si_make_texture_descriptor’:
si_state.c:3240:25: warning: ‘num_format’ may be used uninitialized
si_state.c:3240:12: warning: ‘data_format’ may be used uninitialized
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
1-st is obvious because of assert, 2-nd stolen frmo si_draw_vbo(),
and 3-rd is just a small refactoring.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It removes a need to copy whole struct every call for no reason. Comparing
objdump -d output for original and this patch compiled with -O2, shows reduce
of the function by 16 bytes.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Needed to get rid of a separate struct allocation in the next patch, because
the one in argument is a constant, and don't allow changing its fields.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Clears can happen before a rast is set, which can in turn cause scissors
and fragprog to be validated. Make sure that we handle this case.
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
y is vert, x is horiz.
Noticed in visual inspection compared to radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Will be more convenient for bindless because the 64bit handle is
actually the base_ptr of the descriptor (ie. 'list' will be fetched
from TGSI_FILE_CONSTANT/TGSI_FILE_TEMPORARY instead).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
lp_build_emit_fetch() is useful when the source type can be
infered from the instruction opcode.
However, for bindless samplers/images we can't do that easily
because tgsi_opcode_infer_src_type() returns TGSI_TYPE_FLOAT for
TEX instructions, while we need TGSI_TYPE_UNSIGNED64 if the
resource register is bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Since 63684a9a ("glsl: Combine many instruction lowering passes
into one.", Thu Nov 18 2010), we no longer have anything called
ir_explog_to_explog2. So it's only confusing to have those
references there.
Update with the appropriate method, so people can grep for it in
the current tree if they encounter it.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
geom was removed in e968975 ("gallium: remove the geom_flags param
from is_format_supported", Tue Mar 8 00:01:58 2011 +0100), but the
documentation of it was left over. Let's bring the documentation up
to date.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
st_finalize_texture always accesses image at face 0, but it may not be
set if we are working with cubemap that had other face set.
This fixes crash in piglit
same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL_ATTACHMENT.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Helps Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3934925 -> 3934327 (-0.02%)
total gprs used in shared programs : 481563 -> 481563 (0.00%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36061888 -> 36056504 (-0.01%)
local gpr inst bytes
helped 0 0 228 228
hurt 0 0 0 0
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: reordered modifiers again
v5: no rounding bit for limms
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=60000 /width=1024 /height=640:
score: 1026 -> 1045
changes for shader-db:
total instructions in shared programs : 3943335 -> 3934925 (-0.21%)
total gprs used in shared programs : 481563 -> 481563 (0.00%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36139384 -> 36061888 (-0.21%)
local gpr inst bytes
helped 0 0 3587 3587
hurt 0 0 0 0
v2: removed TODO
reorderd to show changes without RA modification
removed stale debugging print() call
v3: remove predicate checks
enable only for gf100 ISA
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
v4: rename it to PostRaLoadPropagation
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Helps mainly Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3941587 -> 3940749 (-0.02%)
total gprs used in shared programs : 481511 -> 481460 (-0.01%)
total local used in shared programs : 27469 -> 27481 (0.04%)
total bytes used in shared programs : 36123344 -> 36115776 (-0.02%)
local gpr inst bytes
helped 2 48 243 243
hurt 2 3 32 32
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I hit an assert in the emiter while toying around with optimizations, because
ConstantFolding immediated a big int into a mad.
There is special handling for FMA/MAD in insnCanLoad, which is broken. With
this patch the special path should be not hit anymore. Anyway, the constraints
for the LIMMS can't be guarenteed in SSA form and I have patches pending to
use it via a post-SSA optimization pass.
As a result, immediates get immediated for int mad/fmas as well.
changes in shader-db:
total instructions in shared programs : 3943335 -> 3941587 (-0.04%)
total gprs used in shared programs : 481563 -> 481511 (-0.01%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36139384 -> 36123344 (-0.04%)
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: remove extra bit from insnCanLoad as well]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This enables support for the GL_NV_fill_rectangle extension on the
GM200+ for Desktop OpenGL.
Signed-off-by: Lyude <lyude@redhat.com>
Changes since v1:
- Fix commit message
- Add note to reldocs
Changes since v2:
- Remove unnessecary parens in nvc0_screen_get_param()
- Fix sorting in release notes
- Don't execute FILL_RECTANGLE method on pre-GM200+ GPUs
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Changes since v1:
- Add pipe caps for etnaviv, freedreno, swr and virgl
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since we don't have the bits required to support this in OpenGLES yet,
this only enables support for Desktop OpenGL
Signed-off-by: Lyude <lyude@redhat.com>
Changes since v1:
- Simply _mesa_PolygonMode() a little bit
- Fix formatting in OpenGL spec excerpts
- Move polygon mode checking into _mesa_valid_to_render()
Changes since v3:
- Improve error message for invalid drawings with GL_FILL_RECTANGLE_NV
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This enables tessellation shaders and sets some values for
the maximums.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems to get lost in the rebases, should fix
the tessellation demos, crash in llvm.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This emits the tessellation shaders and state to the command stream.
It contains the logic to emit the LS/HS shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So tess shaders have some circular dependencies,
TCS needs the TES primitive mode
TES needs the TCS vertices out
This builds the nir for each shader first to get the
info, executes a tes specific nir pass, then builds
the LLVM shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the code from radeonsi to build the if/endif,
and ports the tess factor emission code. This code has
an optimisation TODO that we can deal with later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for the tessellation inputs/outputs to the
shader compiler, this is one of the main pieces of the patch.
It is very similiar to the radeonsi code (post merge we should
consider if there are better sharing opportunities). The main
differences from radeonsi, is that we can have "compact" varyings
for clip/cull/tess factors, and we have to add special handling
for these.
This consists of treating the const index from the deref different
depending on the compactness.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for the nir intrinsics that tessellation uses.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This hooks up the tessellation shader info to the nir values
and ctx generated ones.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This calculates the pipeline state for tessellation.
It moves the gs ring calculation down to below
where the tessellation shaders will be compiled,
as it needs the info from those shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This add support for tessellation patch inputs to the code
that finds the unique parameter index.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the VGT_VERTEX_REUSE register settings
for Polaris GPUs from radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just configures all the register inputs for the tessellation
related stages.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just sets up the necessary pointers on the compiler
side for the rings needed for tessellation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch adds support for the offchip rings for storing
tessellation factors and attribute data.
It includes the register setup for the TF ring
v2: always do tess ring size calcs (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the tess pieces for shader keys and shader info,
it adds the necessary bits to the vertex key/info as well.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for tess to the shader stage conversion
and emits the per-stage descriptors/constants for tess stages.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some versions of MinGW-w64 such as 5.3.1 and 6.2.0 produce bad code
with -O2 or -O3 causing a random driver crash when running programs
that use GLSL. Most Mesa demos in the glsl/ directory trigger the
bug, but not the fragcoord.c test.
Use a #pragma to force -O1 for this file for later MinGW versions.
Luckily, this is basically one-time setup code. I suspect the bug
is related to the sheer size of this file.
This should let us move to newer versions of MinGW-w64 for Mesa.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To avoid dereferencing a null pointer in case wglMakeCurrent() wasn't
called. Found while debugging SWKOTOR game.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
for threaded gallium, which can't use pipe_context in create_surface
v2: don't add a new decompress helper function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fix a bug that was caused by a type mismatch in the shift count between
GLSL and TGSI. I briefly considered adjusting the TGSI semantics, but
since both LLVM and AMD GCN require both arguments to be of the same type,
it makes more sense to keep TGSI as-is -- it reflects the underlying
implementation better.
I'm also sending out piglit tests that expose this error.
v2: use the right number of components for the temporary register
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The commit mentioned below required the __DRI2FlushExtension to have
version 4 or above, for GBM functionality. That broke GBM with some
classic dri drivers. Relax that requirement so that we only flush
after unmap if we have version 4 or above. Drivers that require the flush
for correct functionality should implement the desired version.
Fixes: ba8df228 ("gbm/dri: Flush after unmap")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Dylan Baker <dylan@pnwbakers.com>
This allows us to run 32bit Vulkan apps on Android, ftruncate
call would fail on 2GB (max size being 2GB - 1).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is to fix following compile error with libmesa_isl:
mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found
Fixes: f0eaf38 ("genxml: New generated header genX_bits.h (v6)")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emli Velikov <emil.velikov@collabora.com>
This is confusing because is only applys to GL_ARB_vertex/fragment_program,
and because of that its also not very useful.
If someone requires this for debugging they can just make an ad-hoc
code change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Combining all the files into a single string didn't make any
difference in the size of the aubinator binary.
With this change we now also embed gen4/4.5/5 descriptions, which
increases the aubinator size by ~16Kb.
v2 (Lionel): rebase makefiles
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Vulkan Clipping is defined in terms of vertices, the scissor based
clipping happens on pixels. There is a difference with points and
lines, as a vertex can be outside the viewport while some pixels are in.
On Vulkan thoise pixels shouldn't be drawn, while they would be with
the guardband.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For opcodes such as the nir_op_pack_64_2x32 for which all sources and
destinations have explicit sizes, the bit_size parameter to the evaluate
function is pointless and *should* do nothing. Previously, we were
always switching on the bit_size and asserting if it isn't one of the
sizes in the list. This generates way more code than needed and is a
bit cruel because it doesn't let us have a bit_size of zero on an ALU op
which shouldn't need a bit_size.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Eric renamed these from dri_bufmgr_* and intel_bufmgr_* to drm_intel_*
in libdrm commit 4b9826408f65976a1a13387beda748b65e03ec52, circa 2008,
but we've been using the legacy names this whole time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only common/decoder.[ch] requires it [for intel_aub.h].
v2: The code was moved to from intel/tools to intel/common,
update accordingly.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only part which requires libdrm_intel tools/aubinator is not built
on Android.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/home/marek/dev/mesa-main/src/gallium/drivers/softpipe/sp_compute.c:178:
warning: 'grid_size' may be used uninitialized in this function
[-Wmaybe-uninitialized]
/home/marek/dev/mesa-main/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3598:
warning: 'level' may be used uninitialized in this function [-Wmaybe-uninitialized]
out1 = lp_build_cmp(&leveli_bld, PIPE_FUNC_GREATER, level, last_level);
^
All tests pass on Fiji now. This prevents DCC disablement due to
incompatible DCC formats due to the fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Create new function to get correct alignment based on Asics, and change
the corresponding decode message buffer and dpb buffer size calculations
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The border color swizzle logic was copied from Vulkan. It doesn't make any
sense to me, but it passes all piglits except the stencil ones.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Both GFX6 and GFX9 fields are printed next to each other in parsed IBs.
The Python script parses both headers like one stream and tries to merge
all definitions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
the DATA_FORMAT and NUM_FORMAT fields are the same, but some of the enums
differ, thus add GFX6 and GFX9 suffixes, so that the IB parser can show
enums for both.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add _GFX6 and _GFX9 suffixes to conflicting definitions.
sid.h and gfx9d.h can now be included in the same file.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This resolves trivial conflicts with gfx9d.h caused by different formatting.
Some fields are also renamed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The usage should be client first call AddrComputeSurfaceInfo() on
depth surface with flag "matchStencilTilecfg", AddrLib will use
2DThin1 tile index for depth as much as possible and do not down grade
unless alignment requirement cannot be met.
1. If there is a matched 2DThin1 tile index for stencil which make
sure they will share same tile config parameters, then return the
stencil 2DThin1 tile index as well.
2. If using 2DThin1 tile mode cannot make sure such thing happen, and
TcCompatible flag was set, then ignore this flag then try 2DThin1 tile
mode for depth and stencil again.
3. If 2DThin1 tile mode cannot make sure depth and stencil to have
same tile config parameters, then down grade depth surface tile mode
to 1DThin1.
4. If depth surface's tile mode was 1DThin1, then return 1DThin1 tile
index for stencil.
5. If depth surface's tile mode is PRT, then return invalid tile index
to stencil since their tile config parameters will never be met.
Client driver then check the returned tile index of stencil -- if it
is not invalid tile index, then call AddrComputeSurfaceInfo() on
stencil surface with the returned stencil tile index to get full
output information. Please note, client needs to set flag
"useTileIndex" when AddrLib get created.
1) minimizePadding - Use 1D tile mode if padded size of 2D is bigger
than 1D
2) maxBaseAlign - Force PRT tile mode if macro block size is bigger than
requested alignment.
Also, related changes to tile mode optimization for needEquation.
1. Add new surface flags needEquation for client driver use to force
the surface tile setting equation compatible. Override 2D/3D macro
tile mode to PRT_* tile mode if this flag is TRUE and num slice > 1.
2. Add numEquations and pEquationTable in ADDR_CREATE_OUTPUT structure
to return number of equations and the equation table to client driver
3. Add equationIndex in ADDR_COMPUTE_SURFACE_INFO_OUTPUT structure to
return the equation index to client driver
Please note the use of address equation has following restrictions:
1) The surface can't be splitable
2) The surface can't have non zero tile swizzle value
3) Surface with > 1 slices must have PRT tile mode, which disable
slice rotation
Sometimes client driver passes valid tile info into address library,
in this case, the tile index is computed in function
HwlPostCheckTileIndex instead of CiAddrLib::HwlSetupTileCfg.
We need to call HwlPostCheckTileIndex to calculate the correct tile
index to get tile split bytes for this case.
When clients queries tile Info from tile index and expects accurate
tileSplit info, bits per pixel info is required to be provided since
this is necessary for computing tileSplitBytes; otherwise Addrlib will
return value of "tileBytes" instead if bpp is 0 - which is also
current logic. If clients don't need tileSplit info, it's OK to pass
bpp with value 0.
Kaveri (2-pipe) macro tiling mode table was initially set to all
4-aspect-ratio so the swizzling path did not work for it and then we
chose to pad the offset. We now discover the root cause is that if
ratio > 2, the swizzling path does not work. So we can safely use the
same path for Kaveri.
Even if surface info input flag "tcComaptible" is enabled, tc
compatible may be not supported if tile split happens for depth
surfaces. Add a new flag in output structure to notify client to
disable tc compatible in this case.
Carrizo row size is 1K, while tileSplitBytes is 2K for a 4xAA 32bpp
depth surface. Remove the sanity check that tileSplitBytes must be
greater than row size. There could be performance loss but may be
covered by non-split depth which enables tc-compatible read.
Change the logic to compute tc compatible stencil info via depth's
tileIndex instead of using depth's tileInfo. So the clients can get
the stencil's tileInfo computed from macroModeTable. If the stencil
tileInfo is same as depth tileInfo, then stencil is tc compatible;
otherwise, stencil is not tc compatible. The current suggestion is to
create another stencil buffer with the tc compatible tileInfo, use
depth-to-color copy to decompress and tile convert the rendered
stencil to tc compoatible stencil (And use the new stencil buffer to
program TC).
Debian, Ubuntu set default build flag: -Werror=format-security
CC state_tracker/st_cb_texturebarrier.lo
state_tracker/st_cb_eglimage.c: In function ‘st_egl_image_get_surface’:
state_tracker/st_cb_eglimage.c:64:7: error: format not a string literal and no format arguments [-Werror=format-security]
_mesa_error(ctx, GL_INVALID_VALUE, error);
^~~~~~~~~~~
state_tracker/st_cb_eglimage.c:71:7: error: format not a string literal and no format arguments [-Werror=format-security]
_mesa_error(ctx, GL_INVALID_OPERATION, error);
^~~~~~~~~~~
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Fixes: 83e9de25f3 ("st/mesa: EGLImageTarget* error handling")
These two functions do the exact same thing. One returns a uint64_t,
and the other takes the same uint64_t and truncates it to a uint32_t.
We only need the uint64_t variant - the caller can truncate if it wants.
This patch gives us one function, intel_batchbuffer_reloc, that does
the 64-bit thing.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Doing this before tessellation makes doing some bits of
tessellation a bit cleaner. It also cleans up a bit of the
llvm generator code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix codegen build break that was introduced earlier
v2: update rules for gen_knobs.cpp and gen_knobs.h
v3: Introduce bldroot and revert generator file changes, making patch simpler.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The old code would sync and then throw a cryptic error message.
There is no need for a custom error, we can just fallback to
the real function and have it do proper validation.
Fixes piglit test:
glsl-uniform-out-of-bounds
Which was returning the wrong error code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Stop trying to specify texture or renderbuffer objects for unsupported
EGL images. Generate the error codes specified in the OES_EGL_image
extension.
EGLImageTargetTexture2D and EGLImageTargetRenderbuffer would call
the pipe driver's create_surface callback without ever checking that
the given EGL image is actually compatible with the chosen target
texture or renderbuffer. This patch adds a call to the pipe driver's
is_format_supported callback and generates an INVALID_OPERATION error
for unsupported EGL images. If the EGL image handle does not describe
a valid EGL image, an INVALID_VALUE error is generated.
v2: fixed get_surface to actually use the usage and error parameters
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The only callers are here, and we will add generation of GL errors in
the following patch. Rename the function to st_egl_image_get_surface,
pass the gl_context instead of st_context, and move the cast from
GLeglImageOES to void* into st_egl_image_get_surface.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Technically those hw operations are only available on gen7, as gen8+
support the conversion on the MOV. But, when using the builder to
implement nir operations (example: nir_op_fquantize2f16), it is not
needed to do the gen check. This check is done later, on the final
emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the
specific operation accordingly.
So in the middle, during optimization phases those hw operations can
be around for gen8+ too.
Without this patch, several (at least 95) vulkan-cts quantize tests
crashes when using INTEL_DEBUG=optimizer. For example:
dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert
v2: simplify the code using GEN_GE (Ilia Mirkin)
v3: tweak brw_instruction_name instead of changing opcode_descs
table, that is used for validation (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
No performance testing has been done, because it makes sense to make this
change regardless of that. Also, _NEW_TEXTURE is still used in many places,
but the obvious occurences are replaced here.
It's now possible to split _NEW_TEXTURE_OBJECT further.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
MSVC has been including a xtime definition in thr/xtimec.h ever since
MSVC 2013 (which is the minimum we require for building Mesa), and
including it prevents duplicate definitions when it gets included by
LLVM.
In fact, it looks that MSVC has been including a partial C11 threads
implementation too for some time, which we should consider migrating to
once we eliminate the use of _MTX_INITIALIZER_NP in our tree.
Thanks to the anonymous helper from
https://bugs.freedesktop.org/show_bug.cgi?id=100201#c4 for spotting
this.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100201
CC: "17.0" <mesa-stable@lists.freedesktop.org>
Drivers may queue dma operations on the context at unmap time so we need
to flush to make sure the data gets to the bo. Ideally the application
would take care of this, but since there appears to be no exported gbm
flush functionality we need to explicitly flush at unmap time.
This fixes a problem where kmscube on vmwgfx in rgba textured mode would
render using an uninitialized texture rather than the intended
rgba pattern.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Per the Vulkan spec, memory objects may be deleted before the buffers
and images using them are deleted, although those resources then
cannot be used except for deletion themselves.
For the virtual buffers, we need to access them on resource destruction
to unmap the regions, so this results in a use-after-free. Implement
reference counting to avoid this.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
v2: - Added comments.
- Fixed a double unmap bug.
- Actually unmap the non-edge old ranges.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
While having the _3d and _gpgpu versions is nice, there's no reason why
we need to have duplicated logic for tracking the current pipeline.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The programming note that says we need to do this still exists in the
SkyLake PRM and, from looking at the bspec, seems like it may apply to
all hardware generations SNB+. Unfortunately, this isn't particularly
clear cut since there is also language in the bspec that says you can
skip the flushing and stall to get better throughput. Experimentation
with the "Car Chase" benchmark in GL seems to indicate that some form of
flushing is still needed. This commit makes us do the full set of
flushes regardless of hardware generation. We can always reduce the
flushing later.
Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
A bunch of code was indented in such a way that it looked like it went
with the if statement above but it definitely didn't.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
We're not using anything in it, and we don't want to inherit struct
definitions from some other package anyway.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Like done in si_state_draw.c::si_draw_vbo
u_upload_alloc can fail, i.e. set output param *ptr to NULL, for 2 reasons:
alloc fails or map fails. For both there is already a fprintf/stderr in
radeon_create_bo and radeon_bo_do_map.
In src/gallium/drivers/ it is a common usage to just avoid to crash by doing
a silent check. But defer fprintf where the error comes from, libdrm calls.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All callers of isl_surf_init() that set 'min_row_pitch' wanted to
request an *exact* row pitch, as evidenced by nearby asserts, but isl
lacked API for doing so. Now that isl has an API for that, update the
code to use it.
v2: Assert that isl_surf_init() succeeds because the callers assume
it. [for jekstrand]
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
The caller does so by setting the new field
isl_surf_init_info::row_pitch.
v2: Validate the requested row_pitch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Validate that isl_surf::row_pitch fits in the below bitfields,
if applicable based on isl_surf::usage.
RENDER_SURFACE_STATE::SurfacePitch
RENDER_SURFACE_STATE::AuxiliarySurfacePitch
3DSTATE_DEPTH_BUFFER::SurfacePitch
3DSTATE_HIER_DEPTH_BUFFER::SurfacePitch
v2:
-Add a Makefile dependency on generated header genX_bits.h.
v3:
- Test ISL_SURF_USAGE_STORAGE_BIT too. [for jekstrand]
- Drop explicity dependency on generated header. [for emil]
v4:
- Rebase for new gen_bits_header.py script.
- Replace gen_10x with gen_device_info*.
v5:
- Drop FINISHME for validation of GEN9 1D row pitch. [for jekstrand]
- Reformat bit tests. [for jekstrand]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
genX_bits.h contains the sizes of bitfields in genxml instructions,
structures, and registers. It also defines some functions to query those
sizes.
isl_surf_init() will use the new header to validate that requested
pitches fit in their destination bitfields.
What's currently in genX_bits.h:
- Each CONTAINER::Field from gen*.xml that has a bitsize has a macro
in genX_bits.h:
#define GEN{N}_CONTAINER_Field_bits {bitsize}
- For each set of macros whose name, after stripping the GEN prefix,
is the same, genX_bits.h contains a query function:
static inline uint32_t __attribute__((pure))
CONTAINER_Field_bits(const struct gen_device_info *devinfo);
v2 (Chad Versace):
- Parse the XML instead of scraping the generated gen*_pack.h headers.
v3 (Dylan Baker):
- Port to Mako.
v4 (Jason Ekstrand):
- Make the _bits functions take a gen_device_info.
v5 (Chad Versace):
- Fix autotools out-of-tree build.
- Fix Android build. Tested with git://github.com/android-ia/manifest.
- Fix macro names. They were all missing the "_bits" suffix.
- Fix macros names more. Remove all double-underscores.
- Unindent all generated code. (It was floating in a sea of whitespace).
- Reformat header to appear human-written not machine-generated.
- Sort gens from high to low. Newest gens should come first because,
when we read code, we likely want to read the gen8/9 code and ignore
the gen4 code. So put the gen4 code at the bottom.
- Replace 'const' attributes with 'pure', because the functions now
have a pointer parameter.
- Add --cpp-guard flag. Used by Android.
- Kill class FieldCollection. After Jason's rewrite, it was just
a dict.
v6 (Chad Versace):
- Replace `key not in d.keys()` with `key not in d`. [for dylan]
Co-authored-by: Dylan Baker <dylan@pnwbakers.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v5)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v6)
Move common codegen functions into gen_common.py.
v2: change gen_knobs.py to find the template file internally, like
the rest of the gen scripts.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
When using an overlayfs system (like a Docker container), rmrf_local()
fails because part of the files to be removed are in different mount
points (layouts). And thus cache-test fails.
Letting crossing mount points is not a big problem, specially because
this is just for a test, not to be used in real code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Otherwise manual invokation of the script from elsewhere than
`dirname $0` will fail.
With these all the artefacts should be created in the correct location,
and thus we can remove the old (and slighly strange) clean-local line.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Current definitions work fine for the manual invokation of the script,
although the whole script does not consider that one can run it OOT.
The latter will be handled with latter patches, although it will be
extensively using the two variables.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Rather than hardcoding the binary location (which ends up wrong in a
number of occasions) in the python script, pass it as argument.
This allows us to remove a couple of dirname/basename workarounds that
aimed to keep this working, and succeeded in the odd occasion.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
At the moment we look for generator script(s) in builddir while they
are in srcdir, and we proceed to generate the tests and expected output
in srcdir, which is not allowed.
To untangle:
- look for the generator script in the correct place
- generate the files in builddir, by extending create_test_cases.py to
use --outdir
With this in place the test passes `make check' for OOT builds - would
that be as standalone or part of `make distcheck'
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Now that we have srcdir we can use it to correctly manage/point to the
script. Effectively fixing OOT invokation of `make check'.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
With later commits we'll fix the generators to produce the files in the
correct location. That in itself will cause an issue since the files
will be left dangling and make distcheck will fail.
v2: Use -r only as needed (Eric)
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
we anyway allow for multiple slices
v2: do not remove assert to check for buf->size
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Only a small tail needs to be uploaded manually.
This is only partly a performance measure (apps are expected to use
aligned access). Mostly it is preparation for sparse buffers, which the
old code would incorrectly have attempted to map directly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows the next patches to be simple while still being able
to make use of SDMA even in some unusual cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With tess this becomes a bit more complex. so move to pipeline
for now.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a precursor for tess support, which needs to
pass different values here.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to faciliate adding tess support, split the vs/es
output info into a separate block, so we make it easier to
have the tess shaders export the same info.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The logic was different than radeonsi, fix it up before adding
tess support.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If rasterization is disabled, we can get a NULL multisample
state.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It reads @ writes the DB cache, and we haven't flushed dst caches yet,
so DB cache may be stale. Also the user might be shader read (and probably is),
so also flush after.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. So, this patch is deleting
this dead code in favor of adding it later in ISL when we
actually find it useful. ISL can then share this code between
vulkan and GL.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fast copy blit was primarily added to support Yf/Ys detiling.
But, Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. Also, replacing legacy blits
with fast copy blit didn't help the benchmarking numbers. This
is probably due to a h/w restriction that says "start pixel for
Fast Copy blit should be on an OWord boundary". This restriction
causes many blit operations to skip fast copy blit and use legacy
blits. So, this patch is deleting this dead code in favor of
adding it later when we actually find it useful.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We've already required Kernel 3.6 on Gen6+ since Mesa 9.2 (May 2013,
commit 92d2f5acfa). It seems reasonable
to require it for Gen4-5 as well, bumping the requirement from 2.6.39.
This is necessary for glClientWaitSync with a timeout to work, which
is a feature we expose on Gen4-5. Without it, we would fall back to an
infinite wait, which is pretty bad.
See kernel commit 172cf15d18889313bf2c3bfb81fcea08369274ef in 3.6+.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we would just escape the loop and move everything
following the loop inside the if to the else branch of a new if
with a return flag conditional. However everything outside the
if the loop was nested in would still get executed.
Adding a new return to the then branch of the new if fixes this
and we just let a follow pass clean it up if needed.
Fixes:
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If we had no rasterization, we'd emit SPI color
format as all 0's the hw dislikes this, add the workaround
from radeonsi.
Found while debugging tessellation
v2: handle at pipeline stage, we have to handle
it after we process the fragment shader. (Bas)
v3: simplify even further, remove old fallback.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix 'make check' linking error with glibc < 2.17.
CXXLD main-test
../../../../src/mesa/.libs/libmesa.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
SEL can only convert between a few integer types, which we basically
never do.
Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
We should use anv_get_layerCount() to access layerCount of VkImageSub-
resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil-
Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case.
Test: Sample multithreadcmdbuf from LunarG can run without crash
Signed-off-by: Xu Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
From page 45 (page 52 of the PDF) of the GLSL ES 3.00 v.6 spec:
" When instance names are present on matched block names, it is
allowed for the instance names to differ; they need not match for
the blocks to match.
From page 51 (page 57 of the PDF) of the GLSL 4.30 v.8 spec:
" When instance names are present on matched block names, it is
allowed for the instance names to differ; they need not match for
the blocks to match."
Therefore, no cross linking validation is needed for the instance name
of an Interface Block.
This patch will make that no link error will be reported on a program
like this:
"# VS
layout(binding = 1) Block1 {
vec4 color;
} uni_block;
...
# FS
layout(binding = 2) Block2 {
vec4 color;
} uni_block;
..."
Fixes GL45-CTS.enhanced_layouts.ssb_layout_qualifier_conflict
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From page 140 (page 147 of the PDF) of the GLSL ES 3.10 v.4 spec:
" 9.2 Matching of Qualifiers
The following tables summarize the requirements for matching of
qualifiers. It applies whenever there are two or more matching
variables in a shader interface.
Notes:
1. Yes means the qualifiers must match.
...
9.2.1 Linked Shaders
| Qualifier | Qualifier | in/out | Default | uniform | buffer|
| Class | | | Uniforms | Block | Block |
...
| Layout | binding | N/A | Yes | Yes | Yes |"
From page 93 (page 110 of the PDF) of the GL 4.2 (Core Profile) spec:
" 2.11.7 Uniform Variables
...
Uniform Blocks
...
When a named uniform block is declared by multiple shaders in a
program, it must be declared identically in each shader. The
uniforms within the block must be declared with the same names and
types, and in the same order. If a program contains multiple
shaders with different declarations for the same named uniform
block differs between shader, the program will fail to link."
From page 129 (page 150 of the PDF) of the GL 4.3 (Core Profile) spec:
" 7.8 Shader Buffer Variables and Shader Storage Blocks
...
When a named shader storage block is declared by multiple shaders
in a program, it must be declared identically in each shader. The
buffer variables within the block must be declared with the same
names, types, qualification, and declaration order. If a program
contains multiple shaders with different declarations for the same
named shader storage block, the program will fail to link."
Therefore, if the binding qualifier differs between two linked Uniform
or Shader Storage Blocks of the same name, a link error should happen.
This patch will make that a link error will be reported on a program
like this:
"# VS
layout(binding = 1) Block {
vec4 color;
} uni_block1;
...
# FS
layout(binding = 2) Block {
vec4 color;
} uni_block2;
..."
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
While it's legal to have an active blocks count > 0 on link failure.
Unless we actually assign memory for the blocks array we can end up
segfaulting in calls such as glUniformBlockBinding().
To avoid having to NULL check these api calls we simply reset the
block count to 0 if the array was not created.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A resolve is not needed on Skylake in this case. We were forcing
a resolve because we set the input_aux_usage to ISL_AUX_USAGE_NONE.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The checks were only looking at the first byte, while the intention
seems to be to check if the whole sha1 is zero. This prevented all
shaders with first byte zero in their sha1 from being saved.
This shaves around a second from Deus Ex load time on a hot cache.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Even though the programs themselves stay in cache and are loaded, the
shader keys can be evicted separately. If that happens, unnecessary
compiles are caused that waste time, and no matter how many times the
program is re-run, performance never recovers to the levels of first
hot cache run. To deal with this, we need to refresh the shader keys
of shaders that were recompiled.
An easy way to currently observe this is running Deux Ex, then piglit
and Deux Ex again, or deleting just the cache index. The later is
causing over a minute of lost time on all later Deux Ex runs, with this
patch it returns to normal after 1 run.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Surfaces and Volumes can be freed in the worker thread.
Without this patch, pending_uploads_counter could be non-zero
in the Surfaces or Volumes dtor, leading to deadlock.
Instead decrease properly the counter before releasing the
item.
Also avoid another potential deadlock if the item is not
properly unlocked: Do not call UnlockRect which will cause deadlock,
but free directly using the deadlock safe
nine_context_get_pipe_multithread.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99246
CC: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: James Harvey <lothmordor@gmail.com>
Fix regression caused by
abb1c645c4
The patch made csmt use context.pipe instead of
secondary_pipe, leading to thread safety issues.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
These generated source files depend not only upon gl_and_es_API.xml, but
all other XML files that are included by it.
This change updates the generation rules to depend on all gen/*.xml
files, like done for other SCons generation rules, and should fix
incremental broken SCons builds due to missing dependencies.
Trivial.
Fix 'make check' linking errors with glibc < 2.17.
CXXLD glsl/glsl_test
glsl/.libs/libglsl.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
If we get EOF earlier than expected, the current read loops will
deadlock. This may easily happen if the disk cache gets corrupted.
Fix it by using a helper function that handles EOF.
Steps to reproduce (on a build with asserts disabled):
$ glxgears
$ find ~/.cache/mesa/ -type f -exec truncate -s 0 '{}' \;
$ glxgears # deadlock
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
While at it, also fix up a failure message to not reference timestamp
and gpu dirs as those are no longer being made.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Drop ir3_compiler_destroy(), since it is only ralloc_free() and we
shouldn't really have an ir3 dependency in core. If some future hw
has a new compiler, as long as all it's resources are ralloc()d then
things will all just work.
(In practice, I suppose you never really see this leak, but removing
it at least cleans up some noise in valgrind.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some field names had extra spaces and some had places where we should
have had a space but didn't.
Reviewed-by: Chad Versace <chadversary@chromium.org>
We've never used it, it only exists on gen8, and the name of the struct
contains piles of bad characters.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Otherwise blitter would still hold a ref to, for example, sampler-
views.
To reproduce:
glmark2 -b desktop:duration=2 --run-forever
Fixes: a8e6734 ("freedreno: support for using generic clear path")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
It's indexed by buffer, not stream. BRW_MAX_SOL_BUFFERS and
MAX_VERTEX_STREAMS happen to both be 4, so there's no actual bug.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We create the BO when creating a transform feedback object, and only
destroy it when deleting that object. So it won't be NULL.
CID: 1401410
Reviewed-by: Matt Turner <mattst88@gmail.com>
The state tracker no longer uploads those attributes for us,
so we must conservatively upload the size of the largest
attribute, which is a dvec4.
Fixes a regression of GL45-CTS.gpu_shader_fp64.varyings and
GL45-CTS.vertex_attrib_64bit.limits_test.
Fixes: 9b91e0b54c ("radeonsi: allow unaligned vertex buffer offsets and strides on CIK-VI")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Earlier commit unintentionally dropped the mkdir, as it was rebased.
Some versions of autotools will not create the output directory for
generated sources. Thus the issue went unnoticed by the original author.
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Steven Newbury <steve@snewbury.org.uk>
Reported-by: Steven Newbury <steve@snewbury.org.uk> Fixes:
Fixes: 1610b3dede ("anv: don't pass xmlfile via stdin anv_entrypoints_gen.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We don't need to make the caller (CmdCopyQueryPoolResults) aware of the
problem since compute_query_result() only emits state. The caller is also
expected to hit OOM in this scenario right after calling this function, but
it is already handling it safely.
Fixes:
dEQP-VK.api.out_of_host_memory.cmd_copy_query_pool_results
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.
Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).
Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so they key's
value is always false (the default).
Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to section 14.6 of the Vulkan specification:
"When sample shading is enabled, the x and y components of FragCoord
reflect the location of the sample corresponding to the shader
invocation."
So add a boolean parameter to the lowering pass to select this behavior
when we need it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we know the device has been lost we should return this error code for
any command that can report it before we attempt to do anything with the
device.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan specs say:
"A logical device may become lost because of hardware errors, execution
timeouts, power management events and/or platform-specific events. This
may cause pending and future command execution to fail and cause hardware
resources to be corrupted. When this happens, certain commands will
return VK_ERROR_DEVICE_LOST (see Error Codes for a list of such commands).
After any such event, the logical device is considered lost. It is not
possible to reset the logical device to a non-lost state, however the lost
state is specific to a logical device (VkDevice), and the corresponding
physical device (VkPhysicalDevice) may be otherwise unaffected. In some
cases, the physical device may also be lost, and attempting to create a
new logical device will fail, returning VK_ERROR_DEVICE_LOST."
This means that we need to track if a logical device has been lost so we can
have the commands referenced by the spec return VK_ERROR_DEVICE_LOST
immediately.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So that we don't have to do things like rolling back address relocations in
case that we ran into OOM after computing them, etc
Also, make sure that if the queue submission comes with a fence, we set it up
correctly so it behaves according to the spec after returning
VK_ERROR_DEVICE_LOST.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
GL_AMD_pinned_memory requires memory to be aligned correctly, so
we skip marshalling in this case. Also copying the data defeats
the purpose of EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD.
Fixes GL_AMD_pinned_memory piglit tests when glthread is enabled.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This can be used to deal with key hash collisions from different
versions (should we find that to actually happen) and to find
which mesa version produced the cache entry.
V2: use blob created at cache creation.
v3: remove left over var from v1.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
This allows to get rid of the arch and gpu name directories.
v2: (Timothy Arceri) don't use an opaque data type to store
pointer size and gpu name.
v3: (Timothy Arceri) use blob to store driver keys just make sure
to store null terminator for strings, and make sure blob is
defined by disk_cache and not it's users.
v4: (Timothy Arceri) fix typo, and make ptr_size a uint8_t.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Instead of using a directory, hash the timestamps into the cache keys
themselves. Since there is no more timestamp directory, there is no more
need for deleting the cache of other mesa versions and we rely on
eviction to clean up the old cache entries. This solves the problem of
using several incarnations of disk_cache at the same time, where one
deletes a directory belonging to the other, like when both OpenGL and
gallium nine are used simultaneously (or several different mesa
installations).
v2: using additional blob instead of trying to clone sha1 state
v3: (Timothy Arceri) don't use an opaque data type to store
timestamp.
V4: (Timothy Arceri) use blob to store driver keys just make sure
to store null terminator for strings, and make sure blob is
defined by disk_cache and not it's users.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100091
We want to be able to check the progress of each pass and dump the NIR
for debugging purposes if it changed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Workaround an unknown bug with inside the transfer_map for certain
ASIC, also tested with un-affected ASICs, the performance actually
improved slightly.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The actual offset returned is uint32_t, however int64_t was used as the
return type from gbm_bo_get_offset to allow negative returns to signal
errors to the caller.
In case of an error getting the offset, the user will also be unable to
get the handle/FD, and thus have nothing to offset into. This means that
returning 0 as an error value is harmless, allowing us to change the
return type to uint32_t in order to avoid signed/unsigned confusion in
callers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Recent change to use drmGetDevices2() made me realize that
build configured using
PKG_CONFIG_PATH=my_drm_lib_path/pkgconfig ./autogen.sh
considers the libdrm path gotten from pkgconfig only during
make. When invoking "make install" the relink command puts
system library ahead of the path gotten from pkgconfig
(and starts to fail as system libdrm isn't new enough).
This change forces the relink command to respect pkgconfig
settings.
It looks to me that in
https://bugs.freedesktop.org/show_bug.cgi?id=100259
with Emil et al considering it a libtool bug.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
[Emil Velikov: add inline comment]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
patch adds DECODER_FILES for libintel_common, this is so that platforms
such as Android not currently using this functionality can opt out.
Fixes: 7d84bb3 ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Patch fixes entrypoint generation for libmesa_anv_entrypoints that
still used old style of calling generator script.
Also small fixes to libmesa_vulkan_common where there was a typo
in target name (vulknan) and files were generated to wrong folder.
Fixes: 8211e3e6 ("anv: Generate anv_entrypoints header and code in one command")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Automake generation rules are replicated for android.
$* macro was expected to return "hsw" but instead gives "hsw.{h,c}"
so $(basename $*) is used as a workaround
to set the correct --chipset option for brw_oa.py script.
Build tested with nougat-x86
Fixes: e565505 "i965: Add script to gen code for OA counter queries"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Robert Bragg <robert@sixbynine.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Original naming was following Vulkan HAL naming scheme for no good
purpose and we need same binary name for build-id code.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
It's written in C rather than pure python and is strictly faster, the
only reason not to use it that it's classes cannot be subclassed.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This has the potential to mask errors, since Element.get works like
dict.get, returning None if the element isn't found. I think the reason
that Element.get was used is that vulkan has one extension that isn't
really an extension, and thus is missing the 'protect' field.
This patch changes the behavior slightly by replacing get with explicit
lookup in the Element.attrib dictionary, and using xpath to only iterate
over extensions with a "protect" attribute.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Instead of using an if and a check, use dict.get, which does the same
thing, but more succinctly.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This produces the header and the code in one command, saving the need to
call the same script twice, which parses the same XML file.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This produces a file that is identical except for whitespace, there is a
table that has 8 columns in the original and is easy to do with prints,
but is ugly using mako, so it doesn't have columns; the data is not
inherently tabular.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This does two things, first it updates both the .h and the .c file to
have the same do not edit string. Second, it uses __file__ to ensure
that even if the file is moved or renamed that the name will be correct.
One thing to note is the use of '{{' and '}}' in the C template. This is
to instruct python to print a literal '{' and '}' respectively, rather
than treating the contents as a formatter specifier.
v3: - add this patch
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This is groundwork for the next patches, it will allows porting the
header and the code to mako separately, and will also allow both to be
run simultaneously.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
It's slow, and has the potential for encoding issues.
v2: - pass xml file location via argument
- update Android.mk
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
These are all fairly small cleanups/tweaks that don't really deserve
their own patch.
- Prefer comprehensions to map() and filter(), since they're faster
- replace unused variables with _
- Use 4 spaces of indent
- drop semicolons from the end of lines
- Don't use parens around if conditions
- don't put spaces around brackets
- don't import modules as caps (ET -> et)
- Use docstrings instead of comments
v2: - Replace comprehensions with multiplication
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
CP DMA and PKT3_WRITE_DATA (in CmdUpdateBuffer) don't (currently) write
through L2. Therefore, to make these writes visible to later accesses
we must invalidate L2 rather than just writing it back, to avoid the
possibility that stale data is read through L2.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix linking error on CentOS 6.
CXXLD glsl_compiler
glsl/.libs/libstandalone.a(lt16-libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise for apps that don't seed the regular rand() we will always
remove old cache entries from the same dirs.
V2: assume bits returned by rand are independent uniformly distributed
bits and grab our hex value without taking the modulus of the whole
value, this also fixes a bug where 'f' was always missing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: pass the seed to the seed function so that we can isolate
its uses. Stop leaking fd when urandom couldn't be read.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: pass the seed to rand_xorshift128plus() so that we can isolate
its uses.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow to force computing the absolute value for sqrt()
and inversesqrt() in order to follow D3D9 behaviour for buggy
apps that rely on it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Function::getArgumentList() doesn't exist anymore, switch to using
arg_begin() (existed back to at least llvm-3.6.0).
Reviewed-by: Vedran Miletić <vedran@miletic.net>
CC: <mesa-stable@lists.freedesktop.org>
Any users of KitKat are likely using an older version of Mesa and
KitKat support adds complexity to the make files. Dropping support
allows removing the MESA_LOLLIPOP_BUILD make variable in various make
files.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The Android version defines are only needed for versions less than 4.2
which aren't really supported or tested.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Commit 6facb0c08f ("android: fix libz dynamic library dependencies")
added libz as a dependency, but this breaks host targets as the host
dependency is libz-host. As no host lib needs libz, just remove the
dependency for them.
Fixes: 6facb0c08f "android: fix libz dynamic library dependencies"
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixed with the following command:
perl -pe 'BEGIN{undef $/;} s/ \\\n\n/\n\n/smg' $(find . -name 'Android.*')
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Through the glsl headers we had an odd mix of guards be that
"ifndef", "pragma once" neither or both.
Simplify things by using the more common ones (ifndef) and annotating
all the sources, barring the generated builting header -
builtin_int64.h.
The final header - udivmod64.h - is [seemingly] unused and on its way
out (patch purge it is on the mailing list).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This should just work (tm) with the default options. Plus the one we
pass is already the default, so just drop it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
The header provides the LINUX_VERSION_CODE and KERNEL_VERSION macros.
With neither of which being used by any part of mesa.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The query is a properties query so it needs to be handled in
GetPhysicalDeviceProperties2, not GetPhysicalDeviceFeatures2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rather than using 3 different ways to wrap _mesa_sha1_*() to SHA1*()
functions (a macro, prototype with implementation in .c and an inline
function), make all 3 inline functions.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
At the moment, we would honour any system headers - vulkan_intel.h in
particular over the ones in-tree.
Thus, if one does incremental build of mesa, without the vulkan.h
already installed (or at least not in the same directory as
vulkan_intel.h) the build will fail.
In the future we might want to upstream the vulkan_intel.h within
vulkan.h or use other ways to make vulkan_intel.h obsolete. In either
case, the more robust thing is to rely on our own copy.
v2: Move AM_CPPFLAGS just above LIBDRM_CFLAGS (Grazvydas, Jason)
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Fixes: ee8044fd "intel/vulkan: Get rid of recursive make"
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
primcount must be a GLsizei as in the signature for MultiDrawElements
or bad things can happen.
Furthermore, an error should be flagged when primcount is negative.
Curiously, this code used to work somewhat correctly even when primcount
was negative, because the loop that checks count[i] would iterate out of
bounds and almost certainly hit a negative value at some point.
Found by an ASAN error in
GL45-CTS.gtf32.GL3Tests.draw_elements_base_vertex.draw_elements_base_vertex_primcount
Note that the OpenGL spec seems to have s/primcount/drawcount/ at some
point, and the code still reflects the old language.
v2: provide the correct spec quotes (pointed out by Ian)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
MATERIALFV may end up reading up to 4 floats from the passed parameter.
This should really set a GL_INVALID_ENUM error in the cases where it
matters, but does anybody really care?
Found by ASAN in piglit gl-1.0-beginend-coverage.
v2: fix a trivial compiler warning
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The calculations of row_pitch, the row pitch's alignment, surface size,
and base_alignment were mixed together. This patch moves the calculation
of row_pitch and its alignment to occur before the calculation of
surface_size and base_alignment.
This simplifies a follow-on patch that adds a new member, 'row_pitch',
to struct isl_surf_init_info.
v2:
- Also extract the row pitch alignment.
- More helper functions that will later help validate the row pitch.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
isl has a giant comment that explains the hardware's padding
requirements. (Hint: Cache lines and page faults). But the comment is in
the wrong place, in isl_calc_linear_row_pitch(), which is unrelated to
padding.
The important parts of that comment were copied to
isl_apply_surface_padding() long ago. So drop the misplaced comment.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All the plumbing is in place so the extension just needs to be
advertised.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't really "do" anything because the default tiling for the
winsys buffer is X tiled. We do however want the X tiled modifier to
work correctly from the API perspective, which would imply that if you
set this modifier, and later do a get_modifier, you get back at least X
tiled.
Running with a modified kmscube, here are the bandwidth measurements.
Linear:
Read bandwidth: 1039.31 MiB/s
Write bandwidth: 1453.56 MiB/s
Y-tiled:
Read bandwidth: 458.29 MiB/s
Write bandwidth: 542.12 MiB/s
X-tiled:
Read bandwidth: 575.01 MiB/s
Write bandwidth: 606.25 MiB/s
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch begins introducing how we'll actually handle the potentially
many modifiers coming in from the API, how we'll store them, and the
structure in the code to support it.
Prior to this patch, the Y-tiled modifier would be entirely ignored. It
shouldn't actually be used until this point because we've not bumped the
DRIimage extension version (which is a requirement to use modifiers).
Measuring later in the series with kmscube:
Linear:
Read bandwidth: 1048.44 MiB/s
Write bandwidth: 1483.17 MiB/s
Y-tiled:
Read bandwidth: 471.13 MiB/s
Write bandwidth: 589.10 MiB/s
Similar functionality was introduced and then reverted here:
commit 6a0d036483
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Apr 21 20:14:58 2016 -0700
i965: Always use Y-tiled buffers on SKL+
v2: Use last set bit instead of first set bit in modifiers to address
bug found by Daniel Stone.
v3: Use the new priority modifier selection thing. This nullifies the
bug fixed by v2 also.
v4: Get rid of modifier compaction which originally served another
purpose and now serves none (Jason)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At image creation create a path for dealing with the linear modifier.
This works exactly like the old usage flags where __DRI_IMAGE_USE_LINEAR
was specified.
During development of this patch series, it was decided that a lack of
modifier was an insufficient way to express the required modifiers. As a
result, 0 was repurposed to mean a modifier for a LINEAR layout.
NOTE: This patch was added for v3 of the patch series.
v2: Rework the algorithm for modifier selection to go from a bitmask
based selection to this priority value.
v3: Make DRM_FORMAT_MOD_INVALID allowed at selection as a way of
identifying no modifiers found (because 0 is LINEAR) (Jason)
v4: Remove the logic to prune unknown modifiers (like those from other
vendors) and simply handle is in select_best_modifier (Jason)
Requested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
New to the patch series after reordering things for landing smaller
chunks.
This will essentially enable modifiers from clients that were just
enabled in previous patches. A client could use the modifiers by
setting all of them at create, but had no way to actually query them
after creating the surface (ie. stupid clients could be broken before
this patch, but in more ways than this).
Obviously, there are no modifiers being actually stored yet - so this
patch shouldn't do anything other than allow the API to get back 0 (or
the LINEAR modifier).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I intend to need to get to the devinfo structure, and storing the screen
is an easy way to do that.
It seems to be the consensus that you cannot share an image between
multiple screens.
Scape-goat: Rob Clark <robdclark@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Recent glibc generates this warning:
brw_performance_query.c:1648:13: warning: In the GNU C Library, "minor" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "minor", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"minor", you should undefine it after including <sys/types.h>.
min = minor(sb.st_rdev);
So, include sys/sysmacros.h to shut up the warning.
v2: Use the AC_HEADER_MAJOR defines to figure out the right header
(thanks to Jonathan Gray for helping me not break non-glibc systems)
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
This was used for aubdumping (deleted a while ago) and INTEL_DEBUG=bat
decoding (deleted recently).
While we're changing parameters, delete the wrapper macro and make the
actual function brw_state_batch instead of __brw_state_batch.
This subsumes a patch by Emil Velikov to drop this from BLORP.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This deletes all of our handwritten code in favor of autogenerated
genxml-based decoding. This should be much more usable, as the old
code isn't entirely accurate - we updated some things for new
generations, but not everything.
Aubinator has one annoying limitation: it has no idea how many entries
to print when encountering e.g. 3DSTATE_BINDING_TABLE_POINTERS_VS. It
picks an arbitrary number, which may skip decoding valid data, and may
print extra garbage entries.
We do a better job here by making brw_state_batch track the size of the
data stored at a particular batchbuffer offset. Then, we can divide by
the structure size to obtain the exact number of entries.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This should give substantially better decoding, as the public libdrm
decoder hasn't been properly maintained in years.
For now, we reuse the existing state dumping mechanism. We'll improve
that in the next patch.
To avoid increasing the size of the driver, we restrict this feature
to debug builds of Mesa. There's probably very little use for it in
release builds anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fix build with Python < 2.7.
File "src/compiler/nir/nir_builder_opcodes_h.py", line 46, in <module>
from nir_opcodes import opcodes
File "src/compiler/nir/nir_opcodes.py", line 178, in <module>
unop_convert("{}2{}{}".format(src_t[0], dst_t[0], bit_size),
ValueError: zero length field name in format
Fixes: 762a6333f2 ("nir: Rework conversion opcodes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Like done in another place in that same file.
CID 1250588
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Simplifies the write code a bit and handles EINTR.
V2: (Timothy Arceri) Drop EINTR handling. To do it
properly we would need a retry limit but it's
probably best to just avoid trying to write if
we hit EINTR and try again next time we see
the program.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There is no need to hardcode it, we can just use blob_key[0].
This is needed because the next patches are going to change how cache
keys are computed.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will allow to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.
v2: don't try to compute key (and crash) if cache is disabled
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Per pixel stats are cached but were not always being flushed as threads
moved from one draw context to the next. Added an explicit flush to allow
all archrast objects to flush any cached events.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Performance is now 50x faster with archrast now that we're properly
filtering out all of the rdtsc begin/end.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Autogen functions that instantiates different BackendPixelRate templates.
Functions get split into separate files after reaching a user defined
threshold (currently 512 per file) to speed up compilation.
This change will enable the addition of more template flags in the pixel
back end.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Detecting register write support by trial and error introduces a
stall at screen creation time, which it would be nice to avoid.
Certain command parser versions guarantee this will work (see the
giant comment in intelInitScreen2 below, or a few commits ago):
- Ivybridge: version >= 1 (kernel v3.16)
- Baytrail: version >= 2 (kernel v3.19)
- Haswell: version >= 7 (kernel v4.8)
For simplicity, we don't bother with version 1 in this patch.
This assumes that the user hasn't disabled aliasing PPGTT via a kernel
command line parameter. Don't do that - you're only breaking things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If we can't write registers, then the effective command parser version
is 0 - it may exist, but it's not usefully enabling anything.
See kernel commit 1ca3712ca3429a617ed6c5f87718e4f6fe4ae0c6 (in v4.8)
where the kernel starts doing this for us. This makes us do more or
less the same thing on older kernels.
This should preserve a bit of sanity by allowing us to perform a
screen->cmd_parser_version > N check to determine that we really can
use the features promised by command parser version N.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This should help us figure out the complexities of which kernel
versions we need to get various features on various platforms.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In commit d2590eb65f I enabled GL 4.5
on Haswell...but failed to check if we could do indirect compute
shader dispatch...and query buffer objects.
Indirect compute shader dispatch requires command parser version 5
(kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in
Linux v4.4). On earlier kernels we would have disabled
ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.
Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG,
which mean command parser version 7 (Linux v4.8). On earlier kernels
we would have disabled ARB_query_buffer_object, which is a mandatory
part of OpenGL 4.4+.
The new version support looks like:
- Kernel 4.1 and older => OpenGL 3.3
- Kernel 4.2-4.3 => OpenGL 4.2
- Kernel 4.4-4.7 => OpenGL 4.3
- Kernel 4.8+ => OpenGL 4.5
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The author is Heiko Przybyl(CC'ing), the patch is rebased on top of Bartosz Tomczyk's one per Dieter Nützel's comment.
Tested-by: Constantine Charlamov <Hi-Angel@yandex.ru>
v2: Resend the patch again through git-email. The prev. rebase was sent
through Thunderbird, which screwed up tab characters, making the patch
not apply.
--------------
When fixing the stalls on evergreen I introduced leaking of the useinfo
structure(s). Sorry. Instead of allocating a new object to hold 3 values
where only one is actually used, rework the list to just store the node
pointer. Thus no allocating and deallocation is needed. Since use_info
and use_kind aren't used anywhere, drop them and reduce code complexity.
This might also save some small amount of cycles.
Thanks to Bartosz Tomczyk for finding the bug.
Reported-by: Bartosz Tomczyk <bartosz.tomczyk86 at gmail.com <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>>
Signed-off-by: Heiko Przybyl <lil_tux at web.de <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>>
Supersedes: https://patchwork.freedesktop.org/patch/135852
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
When the iterator encounters a structure field, it now looks up the
gen_group for that structure definition and saves a pointer to it.
This lets us drop a lot of ridiculous code in the caller, which looked
at item->value (<struct NAME dword>), strtok'd the structure name back
out, and looked it up itself.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The iterator code already computed this value, then we stored it in
the structure name, strtok'd it back out, and also manually computed
it when printing dword headers.
Just put the value in the struct and use it. Way simpler.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It made more sense when decode_group() took a bunch of extra options,
but now that there's only one...we may as well pass 0 and call it a day.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I added this flag in 65a9d5eabb but
it was completely unused. Both callers appear to have printed dword
headers, so we can just drop the flag and continue doing it
unconditionally.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When decoding a structure field within a group, we may want to look up
that structure type. Having a gen_spec pointer makes it easy to do so.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
gen_field_iterator_next() produces a string representing the value of
the field. For enum values, it also produced a separate "description"
string containing the textual name of the enum.
The only caller of this function combines the two, printing enums as
"<numeric value> (<texture enum name>)". We may as well just store
that in item->value directly, eliminating the description field, and
a layer of wrapping.
v2: Use non-overlapping source and destination strings in snprintf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CID 1399479: Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking velems suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Like in a few other places in that radeon_drm_bo.c file.
CID 715739.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
fc_sp variable should indicate number of elements in
fc_stack array, but fc_sp was increased at beginning of fc_pushlevel
function. It leads to situation where idx=0 was never used, and last
32 element was stored outside fs_stack array.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The second check in the old code looked pretty much unreachable, esp.
because it's not obvious that "max_entries" could be zero. To find out
that it was intentional I had to run some checks, and to dig into
the old versions of the file.
So, rewrite the check to make the intention clear.
v2: s/r600/r600g in the title, and per Dieter Nützel's comment wrap
lines of condition.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The crash is due to NULL pColorBlendState, which is legal if the
pipeline has rasterization disabled or if the subpass of the render pass
the pipeline is created against does not use any color attachments.
Test: Sample subpasses from LunarG can run without crash
Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
The current code evaluated to always true, we only want to flush
on the first submit. Rename the variable to do_flush, and only
emit on the first iteration.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since we already do fabs on the one source, we're guaranteed to get
positive infinity if we get any infinity at all. Since +inf only has
one IEEE 754 representation, we can use an integer comparison and avoid
all of the ordered/unordered issues.
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 2845a108a9.
This break VK-GL-CTS randomly.
./deqp-vk --deqp-case=dEQP-VK.texture.filtering.3d.formats.r4g4b4a4*
bounces around here from 6/6 to 3/6 or 4/6 to hanging.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was meant to be checking the index type to get the correct
index not the last emitted one. This fixes:
dEQP-VK.pipeline.input_assembly.primitive_restart.index_type_uint32.triangle_strip_with_adjacency
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I haven't seen this causing problems in practice, but for correctness
we should also check if rename succeeded to avoid breaking accounting
and leaving a .tmp file behind.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
At the time of target file check, .tmp file is already created and file
lock is held, so we should remove the .tmp, like in other error paths.
With this, piglit no longer leaves large amount of empty .tmp files
behind, which waste directory entries and may interfere with eviction.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
It seems there is a bug because:
- 20 bytes are compared, but only 1 byte stored_keys step is used
- entries can overlap each other by 19 bytes
- index_mmap is ~1.3M in size, but only first 64K is used
With this fix for Deus Ex:
- startup time (from launch to Feral logo): ~38s -> ~16s
- disk_cache_has_key() hit rate: ~50% -> ~96%
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There should be minimal gain, if any, for nvc0, but nv50 may end up
noticing more often that the lod argument is uniform. This, in turn,
will remove the need for some unnecessary transformations, which were
being hit due to the checks being done pre-ssa.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Helps mainly Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs : 471258 -> 467359 (-0.83%)
total local used in shared programs : 27405 -> 27361 (-0.16%)
total bytes used in shared programs : 35749888 -> 35214176 (-1.50%)
local gpr inst bytes
helped 17 1829 4091 4091
hurt 4 44 3 3
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Since switching to LRU eviction the only user of these predicate
functions now resolves directory entry stats itself so pass them
directly saving calling fstat and strlen twice (and the
expensive strlen is skipped entirely if access time is newer).
v2: Update for empty cache dir detection changes
v3: Fix passing string length to predicate with the +1 for NULL
termination and also pass sb as pointer
v4: Missed ampersand for passing sb as pointer
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously each time we saw a variable we just created a duplicate
entry in the list. This is particularly bad for loops were we add
everything twice, and then throw nested loops into the mix and the
list was growing expoentially.
This stops the glsl-vs-unroll-explosion test which has 16 nested
loops from reaching the tests mem usage limit in this pass. The
test now hits the mem limit in opt_copy_propagation_elements()
instead.
I suspect this was also part of the reason this pass can be so
slow with some shaders.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Fixes a bunch of piglit crashes that hit an assert() when trying
to delete the framebuffer. The assert() was triggered because
WinSysDrawBuffer was set to NULL before glDeleteFramebuffers()
was called.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the end, pipeline statistics queries look a lot like occlusion
queries only with between 1 and 11 begin/end pairs being generated
instead of just the one.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In order to get accurate statistics, we need to disable statistics for
blits, clears, and the surface state memcpy at the top of each secondary
command buffer. There are two possible approaches to this:
1) Disable before the blit/memcpy and re-enable afterwards
2) Move emitting 3DSTATE_VF_STATISTICS from initialization and make it
part of pipeline state and then just disabale statistics before
blits and memcpy operations.
Emitting 3DSTATE_VF_STATISTICS should be fairly cheap so it doesn't
really matter which path we take. We choose the second option as it's
more consistent with the way the rest of the statistics are enabled and
disabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's in 3DSTATE_CLIP, so it doesn't really need the extra detail. This
matches what we do for VS, FS, etc.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The new version is a nice GPU parallel to cpu_write_query_result and it
nicely handles things like dealing with 32 vs. 64-bit offsets in the
destination buffer.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Not all queries are the same. Even the two queries we support today
require a different amount of data per slot. Once we introduce pipeline
statistics queries, the size will vary wildly.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to make slots variable-length and always having the
available bits at the front makes certain operations substantially
easier once we do that.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From the Vulkan 1.0.39 Specification:
"If VK_QUERY_RESULT_64_BIT is not set and the result overflows a
32-bit value, the value may either wrap or saturate."
So we can either clamp or wrap. Wrapping is both easier and what the
user gets if they use vkCmdCopyQueryPoolResults and we should be
consistent. We could make vkCmdCopyQueryPoolResults clamp but it's
annoying and ends up burning extra batch for something the spec clearly
doesn't require.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
threaded gallium can't use pipe_context's LLVM target machine, because
create_shader_selector can be called from a non-driver thread.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that there's a timebase_scale in gen_device_info which is
effectively the 'period' this switches anv_GetPhysicalDeviceProperties
to using this common device info to initialize the timestampPeriod
device limit.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock
with the convenient property of being able to scale by an integer (80)
to nanosecond units.
For Skylake the frequency is 12MHz or a scale factor of 83.333333
This updates gen_device_info to track a floating point timebase_scale
factor and makes corresponding _queryobj.c changes to no longer assume a
scale factor of 80 works across all gens.
Although the gen6_ code could have been been left alone, the changes
keep the code more comparable, and it now shares a few utility functions
for scaling raw timestamps and calculating deltas. The utility for
calculating deltas takes into account 32 or 36bit overflow depending on
the current kernel version.
Note: this leaves the timestamp handling of ARB_query_buffer_object
untouched, which continues to use an incorrect scale of 80 on Skylake
for now. This is more awkward to solve since the scaling is currently
done using a very limited uint64 ALU available to the command parser
that doesn't support multiply or divide where it's already taking a
large number of instructions just to effectively multiple by 80.
This fixes piglit arb_timer_query-timestamp-get on Skylake
v2: (Ken) Update timebase_scale for platforms past Skylake/Broxton too.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Older versions of GCC don't like compound literals in static const
variable declarations because they don't think it's an actual constant
value.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This adds some missing return value checks for all uses of snprintf in
brw_performance_query.c. This also switches a use of strncpy + strncat
for snprintf for consistency and to avoid the chance of the strncpy
leaving an unterminated string in the dest buffer if the src is too
long.
This issue with strncpy was picked up by Coverity.
CID: 1402201
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise it'll be missing in the tarball and make distcheck will fail.
Fixes: 05dd4a1104 ("glapi: Generate GL API marshalling code from the XML.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Using $< in non-suffix make rules is a GNU extension. Explicitly use
the name of the python script to fix the build on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabore.com>
The index passed to get_shared_memory_ptr is an attribute slot index,
i.e. the index of a vec4 within LDS. Therefore this must be scaled by
sizeof(vec4) to give the LDS byte offset.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
Avoid a buffer overflow in ac_nir_to_llvm.c's create_function when
using more than 4 descriptor sets. radv claims support for 8.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According to dl_iterate_phdr man page first object visited is the
main program where dlpi_name is an empty string. This fixes segfault
on Android when using build-id as identifier.
Fixes: d4fa083e11 ("util: Add utility build-id code.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Function droid_swap_buffers may get called without dri2_surf->buffer set,
in these cases we don't have a back buffer set either. Patch fixes segfault
seen with 3DMark that uses android.opengl.GLSurfaceView for rendering it's UI.
backtrace:
#00 pc 00013f88 /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
#01 pc 000117b2 /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
#02 pc 000058b2 /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
#03 pc 00011329 /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
#04 pc 000118e7 /system/lib/libEGL.so (eglSwapBuffers+55)
#05 pc 000754dc /system/lib/libandroid_runtime.so
v2: do like other backends, call get_back_bo (Emil Velikov)
Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Apps can limit the size of the cache via VkAllocationCallbacks so we
can't be sure that both are always in the cache.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will allow us to use fallback in-memory and on-disk caches
should the app not provide a pipeline cache.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise we have a race condition between vbo calls in the
glthread and the _vbo_DestroyContext() call.
This fixes a bunch of piglit crashes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The Vulkan spec is fairly clear about when we should and should not
write query pool results. We're also supposed to return VK_NOT_READY if
VK_QUERY_RESULT_PARTIAL_BIT is not set and we come across any queries
which are not yet finished. This fixes rendering corruptions on The
Talos Principle where geometry flickers in and out due to bogus query
results being returned by the driver. These issues are most noticable
on Sky Lake GT4 2hen running on "ultra" settings.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100182
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
There's not much point to having them or not having them but this
reduces some pointless diff from the version we can auto-generate
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The code for decoding structures and commands was almost identical.
The only differences are: we print dword headers for commands, and
we skip the first one (with the command opcode and lengths).
So, generalize decode_structure to add a starting DWord, and a flag
for printing the DWord headers, and reuse it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
handle_struct_decode() is just a wrapper around decode_structure()
with a NULL check. But the only caller already does that NULL check.
So, just use decode_structure() directly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix the build on OpenBSD by removing an uneeded include for asm/unistd.h.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
% pattern rules are a GNU extension. As there is only one file here
avoid patterns and globbing entirely to fix the build on non-GNU make.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
v2 [Emil Velikov: brw_oa.py dependency]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
In the odd case where a patch needs to be fixed, squash the appropriate
fix and document how. Add a note in the pre-release notes, such that
devs can quickly spot it.
v2: Grammar/typo fixes (Eric). Use upstream commit [SHA] as reference.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The only use of the header is to provide the _X_INLINE macro. We already
require (and provide where needed) 'inline', plus it's used in the file
already.
So replace the macro and drop the include. This fixes the build on
platforms which lack the header - from X-less Linuxes to Androids.
Fixes: 05dd4a1104 ("glapi: Generate GL API marshalling code from the XML.")
Reported-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100223
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The registries were migrated to git and are now hosted on GitHub.
The old svn is now read-only, and will not be updated anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Specifically, report 'out of memory' errors that might have happened while
emitting the pipeline's batch.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These can fail to allocate device memory, however, the driver can recover
from this error by allocating a new binding table block and trying again.
v2:
- Instead of tracking the errors in these functions and making callers
reset the batch's status before attempting to allocate a new block
for the binding table, simply make callers responsible for setting
the error status if they fail to allocate memory during the second
attempt (Jason).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Also, we had a couple of instances in flush_descriptor_sets() were
we were returning a VkResult directly upon error, but the return
value of this function is not a VkResult but a uint32_t dirty mask,
so simply return 0 in these cases which reduces the amount of
work the driver will do after the error has been raised.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Instead of asserting inside the function, and then use use that information
to return early from its callers upon failure.
v2:
- Make sure that clear_color_attachment() and
clear_depth_stencil_attachment() get the VkResult as well so they
avoid executing the batch if an error happened. (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Any errors that may have happened during the command buffer recording are
reported by vkEndCommandBuffer() and it is the application's reponsibility
to not submit broken commands to a queue.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2: Assert on secondary commands, applications should've called
vkEndCommandBuffer() and received an error for them before (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Growing the reloc list happens through calling anv_reloc_list_add() or
anv_reloc_list_append(). Make sure that we call these through helpers
that check the result and set the batch error status if needed.
v2:
- Handling the crashes is not good enough, we need to keep track of
the error, for that, keep track of the errors in the batch instead (Jason).
- Make reloc list growth go through helpers so we can have a central
place where we can do error tracking (Jason).
v3:
- Callers that need the offset returned by anv_reloc_list_add() can
compute it themselves since it is extracted from the inputs to the
function, so change the function to return a VkResult, make
anv_batch_emit_reloc() also return a VkResult and let their callers
do the error management (Topi)
v4:
- Let anv_batch_emit_reloc() return an uint64_t as it originally did,
there is no real benefit in having it return a VkResult.
- Do not add an is_aux parameter to add_surface_state_reloc(), instead
do error checking for aux in add_image_view_relocs() separately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Most of the time we use macros that handle this situation transparently,
but there are some cases were we need to handle this explicitly.
This patch makes sure we don't crash, notice that error handling takes
place in the function that actually failed the allocation,
anv_batch_emit_dwords(), which will set the status field of the batch
so it can be used at a later moment to report the error to the user.
v2:
- Not crashing is not good enough, we need to keep track of the error
(Topi, Jason). Iago: now that we track errors in the batch, this
is being handled.
- Added guards in a few more places that needed it (Iago)
v3:
- Check result of anv_batch_emitn() for NULL before calling memset()
in emit_vertex_input() (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The anv_batch_set_error() helper will track the first error that happened
while recording a command buffer. The helper returns the currently tracked
error to help the job of internal functions that may generate errors that
need to be tracked and return a VkResult to the caller.
We will use the anv_batch_has_error() helper to guard parts of the driver
that are not safe to execute if an error has been generated while recording
a particular command buffer.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The vkCmd*() functions do not report errors, instead, any errors should be
reported by the time we call vkEndCommandBuffer(). This means that we
need to make the driver robust against incosistent and/or imcomplete
command buffer states through the command recording process, particularly,
avoid crashes due to access to memory that we failed to allocate previously.
The strategy used to do this is to track the first error ocurred while
recording a command buffer in the batch associated with it. We use the
batch to track this information because the command buffer may not be
visible to all parts of the driver that can produce errors we need to be
aware of (such as allocation failures during batch emissions).
Later patches will use this error information to guard parts of the driver
that may not be safe to execute.
v2: Move the field from the command buffer to the batch so we can track
errors from batch emissions (Jason)
v3: Registering errors in the command buffer's batch during
anv_create_cmd_buffer() is unnecessary, since the command buffer
is freed at the end of the function in that case (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This situation can happen if we failed to allocate memory for the shader.
v2:
- We shouldn't see NULL shaders in anv_shader_bin_ref so we should not check
for that (Jason). Make sure that callers don't attempt to call this
function with a NULL shader and assert that this never happens (Iago).
v3:
- All callers to anv_shader_bin_unref seem to check for NULL before calling,
so just assert that it is not NULL (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The function is defined right after the prototype declaration. Also, the
protoype for it is included in anv_genX.h which is included via anv_private.h.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
While a context only has a single glthread, the context itself can be
attached to several threads. Therefore the dispatch table must be
updated in all threads before the destruction of glthread. In others
words, glthread can only be destroyed safely when the context is deleted.
Fixes remaining crashes in the glx-multithread-makecurrent* tests.
V2: (Timothy Arceri) updated gl_API.dtd marshal_fail description.
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
We want to support glthread on GLES contexts with reasonable apps, and on
desktop for apps that use VBOs but haven't completely moved to core GL.
To do so, we have to deal with the "the user may or may not pass user
pointers to draw calls" problem.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
glBegin() swaps dispatch tables, and we don't have any code in place for
handling that in glthread (which also messes with dispatch tables), and I
don't particularly care to at this point.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
The threading for GL core is in place, but there are so few applications
actually using a core GL context that it would be nice to extend support
back. However, some of the features of compat GL (particularly user
vertex arrays) would be so expensive to track state for that we want to be
able to disable threading when we discover that the app is using them.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This avoids an extra pointer dereference in the marshalling functions,
which, with the instruction count doing in the low 30s, could actually
matter for main-thread performance.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
These don't actually read data out of the pointers, they set the
pointers (or offsets in a VBO) to be used in a later draw call.
v2: Don't forget glVertexAttribIPointer, and don't bother with annotations
on aliases.
v3: Mark CompressedTexSubImage1D as sync also.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
v2: Rebase on the Begin/End changes, and just disable this feature on
non-GL-core.
v3: (Timothy Arceri) enable for non-GL-core contexts. Remove
unrelated safe_mul() hunk. while loop style fix.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This patch splits the context's CurrentDispatch pointer into two
pointers, CurrentClientDispatch, and CurrentServerDispatch, so that
when doing multithread marshalling, we can distinguish between the
dispatch table that's being used by the client (to serialize GL calls
into the marshal buffer) and the dispatch table that's being used by
the server (to execute the GL calls).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
v2: Keep an allocated buffer around instead of checking for one at the
start of every GL command. Inline the now-small space allocation
function.
v3: Remove duplicate !glthread->shutdown check, process remaining work
before shutdown.
v4: Fix leaks on destroy.
V5: (Timothy Arceri) fix order of source files in makefile
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This is not yet used in the build, just generated.
v2: Add missing build dependencies.
v3: Avoid mixing declarations and code, remove logic for avoiding emitting
code that the compiler's optimizer can deal with anyway.
v4: (Timothy Arceri) move safe_mul() genereation here from a later patch.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Without doing some additional tracking, we won't know whether the data
will be immediate user data, or will be loaded from a PBO. The normal
teximage functions will be sync by default because they don't know up
front what the size of their image data is. But for compressed teximage,
we have the count information, so they would end up async by default.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Several API functions require special treatment in order to be marshalled
to a background thread. Others can't be safely executed in a background
thread and need to be executed synchronously (e.g. since they return data
through a pointer argument).
This annotation will be used when code generating thread marshalling code,
to ensure that each function is marshalled in the correct way.
Note that PixelMap functions are marked as synchronous for now since
their pointer may be relative to buffer on the GPU, so we'll need
special logic to marshal them properly.
v2: Move description of attribute types to a comment in the dtd file.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction. See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (e9ffd12827)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.
Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.
Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit. Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.
Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cache_put() first creates a .tmp file and then tries to do eviction.
The recently added LRU eviction code selects non-empty directory with
the oldest access time, but that may easily be the one with just the
new .tmp file, especially on Linux where atime is updated lazily
(with "relatime" mount option, which is the default). So when cache is
small, if random doesn't hit another dir LRU keeps selecting the same
dir with just the .tmp and not deleting anything. To fix this (and the
tests), do eviction earlier.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
General protection and prevents us from smashing the stack
on the first clear state validation (a7b8d50bcb). Fixes crash
using icc.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This patch originally had i965 specific code and was named:
commit 61cd3c52b868cf8cb90b06e53a382a921eb42754
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Oct 20 18:21:24 2016 -0700
gbm: Get modifiers from DRI
To accomplish this, two new query tokens are added to the extension:
__DRI_IMAGE_ATTRIB_MODIFIER_UPPER
__DRI_IMAGE_ATTRIB_MODIFIER_LOWER
The query extension only supported 32b queries, and modifiers are 64b,
so we needed two of them.
NOTE: The extension version is still set to 13, so none of this will
actually be called.
v2: Error handling of queryImage (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Nothing special here other than a brief introduction to modifier
selection. Originally this was part of another patch but was split out
from
gbm: Introduce modifiers into surface/bo creation by request of Emil.
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The idea behind modifiers like this is that the user of GBM will have
some mechanism to query what properties the hardware supports for its BO
or surface. This information is directly passed in (and stored) so that
the DRI implementation can create an image with the appropriate
attributes.
A getter() will be added later so that the user GBM will be able to
query what modifier should be used.
Only in surface creation, the modifiers are stored until the BO is
actually allocated. In regular buffer allocation, the correct modifier
can (will be, in future patches be chosen at creation time.
v2: Make sure to check if count is non-zero in addition to testing if
calloc fails. (Daniel)
v3: Remove "usage" and "flags" from modifier creation. Requested by
Kristian.
v4: Take advantage of the "INVALID" modifier added by the GET_PLANE2
series.
v5: Don't bother with storing modifiers for gbm_bo_create because that's
a synchronous operation and we can actually select the correct modifier
at create time (done in a later patch) (Jason)
v6: Make modifier condition outside the check so that dri_use will work
properly (Jason)
Cc: Kristian Høgsberg <krh@bitplanet.net>
References (v4): https://lists.freedesktop.org/archives/intel-gfx/2017-January/116636.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This is just a stub for now and will be filled in later.
This was split out of an earlier patch
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Modifiers will be obtained or guessed by the client and passed in during
image creation/import. In guessing, a client might decide to simply pass
along all known modifiers
This requires bumping the DRIimage version.
As of this patch, the modifiers aren't plumbed all the way down, this
patch simply makes sure the interface level stuff is correct.
v2: Don't allow usage + modifiers
v3: Make NAND actually NAND. Bug introduced in v2. (Jason)
v4:
- s/obtains/obtained (Jason)
- Pull out i965 imlemnentation into a later patch (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.
DiRT Showdown - Spilled VGPRs: -26 (-81%)
This surprisingly doesn't have any useful effect on performance (+ 0.05%).
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image instrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.
Although, shader-db results are good I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.
This also fixes a rendering issue with the speedometer in Dirt Rally.
More information can be found here https://reviews.llvm.org/D26348.
Thanks to Dave Airlie for the patch.
v2: - add a FIXME comment
- use if (HAVE_LLVM >= 0x0400) instead
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Will also help when the src sampler register will be
TGSI_FILE_CONSTANT for bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On platforms that require it, we bump the requirement to 0.4 or later.
Due to an issue with the project [design] any version earlier than it,
is bound to cause issues. For the specifics see the pthread-stubs README
Cc: Uli Schlachter <psychon@znc.in>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: François Tigeot <ftigeot@wolfpond.org>
Cc: Tobias Nygren <tnn@NetBSD.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.
v2: Rebase, explicitly require/check for libdrm
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
v4: Rebase
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.
v2: Rebase.
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
By this allows us to fetch the device list/info w/o the revision field.
At the moment retrieving the latter wakes up the device.
Note: kernel patch to resolve that should be in 4.10.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Unused/unchecked by any of the callers.
v2: Fix the glsl cases that have crept in since v1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Rather than having an extra memory allocation [that we currently do not
and act accordingly] just make the API take an pointer to a stack
allocated instance.
This and follow-up steps will effectively make the _mesa_sha1_foo simple
define/inlines around their SHA1 counterparts.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.
Add a struct name which we will use with follow-up commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This refactors out the code to setup a texture resource
so we can reuse it later from the images code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This takes the code required to initialise a buffer resource
out of the texture buffer code, into it's own function.
This is going to be used for the image support later.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to make ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable. Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Select higher of current 1G default or 10% of filesystem where
cache is located.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Currently only a one in one out eviction so if at max_size and
cache files were to constantly increase in size then so would the
cache. Restrict to limit of 8 evictions per new cache entry.
V2: (Timothy Arceri) fix make check tests
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.
v2: Factor out double strlen call
v3: C99 declaration of variables where used
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.
In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.
To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call dick_cache_put()
to give it time to finish.
V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
LLVM 4.0 released with a pretty messy regression, that hopefully
get fixed in the future.
This work around was proposed by Tom, and it fixes the CTS regressions
here at least, I'm not sure if this will cause any major side effects,
but correctness over speed and all that.
radeonsi should possibly consider the same workaround until an llvm
fix can be found.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fix is extracted from amdgpu-pro shader traces.
It appears the gather4 workaround for integer types doesn't
work for cubes, so instead if forces a float scaled sample,
then converts to integer.
It modifies the descriptor before calling the gather.
This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.
This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like 17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
I've skimmed to changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant ofcourse, but this should not
regress stuff,
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.
[1] https://www.illumos.org/issues/516
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <eric@anholt.net>
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with less components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs: 13318947 -> 13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we are no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
fixes build error when brw_nir.h not found in the generated file
brw_nir_trig_workarounds.c.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch adds missing error-checking and fixes resource leak in
allocation failure path on anv_CreateDevice()
v2: Fixes from Jason Ekstrand's review
a) Add missing destructors for all of the state pools on allocation
failure path
b) Add missing destructor for batch bo pools on allocation failure path
v3: Fixes from Emil Velikov's review
Add missing destructor for queue and scratch_pool on allocation failure
path
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
Fixes: 682248db45 ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No intended change in behavior. Just a refactor.
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the following segmentation fault:
radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
-> if (!bo->handle)
(gdb) bt
0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few of reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards. It appears
that the same is needed for fast-clears as well. This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't
been demonstrated on any other hardware however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense. The issues with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things. Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Rely on nir for optimization, to reduce compile times. Very minimal impact
on shader-db:
total instructions in shared programs: 104170 -> 104199 (0.03%)
total dwords in shared programs: 209664 -> 209728 (0.03%)
total full registers used in shared programs: 7156 -> 7161 (0.07%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 24222 -> 24224 (0.01%)
half full const instr dwords
helped 12 107 103 112 98
hurt 11 104 105 115 102
But shader db runtime dropped from ~29.3s user to ~20.4s user.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
With can now drop the checks on xxd in configure.
v2: Fix incorrect makefile dependency (Lionel)
v3: use $(PYTHON2) (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
compiler/brw_vec4_gs_visitor.cpp:744:39: error:
‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
Fixes: d0d4a5f43b ("i965: split EU defines to brw_eu_defines.h")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The project is a thing only for BSD platforms. Or in other words - for
any other platforms building/installing pthread-stubs results only in a
pthread-stub.pc file.
And even where it provides a DSO, there's a fundamental design issue
with it - see the pthread-stubs mailing list for the specifics.
v2: Update comment above the switch statement (Jon Turney).
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Acked-by: Gary Wong <gtw@gnu.org>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Randy Fishel <randy.fishel@oracle.com>
Cc: Niveditha Rau <niveditha.rau@oracle.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As of last few commits we have the two split, thus we no longer require
the i965 in order to have the ANV driver.
Even though ANV does not link against libdrm nor libdrm_intel, we still
require those as dependencies due to the headers they provide.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 [Emil Velikov]
- Various fixes and initial stab at the Android build.
- Keep the generation rules/EXTRA_DIST outside the conditional
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At the moment all the tests but test_eu_compact are actual C++ gtests.
To simplify things, we can move the gtest.la to the common TEST_LIBS.
As we're here, we can rename change the test extension [to .cpp] to
avoid using the confusing dummy.cpp.
Add a nice comment in the makefile for posterity.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The test/binary was removed back in 2012. With that one gone, we can
drop the .gitignore file all together.
Cc: Eric Anholt <eric@anholt.net>
Fixes: c885039442 ("i965: Drop the missing symbols link test.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Split out the EU defines from the 'generic' ones, as the former are more
compiler oriented.
With a later commit we'll move brw_eu_defines.h alongside the compiler
infra to src/intel/. Pulling all the defines in there seems overzealous.
Some defines are used by both i965 and the i965 compiler. Those are
moved to brw_eu_defines.h, and annotated accordingly. The i965 users
were updated to have the extre include to indicate that.
With future work we might provide a better, split but for now this seems
reasonable.
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise we'll get errors such as
error: conflicting types for ‘ffs’
error: conflicting types for ‘ffsll’
We might want to improve the heuristics and provide a definition only
when a native one is missing. We can address that at a later stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
File is using MI_LOAD_REGISTER_IMM, GEN7_CACHE_MODE_1 and others as
defined in the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The follow three groups are not used by neither the DRI module nor the
compiler.
BRW_POLYGON_*_FACING
BRW_POLYGON_FACING_*
BRW_STATELESS_BUFFER_*
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Neither of the changed files requires the brw_program.h include. Since
we're about to move them [to src/intel/compiler] with the next commit
there's no point in having the include.
Let alone the very confusing compiler include directive
[-I${top_srcdir}/src/mesa/drivers/dri/i965/] that one would have to use.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Function was made static and moved to another header with earlier
commit.
Fixes: 760c8a1d95 ("i965: Make mark_surface_used a static inline in brw_compiler.h")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, we were depending on EGL for generating the headers and
providing the protocol symbols. However, since neither Vulkan driver
actually wants to link against EGL, this is kind of pointless. It also
creates a weird build dependency.
v2 [Jason]
- Add missing wsi/ prefix, MKDIR_GEN
v3 [Emil Velikov]
- include BUILT_SOURCES/generation rules outside of conditional
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Unused and we'll rework the way wayland-drm-client-protocol.h is
generated with later commit.
v2 [Emil]
- Also remove wayland-client.h
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In some cases, we can end up calling WAYLAND_SCANNER even when
there's no binary. Do follow the other's approach set by
AX_PROG_FLEX/BISON and set the variable to :
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Strictly speaking things work as-is, but let's move the file alongside
the artefacts it references. Analogous to all other places in mesa.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Advertise 10bpp support if the driver supports decoding to a P016 surface.
v2: Advertise 10bpp for the decoder as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
We support P010 and P016 as targets for 10bpp video decoding.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
No hardware I know off can actually support P010 natively. But we can easily
support P016 and as long as nobody decodes anything into the lower 6bits it
doesn't make any difference to P010.
v2: allow P0160 for post processing as well
v3: fix post processing once more
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
This makes debugging of decoding problems quite a bit easier.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Just use whatever the state tracker allocated.
v2: fix msb mode
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
The firmware expects the value in pixel not bytes. Didn't made a difference
so far because we only used 8bpp surfaces.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Same layout as NV12, but 16bit per channel instead of 8.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Less IFETCH latency on misses. Shader code is write once read many,
so GTT doesn't make much sense anyway.
If it turns out to fragment the CPU visible VRAM too much, we can upload with SDMA.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This will help us move u_queue.c here eventually and also provide
string function wrappers for anyone wishing to port disk_cache.c
to windows.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is not used anywhere and Visual Studio looks to have
supported memmove() for a long time if not always.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because we optimistically skip compiling shaders if we have seen them
before we may need to compile them later at link time if they haven't
yet been use in a specific combination to create a program.
Rather than always recompiling we take advantage of the
gl_compile_status enum introduced in the previous patch to only
compile when we have previously skipped compilation.
This helps with regressions in app start-up times on cold cache
runs, compared with no cache.
Deus Ex: Mankind Divided start-up times:
cache disabled: ~3m15s
cold cache master: ~4m23s
cold cache with this patch: ~3m33s
Acked-by: Marek Olšák <marek.olsak@amd.com>
This will allow us to tell if a shader really has been compiled or
if the shader cache has just seen it before.
Acked-by: Marek Olšák <marek.olsak@amd.com>
... so that we can avoid threading complications or unnecessary
compaction table initializations (which just consists of setting some
pointers based on devinfo->gen).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We don't use DRYRUN (and no others scripts have one) so just drop it.
This allows us to rework the loop to the more commonly used "git .... |
while read foo; do ... done"
That in itself gets rid of the only remaining bashism and we can toggle
the shebang to /bin/sh.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
All of those should be executed $PYTHON2/python2 [or equivalent] hence
why they are missing the execute bit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Seemingly there is nothing bash specific in these. The Debian
checkbashisms does not spot neither run in zsh.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The file is used to generate svgadump/svga_dump.c... in theory at least.
Atm. the file is checked in-tree but that is about to change later
commits.
As we get to that we'll use $PYTHON2 or equivalent as used throughout
the tree.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
All of the scripts are [must be] executed via $PYTHON2 [or equivalent]
hence why they are missing the execute bit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Nearly all the python scripts used in-tree are invoked via $PYTHON2 or
equivalent. As such having the execute bit not needed and generally
ill-advised.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This makes it easier/clearer as to:
- if the file should have the execute bit set (.py should not)
- do we need the shebang in the first place and if so what it should be
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Unlike stride, there was no previous offset getter, so it can be right
on the first try.
v2: Return EINVAL when plane is greater than total planes to make it
match the similar APIs.
Avoid leak after fromPlanar (Daniel)
Make sure when getting offsets we consider dumb images (Daniel)
v3: Use Jason's recommendation for handling the non-planar case.
v4: Return int64_t so we can get real errors
v5: Add an assertion for dumb BOs (Jason)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
v2: Preserve legacy behavior when plane is 0 (Jason Ekstrand)
EINVAL when input plane is greater than total planes (Jason Ekstrand)
Don't leak the image after fromPlanar (Daniel)
Move bo->image check below plane count preventing bad index succeeding (Daniel)
v3: Fix DRIimage leak (using Jason's recommended change)
Make plane 0 return planar stride. This might break legacy behavior (Jason)
v4: Move bogus hunk for get_handle_for_plane to the right patch (Jason)
Fix error handling path to be cleaner (Jason)
v5: Add assert for dumb BOs to make sure plane == 0 (Jason)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
v2: Make the error return be -1 instead of 0 because I think 0 is
actually valid.
v3: Set errno to EINVAL when the specified plane is above the total
planes. (Jason Ekstrand)
Return the bo's handle if there is no image ie. for dumb images like cursor (Daniel)
v4:
- Add assertions about plane == 0 (Jason)
- Add a comment about new restriction on planar dumb bo which is not an
earlier patch in the series.
- Correctly refactor from v2 in this patch; it ended up rebased into the
wrong patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This will be used by clients that need to know the number of planes
allocated for them on behalf of the GL or other API. The best current
example of this is when an extra "plane" is allocated to store
compression data for the primary plane.
v2: Return 1 for cases where there is no image, ie. dumb bo (Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
As more GBM functionality support planes is being evaluated, it becomes
clear that a dumb bo can never actually be planar. It's questionable
whether it was ever feasible to do this, and later functionality will
implicitly assume a dumb BO is non-planar.
v2: Include stdbool.h
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This adds support for exposing basic Observation Architecture
performance counters on Haswell.
This support is based on the i915 perf kernel interface which is used
to configure the OA unit, allowing Mesa to emit MI_REPORT_PERF_COUNT
commands around queries to collect counter snapshots.
To take into account the small chance that some of the 32bit counters
could wrap around for long queries (~50 milliseconds for a GT3 Haswell @
1.1GHz) the implementation also collects periodic metrics.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:
This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.
Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.
The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.
v2: numerous python style improvements (Dylan)
v3: Makefile.am fixups (Emil)
v4: Pattern rule for codegen + orthogonal .c and .h rules (Robert)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In preparation for generating code from brw_oa_hsw.xml for describing OA
performance counter queries this adds some OA specific members to
brw_perf_query that our generated code will initialize:
- The oa_metric_set_id is the ID we will pass to
DRM_IOCTL_I915_PERF_OPEN, and is an ID got via sysfs under:
/sys/class/drm/<card>/metrics/<guid/id
- The oa_format is the OA report layout we will request from the kernel
- The accumulator offsets determine where the different groups of A, B
and C counters are located within an intermediate 64bit 'accumulator'
buffer.
Additionally brw_perf_query_counter now has 64bit or float _read()
callback members for OA counters.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In preparation for generating code from the XML performance counter meta
data, this makes some additions to brw_context.h for this code to be
able to reference.
It adds a brw->perfquery.oa_metrics_table hash table for indexing built
up query descriptions by the GUID that is expected to be advertised by
the kernel (via sysfs) to be able to use that query.
It adds an 'OA_COUNTERS' brw_query_kind to be assigned to queries built
up by generated code.
It adds a brw->perfquery.sys_vars structure to have a consistent place
to represent the different system variables like $EuCoresTotalCount and
$EuSlicesTotalCount that are referenced by OA counter normalization
equations.
Although extending + referencing gen_device_info for these variables
was considered, these are some of the (mostly minor) reasons for
going with a dedicated structure:
- Currently we only need this info for the performance_query backend
and it might be a bit tedious to go back and initialize the state
for pre-Haswell devinfo structures.
- Considering the $SubsliceMask then the requirement for how multiple
per-slice masks are packed only comes from how the variables are
references by availability tests in XML, and might not be a good
general representation for tracking subslice masks if another use
case arises.
- If we used gen_device_info then we'd likely want to avoid making
assumptions about the C types during codegen and adding explicit
casts, while that's not necessary with a dedicated struct with all
members being uint64_t.
- This structure and the code for initializing it is currently shared
(just through copy & paste) with a few other projects dealing with
OA counters, and that's been convenient so far.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In preparation for exposing Gen Observation Architecture performance
counters via INTEL_performance_query this adds an XML description for an
initial 'Render Metrics Basic Gen7.5' query and corresponding counters.
The intention is to auto generate code for building a query from these
counters as well as the code for normalizing the individual counters.
Note that the upstream for this XML data is currently GPU Top:
https://github.com/rib/gputop
The files are maintained under gputop-data/ and they are themselves
derived from files in an internal 'MDAPI XML' schema. There are scripts
under gputop-scripts/ and make rules in gputop-data/Makefile.xml for
maintaining these files.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Function arguments do not have an "origin" instruction, causing a
NULL-pointer dereference without this check.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There is no need to check sampler == 0 twice. This removes now
unused _mesa_lookup_samplerobj_locked().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since blob is intended for serializing data, it's not a good idea to
leave padding holes with uninitialized data, which may leak heap
contents and hurt compression if the blob is later compressed, like
done by shader cache. Clear it.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Negating size_t on 32bit produces a 32bit result. This was effectively
adding values close to UINT_MAX to the cache size (the files are usually
small) instead of intended subtraction.
Fixes 'make check' disk_cache failures on 32bit.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Builtins are created once and allocated using their own private ralloc
context. When reparenting IR that includes builtins, we might be steal
bits of builtins. This is problematic because these builtins might now
be freed when the shader that includes then last is disposed. This
might also lead to inconsistent ralloc trees/lists if shaders are
created on multiple threads.
Rather than including builtins directly into a shader's IR, we should
include clones of them in the ralloc context of the shader that
requires them. This fixes double free issues we've been seeing when
running shader-db on a big multicore (72 threads) server.
v2: Also rename _mesa_glsl_find_builtin_function_by_name() to better
reflect how this function is used. (Ken)
v3: Rename ctx to mem_ctx (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was a hook I came up when trying to do the initial performance
counter work years ago. Nothing's used it for a long time, and the
upcoming performance counter support doesn't want it either.
So, goodbye render ring prelude.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The only way we write CMASK/DCC compressed textures through shaders
is fast clears and CMASK/DCC inits, which have their own flushes.
Hence the CB cache is always up to date.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I think we should only flush right before an action (draw/dispatch etc.),
as otherwise it is too easy to issue redundant flushes.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Without stores, the only writes are fast clears, transfers and metadata
initialization, each of which have the appropiate invalidations already.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The data should always be in memory after a src flush.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason has patches to add validation to this area, this should fix
radv shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This extension was enabled in commit 40dd45d0c6 ("i965: Enable
ARB_shader_atomic_counter_ops") but the commit failed to update the
release notes or features.txt. The release notes ship has sailed, since
the commit was in 13.0.
Math results land in r4, regardless of the condition. To implement them,
we just need to ensure that the results are moved out of r4 (as often
happens anyway, the values is live across another math instruction), so
that we can attach the condition to the MOV.
Fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.93 and a
couple others, that were assertion failing that their conditions hadn't
been handled during the QIR->QPU stage.
This ended up confusing the scheduler for things like fabs (implemented as
fmaxabs x, x) or squaring a number, and it would try to avoid scheduling
them because it appeared more expensive than other instructions.
Fixes failure to register allocate in
dEQP-GLES2.functional.uniform_api.random.3 with almost no shader-db
effects (+.35% max temps)
Currently when running mesa on imx6 the following loader warnings
are seen:
# kmscube -D /dev/dri/card1
MESA-LOADER: device is not located on the PCI bus
MESA-LOADER: device is not located on the PCI bus
MESA-LOADER: device is not located on the PCI bus
Using display 0x1920948 with EGL version 1.4
As this is not an error message, change it to debug level in
order to have a cleaner log output.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Adding libmesa_amd_common dependency and exporting its headers,
avoids the following building error:
external/mesa/src/gallium/drivers/r600/evergreen_compute.c:29:10: fatal error: 'ac_binary.h' file not found
^
1 error generated.
Fixes: 3bbbb63 "automake: r600: radeonsi: correctly manage libamd_common.la linking"
Fixes: 503fb13 "radeon/ac: switch to ac_shader_binary_config_start()"
v2 [Emil Velikov: drop unneeded LOCAL_EXPORT_C_INCLUDE_DIRS]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Earlier commit added extra tracking and we've attempted to remove the
vdpau/other folder if empty. V2 of said commit dropped the pipe
to /dev/null and the explicit "true" override.
Sadly both of those are needed since there's no guarantee that the
folder will be empty before we [mesa] make install.
Since we're bringing those two back, there's no need to track if we've
installed anything, and simply do "rm -d foo/ &>/dev/null || true"
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Fixes: 1cd4fde053 ("gallium/targets: don't leave an empty target directory(ies)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This is what comment above definition says and change fixes issue with
32bit build where BLOCK_POOL_MEMFD_SIZE is used as ftruncate parameter
and constant currently gets converted from 4294967296 to 0.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With mesa/drm commit cd2f91e18db087edf93fed828e568ee53b887860
Author: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Date: Fri Jul 31 10:47:50 2015 -0700
intel: Drop aub dumping functionality
the drm_intel_aub routines are mere stubs and do nothing. Likewise
remove our invocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should let Dota 2 run on debug builds though it will spew errors
like mad. Hopefully, Valve will get this fixed sooner rather than
later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Over the course of driver development, we've come up with a number of
different schemes for adding giant blocks of asserts inside the driver.
This one is only being used once in anv_pipeline.c and the way it's
being used actually generates compiler warnings in release builds. This
commit drops the anv_validate macro and just puts the contents of the
one validation function in side of a "#ifdef DEBUG" guard.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Except for a few unimplemented things on gen7, we don't really have
stubs anymore so we should drop this. This commit replaces the few gen7
stub() calls with explicitly labeled finishme's and makes the sparse
binding stuff silently no-op or return a FEATURE_NOT_PRESENT error.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This acts identically to anv_finishme except that it only dumps out
these nice log messages if you run with INTEL_DEBUG=perf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes a GCC warning when compiling with -Wextra:
radv_device.c:463:47: warning: initialized field overwritten [-Woverride-init]
Signed-off-by: Damien Grassart <damien@grassart.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The dynamic_offset_offset in the descriptor set binding layout is
relative to the dynamic_offset_start for the set in the pipeline
layout.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A buffer descriptor is 16 bytes, not 16 dwords.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I was already tired of seeing the message
Package libomxil-bellagio was not found in the pkg-config search path.
Perhaps you should add the directory containing `libomxil-bellagio.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libomxil-bellagio' found
on every configure, but I just got a distro bug reported where the user
was confused by this message and thought it indicated a bug.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Available since pkg-config-0.28 and pkgconf-0.8.10.
The removal of the AC_PATH_PROG is intentional. Use pkg-config.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It printed the version of LLVM ($1):
configure: error: 3.6.0 requires libelf when using llvm
instead of the driver name ($2):
configure: error: r600 requires libelf when using llvm
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Still not sure we can support miptrees when sampling from
HTILE enabled textures.
Added the tcCompatible winsys stuff while I'm at it.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes:
dEQP-VK.pipeline.render_to_image.3d.huge.depth.r8g8b8a8_unorm
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Don't fast clear inside the meta loop as things get
confused, fixes a crash in:
dEQP-VK.api.copy_and_blit.resolve_image.whole_array_image.2_bit
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ported from anv:
3d33a23e anv: Properly handle destroying NULL devices and instances
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This uses these in a few places, and fixes one or two
cases which were using da as 32-bit instead of bool.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was made unnecessary with fd33a6bcd7.
This was mostly done with:
find ./src -type f -exec sed -i -- \
's:PIPE_THREAD_ROUTINE(\([^,]*\), \([^)]*\)):int\n\1(void \*\2):g' {} \;
With some small manual tidy ups.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_destroy() was made unnecessary with fd33a6bcd7.
Replace was done with:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_init() was made unnecessary with fd33a6bcd7.
Replace was done using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We never actually used the resource streamer in any shipping build
of Mesa. We have no plans to do so in the future. We looked into
using it in Vulkan, and concluded that it was unusable. We're not
the only ones to arrive at the conclusion that it's not worth using.
So, drop the last vestiges of resource streamer support and move on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Experimentation shows that without alignment factor gcc and clang choose
a factor of 16 even on IA-32, which doesn't match what malloc() uses (8).
The problem is it makes gcc assume the pointer is 16 byte aligned, so
with -O3 it starts using aligned SSE instructions that later fault,
so always specify a suitable alignment factor.
Cc: Jonas Pfeil <pfeiljonas@gmx.de>
Fixes: cd2b55e5 "ralloc: Make sure ralloc() allocations match malloc()'s alignment."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100049
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Tested by: Mike Lothian <mike@fireburn.co.uk>
Tested by: Jonas Pfeil <pfeiljonas@gmx.de>
There are still some distributions trying to support unfortunate people
with old or exotic CPUs that don't have 64bit atomic operations. The
only thing preventing compile of the Intel driver for them seems to be
initialization of a debug variable.
v2: use call_once() instead of unsafe code, as suggested by Matt Turner
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
If we have any pending flushes on the primary command buffer, these
must be performed before executing the secondary buffer.
This fixes potential corruption when the contents of a subpass which
clears any of its render targets are given in a secondary buffer: the
flushes after a fast clear would not have been performed until the
vkCmdEndRenderPass call.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
_mesa_lookup_samplerobj() returns NULL if sampler is 0.
v2: use _mesa_lookup...(...) != NULL
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the following assertion when the key is 0.
main/hash.c:181: _mesa_HashLookup_unlocked: Assertion `key' failed.
Fixes: 633c959fae ("getteximage: Return correct error value when texure object is not found")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Call site attributes are used since LLVM 4.0.
This also reverts commit b19caecbd6
"radeon/ac: fix intrinsic version check", because this is the correct fix.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Since all output buffers are whole frames, this should always be set.
Technically, setting this flag is is optional (see OpenMAX IL section
3.1.2.7.1), but some clients assume that it will be used and
therefore buffer indefinitely thinking that all output buffers are
fragments of the first frame when it is not set.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
From OpenMAX IL section 4.3.5:
"The value of nIndex is the range 0 to N-1, where N is the number of
formats supported by the port. There is no need for the port to
report N, as the caller can determine N by enumerating all the
formats supported by the port. Each port shall support at least one
format. If there are no more formats, OMX_GetParameter returns
OMX_ErrorNoMore (i.e., nIndex is supplied where the value is N or
greater)."
Only one format is supported, so N = 1 and OMX_ErrorNoMore should be
returned if nIndex >= 1. The previous code here would return the
same format for all values of nIndex, resulting in an infinite loop
when a client attempts to enumerate all formats.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
The VAAPI documentation is not very clear here, but the intent
appears to be that a forward reference is forward from a frame in the
past, not forward to a frame in the future (that is, forward as in
forward prediction, not as in a forward reference in source code).
This interpretation is derived from other implementations, in
particular the i965 driver and the gstreamer client.
In order to match those other implementations, this patch swaps the
meaning of forward and backward references as they currently appear
for motion-adaptive deinterlacing.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested with ffmpeg and gst-vaapi. Without this bits per
frame is set way too low for fractional framerates.
v2: Mark Thompson: simplify calculation.
Use float.
Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Mainly to avoid gcc's complains about uninitialized ptr and offset use
later in that code.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use the same helpers as for other handle<->pointer conversions.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
So that we don't keep allocating BOs for the IBs and upload buffers.
We run some risk of memory increase with e.g. a bimodal size
distribution of command buffers, but I haven't noticed a significant
increase with dota2 and talos.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 0f60c6616e.
Piglit and all games tested so far seem to be working without
issue. This change will allow wide user testing and we can decided
before the next release if we need to turn it off again.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we were deleting the entire cache if a user switched
between 32 and 64 bit applications.
V2: make the check more generic, it should now work with any
platform we are likely to support.
V3: Use suggestion from Emil to make even more generic/fix issue
with __ILP32__ not being declared on gcc for regular 32-bit builds.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Don't flush multiple times if we clear multiple attachments. Also allows
doing the depth clear in parallel with the fast color clears.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
GS implementation uses the masked.{gather,store} intrinsics,
introduced in llvm-3.9.0. swr llvm version requirement in
automake and scons now match (scons already needed >= 3.9).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The OpenGL 4.5 specification's description of TexBuffer says:
"The number of texels in the texture image is then clamped to an
implementation-dependent limit, the value of MAX_TEXTURE_BUFFER_SIZE."
We set GL_MAX_TEXTURE_BUFFER_SIZE to 2^27. For buffers with a byte
element size, this is the maximum possible size we can encode in
SURFACE_STATE. If you bind a buffer object larger than this as a
texture buffer object, we'll exceed that limit and hit an isl assert:
assert(num_elements <= (1ull << 27));
To fix this, clamp the size in bytes to MaxTextureSize / texel_size.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Earlier commit was picked from a larger series, but did not consider
that it removed the vulkan <> wayland-drm interdependency.
Rather than reverting everything, temporarily move wayland-drm further
up to resolve the issue. Since it [wayland-drm] does not have any
in-mesa dependencies that's perfectly safe.
Cc: Vedran Miletić <vedran@miletic.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100060
Fixes: e135ce6f08 ("vulkan: Build common Vulkan code earlier")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Javier Jardón <jjardon@gnome.org>
Fixes a series of libz related building errors:
target SharedLib: gallium_dri_32
(out/target/prod...SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so)
external/elfutils/libelf/elf_compress.c:117: error: undefined reference to 'deflateInit_'
...
external/elfutils/libelf/elf_compress.c:244: error: undefined reference to 'inflateEnd'
clang++: error: linker command failed with exit code 1 (use -v to see
invocation)
Fixes: 85a9b1b "util/disk_cache: compress individual cache entries"
See detailed explanation of why this is needed in commit eb60a89bc3.
This spot was missed/overlooked. Basically as a result of the fact
that BEGIN_* ends up calling PUSH_SPACE, which in turn adds an extra 8
to the requested amount, we have to be mindful of that when doing bare
nouveau_pushbuf_space calls.
Reportedly this fixes some crashes when replaying a hitman trace taken
on radeonsi.
Fixes: eb60a89bc3 ("nouveau: take extra push space into account for pushbuf_space calls")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When binding as textures, the alignment can be 16. However when binding
as an image, the address has to be aligned to 256. (Also when binding as
an RT, but that can't happen with GL or current gallium APIs.)
Reported-by: Roy Spliet <nouveau@spliet.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
when files are being generated the value of $intermediates var content can be
completely random, this makes sure that outdir is the wanted one.
Fixes: 3f2cb699 ("android: vulkan: add support for libmesa_vulkan_util")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Some drivers do not support certain targets - for example nouveau
doesn't do VAAPI, while freedreno doesn't do of the video backends.
As such if we enter vdpau when building freedreno/ilo/etc, a vdpau/
folder will be created, empty library will be build and almost
immediately removed. Thus keeping an empty vdpau/ folder around.
There are two ways to fix this.
* add substantial tracking in configure/makefiles so that we never end
up in targets/vdpau
Downsides:
Error prone, as the configure checks and the 'include
gallium/drivers/foo/Automake.inc' can easily get out of sync.
* remove the folder, if empty, alongside the empty library.
Downsides:
In the latter case vdpau/ might be empty before the mesa build has
started, yet we'll remove it either way.
This patch implements the latter option, as the downside isn't that
significant, plus the patch is way shorter ;-)
v2: use has_drivers to track since TARGET_DRIVERS can contain space,
hence neither string comparison nor -n/-z works correctly.
Gentoo Bugzilla: https://bugs.gentoo.org/545230
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The previous implementation was fine for GLSL which doesn't really have
a signed modulus/remainder. They just leave the behavior undefined
whenever either source is negative. However, in SPIR-V, there is a
defined behavior for negative arguments. This commit beefs up the pass
so that it handles both correctly. Tested using a hacked up version of
the Vulkan CTS test to get 64-bit support.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is a work in progress - some things may still need fixing.
But it should be in pretty decent shape.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Equivalent *TexSubImage* methods generates INVALID_ENUM.
From OpenGL 4.5 spec, section 8.6 Alternate Texture Image
Specification Commands:
"An INVALID_ENUM error is generated by *TexSubImage* if target does
not match the command, as shown in table 8.15."
And:
"An INVALID_OPERATION error is generated by *TextureSubImage* if
the effective target of texture does not match the command, as
shown in table 8.15."
Fixes:
GL45-CTS.direct_state_access.textures_copy_errors
v2: slightly change commit summary (Samuel)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Valgrind reports that the shader cache writes uninitialized data to disk.
Turns out ureg_get_tokens() is returning the count of allocated tokens
instead of how many are actually used, so the cache writes out unused
space at the end. Use the real count instead.
This change should not cause regressions elsewhere because the only
ureg_get_tokens() user that cares about token count is the shader cache.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
V2:
- when loading from disk cache also binary insert into memory cache.
- check that the binary loaded from disk is the correct size. If not
delete the cache item and skip loading from cache.
V3:
- remove unrequired variable
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reduces the cache size for Deus Ex from ~160M to ~30M for
radeonsi (these numbers differ from Grigori's results below
probably due to different graphics quality settings).
I'm also seeing the following improvements in minimum fps in the
Shadow of Mordor benchmark on an i5-6400 CPU@2.70GHz, with a HDD:
no-cache: ~10fps
with-cache-no-compression: ~15fps
with-cache-and-compression: ~20fps
Note: The with cache results are from the second run after closing
and opening the game to avoid the in-memory cache.
Since we mainly care about decompression I went with
Z_BEST_COMPRESSION as suggested on irc by Steinar H. Gunderson
who has benchmarked decompression speeds.
Grigori Goronzy provided the following stats for Deus Ex: Mankind
Divided start-up times on a Athlon X4 860k with a SSD:
No Cache 215 sec
Cold Cache zlib BEST_COMPRESSION 285 sec
Warm Cache zlib BEST_COMPRESSION 33 sec
Cold Cache zlib BEST_SPEED 264 sec
Warm Cache zlib BEST_SPEED 33 sec
Cold Cache no compression 266 sec
Warm Cache no compression 34 sec
The total cache size for that game is 48 MiB with BEST_COMPRESSION,
56 MiB with BEST_SPEED and 170 MiB with no compression.
These numbers suggest that it may be ok to go with Z_BEST_SPEED
but we should gather some actual decompression times before doing
so. Other options might be to do the compression in a separate
thread, this might allow us to use a higher compression algorithim
such as LZMA.
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Previously, when q.subroutine was set to 1, a new subroutine
declaration was added to the AST, while 0 meant a subroutine
definition has been detected by the parser.
Thus, setting the q.subroutine flag in both situations is
obviously wrong because a new type identifier is added instead
of trying to match the declaration. To fix it up, introduce
ast_type_qualifier::is_subroutine_decl() to differentiate
declarations and definitions easily.
This fixes a regression with:
arb_shader_subroutine/compiler/direct-call.vert
Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: be8aa76afd ("glsl: remove unecessary flags.q.subroutine_def")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100026
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Depending on the generated Makefile means that all generated sources are
recreated after ./configure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
While an input attachment may only take on one of those two layouts,
other depth/stencil attachments that use the same image may have
HiZ-enabled layouts. Improves the average frame rate on a release
candidate of a proprietary Vulkan benchmark by 9.94% over 3 runs on my
SKL GT4.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We'll loop through this array when performing automatic layout
transitions.
v2: Adjust formatting of an assignment (Jason Ekstrand)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We will be using the image layout. Store the full struct directly from
the user.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Due to recent commits, the sampler now bypasses the auxiliary HiZ buffer
when reading from a depth image subresource that is in the general
layout. Remove this unneeded resolve.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used to sample a depth input attachment without having to
pass through the HiZ buffer.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
surf_usage is only useful to image views that may use HiZ buffers.
Storage image views don't use HiZ buffers.
v2: Update commit message and add an assertion.
Fixes: 055ff2ec52 ("anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ")
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Validate the inputs, verify that this image has a depth
buffer, use gen_device_info instead of
v2:
- Add parenthesis (Jason Ekstrand)
- Make parameters const
- Use gen_device_info instead of gen
- Pass aspect to missed function in transition_depth_buffer
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This function supersedes layout_to_hiz_usage().
v2:
- Don't find the optimal buffer for layout transitions (Jason Ekstrand).
- Pass the devinfo instead of the gen (Jason Ekstrand)
- Update the function documentation.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The header of ralloc needs to be aligned, because the compiler assumes
that malloc returns will be aligned to 8/16 bytes depending on the
platform, leading to degraded performance or alignment faults with ralloc.
Fixes SIGBUS on Raspberry Pi at high optimization levels.
This patch is not perfect for MSVC, as maybe in the future the alignment
for the most demanding data type might change to more than 8.
v2: Commit message reword/typo fix, and add a bigger explanation in the
code (by anholt)
Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Recent change to st/mesa state update logic caused major regressions to
swr validation code.
swr uses the same validation logic (swr_update_derived) for both draw
and Clear calls. New st/mesa state update logic results in certain state
objects not being set/bound during Clear. This was causing null ptr
exceptions. Creation of static dummy state objects allows setting these
pointers during Clear validation, without interfering with relevant state
validation.
Once fixed, new logic also highlighted an error in dirty bit checking for
fragment shader and clip validation.
(The alternative is to have a simplified validation routine for Clear.
Which may do that at some point.)
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
During the first update of the hw_clear_state atoms, we may not yet
have a current rasterizer state object. So, svga->curr.rast may be
NULL and we crash.
Add a few null pointer checks to work around this. Note that these
are only needed in the state update functions which are called for
'clear' validation.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows us to allocate surface states from the command buffer when
pushing descriptor sets rather than allocating them through a
descriptor set pool.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In validate_DrawElements_common() we need to check for OES_geometry_shader
extension to determine if we should fail if transform feedback is
unpaused. However current code reads ctx->Extensions.OES_geometry_shader
directly, which does not take context version into account. This means
that if the context is GLES 3.0, which makes the OES_geometry_shader
inapplicable, we would not validate the draw properly. To fix it, let's
replace the check with a call to _mesa_has_OES_geometry_shader().
Fixes following dEQP tests on i965 with a GLES 3.0 context:
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_incomplete_primitive
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced_incomplete_primitive
dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements
dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements_incomplete_primitive
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
One less set of enums. Dropped the #defines from brw_defines.h and ran:
$ for file in *.cpp *.c *.h; do sed -i \
-e 's/BRW_SURFACEFORMAT_/ISL_FORMAT_/g' \
-e 's/ISL_FORMAT_ASTC_[A-Zxs0-9_]*/\U&/g' $file; \
done
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
If we don't have pipelined register access (e.g. Haswell before kernel
v4.2), then we can only implement EXT_transform_feedback by reseting the
SO offsets *between* batches. However, if we do have pipelined access to
the SO registers on gen7, we can simply emit an inline reset of the SO
registers without a full batch flush.
v2 [by Ken]: Simplify after recent kernel feature detection changes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
According to the PRM description of the Depth field:
"This field specifies the total number of levels for a volume texture
or the number of array elements allowed to be accessed starting at the
Minimum Array Element for arrayed surfaces"
However, ISL defines array_len as the length of the range
[base_array_layer, base_array_layer + array_len], so it already represents
a value relative to the base array layer like the hardware expects.
v2: Depth is defined as a U11-1 field, so subtract 1 from
the actual value (Jason)
This fixes a number of new CTS tests that would crash otherwise:
dEQP-VK.pipeline.render_to_image.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The algorithms used by this pass, especially for division, are heavily
based on the work Ian Romanick did for the similar int64 lowering pass
in the GLSL compiler.
v2: Properly handle vectors
v3: Get rid of log2_denom stuff. Since we're using bcsel, we do all the
calculations anyway and this is just extra instructions.
v4:
- Add back in the log2_denom stuff since it's needed for ensuring that
the shifts don't overflow.
- Rework the looping part of the pass to be easier to expand.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Each of the pop functions (and push_else) take a control flow parameter as
their second argument. If NULL, it assumes that the builder is in a block
that's a direct child of the control-flow node you want to pop off the
virtual stack. This is what 90% of consumers will want. The SPIR-V pass,
however, is a bit more "creative" about how it walks the CFG and it needs
to be able to pop multiple levels at a time, hence the argument.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is shared between the Vulkan and GL drivers as it's a requirement
of the back-end compiler. However, it doesn't really belong in the
compiler. We rename the file to match the prefix of the other stuff in
common and because libdrm defines an intel_debug.h and this avoids a
pile of possible name conflicts.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This hasn't been used for quite some time now but we never bothered to
get rid of it when we dropped GLSL IR support for vec4.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
One of these days, I'd like to see this function go away all together
but for now, let's at least put it near the struct it updates.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It does sort-of go with MAX_UBO and friends but MAX_DRAW_BUFFERS is an
actual hardware constant based on the number of things we can blend
rather than an arbitrary "number of things allowed in GL" like some of
the other maximums are.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we're at it, we also change the GEN6 binding macro to be a start
index that gets added to the binding. This makes things a bit more
explicit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's currently in brw_util.c but that's the only bit of brw_util.c
that's shared between the compiler and the rest of the GL driver.
It's just a fairly obvious table so the duplication isn't bad. It's
certainly less pain than trying to figure out how to share the code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Vulkan doesn't respect MAX_SURFACES so this assert isn't valid in that
case. It should, however, assert that it isn't insanely large.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a relic of when we wired up meta to be able to use RECTLIST
primitives. It's no longer needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't used by Vulkan and is specific to the way the GL driver
works. There's no reason to have it in common compiler code. Also, it
relies on BRW_MAX_* defines which are defined in brw_context.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The zy swizzle gives us one component of quotient and one component of
remainder. What we wanted was zw for the remainder.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're about to use the build-id as the starting point for another SHA1
hash in the Intel Vulkan driver, and returning a pointer is far more
convenient.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The queryid_valid() function asserts that an ID given by an application
isn't zero since the spec explicitly reserves an ID of zero as invalid.
The implementation was written as if the ID was a signed integer and
based on the assumption that queryid_to_index() is simply subtracting
one from the ID. It was broken because in fact the ID was stored in an
unsigned int and testing for an index >= 0 would always succeed.
This adds a spec quote to clarify why zero is considered invalid and
checks for zero before even passing the ID to queryid_to_index() for
then checking the upper bound.
This is a v2 of a patch originally posted by Juha-Pekka (thanks)
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Fix usage of ac_add_function_attr() and make it known!
common/ac_nir_to_llvm.c: In function 'create_llvm_function':
common/ac_nir_to_llvm.c:265:4: error: implicit declaration of function
'ac_add_function_attr' [-Werror=implicit-function-declaration]
ac_add_function_attr(main_function, i + 1, AC_FUNC_ATTR_BYVAL);
^~~~~~~~~~~~~~~~~~~~
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The wl_drm interface (akin to X11's DRI2) uses the standard set of DRM
FourCC format codes. wl_shm copies this, except for ARGB8888/XRGB8888,
which use their own definitions.
Make sure we only use wl_shm format codes when we're working with
wl_shm. Otherwise, using swrast with 32bpp formats would fail with an
error.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Daniel Stone <daniels@collabora.com> (v1)
Fixes: cb5e799448 ("egl/wayland: unify dri2_wl_create_surface implementations")
v2: [Emil Velikov: move to dri2_wl_create_window_surface]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (IRC)
In the past, we used this on Gen4-5 to transform non-normalized texture
coordinates (for sampler2DRect) to normalized ones. We also used it on
Gen6-7.5 for sampler2DRect with GL_CLAMP.
Jason dropped this code in 6c8ba59cff
in favor of using nir_lower_tex(), which just does a textureSize()
call. But we were still setting up these state references for
useless uniform data.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
They can vary at call sites if the intrinsic is NOT a legacy SI intrinsic.
We need this to force readnone or inaccessiblememonly on some amdgcn
intrinsics.
This is only used with LLVM 4.0 and later. Intrinsics only used with
LLVM <= 3.9 don't need the LEGACY flag.
gallivm and ac code is in the same patch, because splitting would be
more complicated with all the LEGACY uses all over the place.
v2: don't change the prototype of lp_add_function_attr.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Even though compute shaders cannot access the framebuffer, there is a
synchronization issue when a compute dispatch accesses a texture that
was previously bound and drawn to as a framebuffer.
Section 9.3 (Feedback Loops Between Textures and the Framebuffer) of
the OpenGL 4.5 spec rather implicitly clarifies that undefined behavior
results if the texture is still attached to the currently bound
framebuffer. However, the feedback loop is broken when the application
changes the framebuffer binding before a compute dispatch, and the
state tracker needs to let the driver known about this.
Fixes GL45-CTS.compute_shader.pipeline-post-fs on SI family Radeons.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
exec_node::get_prev() does not guard against going past the beginning
of the list, so we need to add explicit checks here.
Found by ASAN in piglit arb_shader_storage_buffer_object-rendering.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
radeon_llvm_check and friends were never called in the no-opencl case,
which ended up with an empty llvm module list. As --enable-opencl always
requires --enable-llvm, we can use the latter as the guard.
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This bit is definitely not necessary because subroutine_list
can be used instead. This frees one more bit in the flags.q
struct which is nice because arb_bindless_texture will need
4 bits for the new layout qualifiers.
No piglit regressions found (including compiler tests) with
"-t subroutine".
v2: set the subroutine flag for validating illegal flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Do not hardcode the file in the python script, but pass it via the build
system(s). The latter is the only one that should know about the file
location/tree structure.
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The following changes are implemented:
Add src/vulkan/Android.mk to build libmesa_vulkan_util
Android.mk: add src/vulkan to SUBDIR to build new module
intel/vulkan: fix libmesa_vulkan_util,vk_enum_to_str.h dependencies
Add -o OUTPUT_PATH option in src/vulkan/util/gen_enum_to_str.py script
Use -o OUTPUT_PATH option in automake generation rules for vk_enum_to_str.{c,h}
Fixes: e9dcb17 "vulkan/util: Add generator for enum_to_str functions"
Fixes: 8e03250 "vulkan: Combine wsi and util makefiles"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov]
- Move parser within main()
- Use --outdir instead of -o
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Since both r600 and radeonsi use code from libamd_common they need to
static link it. At the same time, adding a common library to LIB_DEPS is
fragile [can lean to multiple symbol definitions] and non-obvious - I
had to do a double-take how things work atm.
So follow the libradeon.la approach and put common libraries in
TARGET_RADEON_COMMON
Fixes: 936f5407a7 ("gallium/radeon: Add libamd_common.a to TARGET_LIB_DEPS also for r600")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Otherwise we'll fail to find the header and `make distcheck` will bail.
Fixes: e9dcb17962 ("vulkan/util: Add generator for enum_to_str functions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
When generating the MOV INDIRECT instruction, the source type is ignored
and it is set to destination's type. However, this is going to change in a
later patch, so we need to explicitly set the proper source type.
brw_vec8_grf() creates an float type's fs_reg by default, when the
ICP handle is actually unsigned. This patch fixes these cases before
applying the aforementioned patch.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The lowered BSW/BXT indirect move instructions had incorrect
source types, which luckily wasn't causing incorrect assembly to be
generated due to the bug fixed in the next patch, but would have
confused the remaining back-end IR infrastructure due to the mismatch
between the IR source types and the emitted machine code.
v2:
- Improve commit log (Curro)
- Fix read_size (Curro)
- Fix DF uniform array detection in assign_constant_locations() when
it is acceded with 32-bit MOV_INDIRECTs in BSW/BXT.
v3:
- Move changes in assign_constant_locations() to other patch.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, if we had accesses with different sizes to the same uniform, we might not
push it aligned with the bigger one. This is a problem in BSW/BXT when we access
an array of DF uniform with both direct and indirect addressing because for the latter
we use 32-bit MOV INDIRECT instructions. However this problem can happen with other
generations and bitsizes.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
So we don't need to know about radv_sampler in ac_nir_to_llvm.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes build failure with --enable-opencl --enable-xvmc:
make[4]: Entering directory '/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/targets/xvmc'
CXXLD libXvMCgallium.la
../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `evergreen_create_compute_state':
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:254: undefined reference to `ac_elf_read'
../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `r600_shader_binary_read_config':
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start'
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start'
collect2: error: ld returned 1 exit status
Makefile:760: recipe for target 'libXvMCgallium.la' failed
Fixes: dc4c551a34 ("radeon/ac: switch from radeon_elf_read() to ac_elf_read()")
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
I have no idea why these were part of the compiler files. They're
miptree related code, and the compiler doesn't appear to use them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For radeonsi we could probably switch to
ac_shader_binary_read_config(). However the functions have
diverged so just share this helper for now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The read config functions are different for r600 and radeonsi so
we can't just share the one in amd common. So just share this
instead.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There was exactly one user of this, and I just removed it.
It also accessed an implicit global context, with no locking. This
meant that it was only safe if all callers of ralloc_autofree_context()
held the same lock...which is a pretty terrible thing for a utility
library to impose.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of using ralloc_autofree_context() to install an atexit()
handler to ralloc_free(glsl_type::mem_ctx), we can simply free them
from _mesa_glsl_release_types().
This is effectively the same, because _mesa_glsl_release_types() is
called from _mesa_destroy_shader_compiler(), which is called from Mesa's
one_time_fini() function, which Mesa installs as an atexit() handler.
The one advantage here is that it ensures the built-in functions are
destroyed before the types.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
this allows to pass the generated files directly to llc or bugpoint
v2: add atomic counter ID
v3: remove extra scope operator, constify
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
For blitting we need to use the depth or stencil format, never
the combined.
This fixes:
dEQP-VK.texture.shadow.2d.nearest.less_or_equal_d32_sfloat_s8_uint
and a few others.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These formats are used by some CTS tests, may as well fill them in.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is similar to what we do in the texture error codepath.
While we are at it, update the specification comment with
latest GL 4.5 spec.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This improves consistency with image variables and atomic
counters which are already rejected the same way.
Note that opaque variables can't be treated as l-values, which
means only the 'in' function parameter is allowed.
v2: rewrite commit message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
The main idea behind this is to free some bits in the flags.q
struct because currently all 64-bits are used and we can't
add more layout qualifiers without reaching a static assert.
In order to do that (mainly for ARB_bindless_texture), use an
enumeration for the AMD_conservative_depth layout qualifiers
because it's forbidden to declare more than one depth qualifier
for gl_FragDepth.
Note that ast_type_qualifier::merge_qualifier() will prevent
using duplicate layout qualifiers by returning a compile-time
error.
No piglit regressions found (including compiler tests) with
RX480 on RadeonSI.
v2: use a switch case
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com> (v1)
Preliminary work for ARB_bindless_texture which can interact
with ARB_shader_image_load_store.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This game uses GLSL 430 but the interpolation qualifiers in
some shaders don't match, which ends up in a link error. GLSL
440 spec removed this restriction, force it.
This fixes the following link error, as well as serious
rendering problems.
error: vertex shader output `out_TEXCOORD1' specifies noperspective
interpolation qualifier, but fragment shader input specifies no
interpolation qualifier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If i-th thread could not be created it means we have i threads,
not i+1, because we start from 0.
Fixes: 404d0d5 "gallium/u_queue: add an option to have multiple worker threads"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Commit 4aea8fe ("gallium/u_queue: fix random crashes when the app calls
exit()") added a atexit handler which calls
util_queue_killall_and_wait() for each queue to stop the threads.
However the app is also free to use atexit handlers to clean up things,
leading to util_queue_destroy() call which will also call
util_queue_killall_and_wait() for the same queue again, causing threads
being joined twice, and that is undefined. This happens with libglut,
for example. A simple fix is to just set num_threads to 0 as there are
no more valid threads after util_queue_killall_and_wait() returns.
Fixes: 4aea8fe "gallium/u_queue: fix random crashes when the app calls exit()"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Per spec, VK_QUERY_RESULT_64_BIT specifies the integer size and the
availability flag is an integer. We apparently handled this correctly
already for the copy to buffer case.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
PKT3_OCCLUSION_QUERY hangs when used in a nested IB. This only
calls it when in a primary command buffer and we change
GetQueryPoolResults to not need it. CmdCopyQueryPoolResults
still needs it so we break that behavior for secondary command buffers.
However, that would hang already and using an unitialized value is
better than a hang.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
This adds initial support for NV_dedicated_allocation, then
uses it for the wsi image/memory allocation paths internally
in the driver.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This bo->fd wasn't setting some stuff correctly that could
lead to crashes for anything using this path later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a complete rewrite of my previous rfc patches.
This adds the ability to present to a different GPU that rendering
using a driver side operation that can copy from the tiled to
linear shared image.
This does prime support completely in the swapchain present code,
and each queue has a precreated command buffer for each image
and for the each queue family. This means presenting should work
on graphics and compute queues and transfer in the future.
v1.1: initialise needs_linear_copy in swapchain.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, only the last error code was returned.
Using `set -e` makes the script quit on any unhandled error.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
This fixes 4a883966c1 where the
PIPE_CAP was removed.
Now USER_INDEX_BUFFERS are always enabled remove the check and only
check for cmst_active directly.
v2: Axel pointed out the code was still needed when cmst was inactive,
Rebase on master too
v3: Drop struct member user_ibufs also && fixup shortlog (Edward).
v4: Fix negation
v5: Use the right variable name csmt != cmst
Fixes: 4a883966c1 ("gallium: remove PIPE_CAP_USER_INDEX_BUFFERS")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99953
Reported-and-tested-by: Vinson Lee <vlee@freedesktop.org> (v1)
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Make use of common uploaders that landed recently to Mesa
v2: fixed formatting, broken due to thunderbird configuration
v3: per Axel comment: added a comment into NineDevice9_DrawPrimitiveUP
v4: per Axel comment: changed style of the comment
This reduces register pressure in both types of shaders, by reordering the
input loads from the var->data.driver_location order to whatever order
they appear first in the NIR shader. These instructions aren't
reorderable at our QIR scheduling level because the FS takes two in
lockstep to do an interpolation, and the VS takes multiple read
instructions in a row to get a whole vec4-level attribute read.
shader-db impact:
total instructions in shared programs: 76666 -> 76590 (-0.10%)
instructions in affected programs: 42945 -> 42869 (-0.18%)
total max temps in shared programs: 9395 -> 9208 (-1.99%)
max temps in affected programs: 2951 -> 2764 (-6.34%)
Some programs get their max temps hurt, depending on the order that the
load_input intrinsics appear, because we end up being unable to copy
propagate an older VPM read into its only use.
We need to be paying attention to optimization's impact on this -- even if
we reduce instruction count, increasing max temps in general is likely to
cause us to fail to register allocate on some shaders, which means that
those won't run at all.
CXX glsl/ast_to_hir.lo
glsl/ast_to_hir.cpp: In member function 'virtual ir_rvalue* ast_declarator_list::hir(exec_list*, _mesa_glsl_parse_state*)':
glsl/ast_to_hir.cpp:4846:42: warning: missing braces around initializer for 'unsigned int [16]' [-Wmissing-braces]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
If a VAO isn't bound and u_vbuf isn't enabled because of the Core profile,
we'll get user vertex buffers in drivers if we update vertex buffers
in glClear. So don't do that.
This fixes a regression since disabling u_vbuf for Core profiles.
Reviewed-by: Brian Paul <brianp@vmware.com>
The clip state is updated before VS, so it can be NULL for the first draw
call. Just remove the unnecessary dependency on st->vp.
Reviewed-by: Brian Paul <brianp@vmware.com>
Not needed. ddebug does the same thing. The limitation is that drivers
can only use pipe_resource::screen through pipe_resource_reference.
This unbreaks trace, because pipe_context uploaders aren't wrapped,
so trace doesn't understand buffers returned by them.
Reviewed-by: Brian Paul <brianp@vmware.com>
v3: split from the etnaviv patch; fix new_ib.buffer leak
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com> (VMware driver only)
the format of the rt can be different than the one of the texture, so must
propagate the format explicitly to the helper. Broken since
3f9c5d6244 (but unused by st/mesa).
This changes the way radv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need. There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.
RADV implements VK_AMD_draw_indirect_count, so add that to the list.
Port of 114c281e70
"and/entrypoints: Only generate entrypoints for supported features"
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Ideally would have caught these when adding the interface but this just
switches a few return types for the INTEL_performance_query backend
interface to bool instead of GLboolean.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Starting with the next commit, badly sorting this list will break the
eglGetProcAddress().
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will allow us to make sure the list is always sorted in the next
commit.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Let's make that comment true.
If will also be necessary in a couple commits (using bsearch).
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use the utility u_copy_nv12_from_yv12 to implement this similarly to
how it's been done in the VPAU state tracker. The old code mixed up
planes and fields and didn't correctly handle video surfaces in
interlaced format.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
mplayer likes putting YV12 data, and if there is a buffer format mismatch,
the vdpau state tracker would try to reallocate the video surface as an
YV12 surface. A virtual driver doesn't like reallocating and doesn't like YV12
surfaces, so if we can't support YV12, try an YV12 to NV12 conversion
instead.
Also advertize that we actually can do the getBits and putBits conversion.
v2: A previous version of this patch prioritized conversion before
reallocating. This has been changed to prioritize reallocating in this version.
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
v3: have util_clear_texture mirror the pipe function (Roland Scheidegger)
v2: rework util clear functions such that they operate on a resource
instead of a surface (Roland Scheidegger)
Creates a util_clear_texture function for implementing the GL_ARB_clear_texture
in softpipe and llvmpipe.
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This adds support to write to sample mask from the fragment shader.
We can optimise this later like radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This refactors out the sample index fixup between
txf and image load.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This follows the txf_ms code, I can't figure out why amdgpu-pro
doesn't do this in their shaders, they must know someone we don't.
This fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The code was interpolating at the offset from the sample,
not the offset from the center. Also fix for persample interpolation
modes we should force the pixel center to be at the sample.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix issue with index buffers that do not contain a 0 index. 0 index
can be a non-valid index if the (copied) vertex buffers are a subset of the
user's (which happens because we only copy the range between min & max).
Core will use an index passed in from the driver to replace invalid indices.
Only do this for calls that contain non-zero indices, to minimize performance
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
cost.
For now, the cache key is all of FETCH_COMPILE_STATE.
Use new/delete for swr_vertex_element_state, since we have to call the
constructors/destructors of the struct elements.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
If a thread doesn't load GLSL IR from cache but does load TGSI
from cache (that was created by another thread) than it will
crash due to expecting gl_program_parameter_list to have been
restored from the GLSL IR cache and not be null.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This just enables basic MSAA compression (no fast clears) for all
multisampled surfaces. This improves the framerate of the Sascha
"multisampling" demo by 76% on my Sky Lake laptop. Running Talos on
medium settings with 8x MSAA, this improves the framerate in the
benchmark by 80%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Not all clear colors are valid. In particular, on Broadwell and
earlier, only 0/1 colors are allowed in surface state. No CTS tests are
affected outright by this because, apparently, the CTS coverage for
different clear colors is pretty terrible. However, when multisample
compression is enabled, we do hit it with CTS tests and this commit
prevents regressions when enabling MCS on Broadwell and earlier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
v2: Instead of having the same block in isl_gen7,8,9.c add it
once into isl.c::isl_choose_image_alignment_el() instead.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The isl_surf_init call that each of these helpers make can, in theory,
fail. We should propagate that up to the caller rather than just
silently ignoring it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
OpenGL allows the TCS to be missing and supplies an implicit passthrough
shader, but OpenGL ES does not (see section 7.3 of the ES 3.2 spec,
cited above in the code).
One open question is how to handle this for ARB_ES3_2_compatibility.
This patch raises the link error for all ES shading language programs,
but it might make sense to base it on the API. The approach taken in
this patch is more restrictive, but should still allow any valid ES
programs to work in GL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
From IVB PRM, SURFACE_STATE::Height:
"For typed buffer and structured buffer surfaces, the number of
entries in the buffer ranges from 1 to 2^27 . For raw buffer
surfaces, the number of entries in the buffer is the number of bytes
which can range from 1 to 2^30."
The minimum value is 1, according to the spec. The spec quote
was already added into the code by 028f6d8317.
Fixes crashing tests under:
dEQP-VK.robustness.buffer_access.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From ARB_post_depth_coverage:
"This extension allows the fragment shader to control whether values in
gl_SampleMaskIn[] reflect the coverage after application of the early
depth and stencil tests. This feature can be enabled with the following
layout qualifier in the fragment shader:
layout(post_depth_coverage) in;
Use of this feature implicitly enables early fragment tests."
And a bit later it also adds:
"early_fragment_tests" requests that fragment tests be performed before
fragment shader execution, as described in section 15.2.4 "Early Fragment
Tests" of the OpenGL Specification. If neither this nor post_depth_coverage
are declared, per-fragment tests will be performed after fragment shader
execution."
Fixes:
GL45-CTS.post_depth_coverage_tests.PostDepthSampleMask
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It will return the current variable ('var') or the earlier declaration ('earlier') in
case of redeclaration of that variable.
In order to distinguish between both, 'is_redeclaration' boolean will indicate in which
case we are.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The get_variable_being_redeclared() function can free 'var' because
a re-declaration of an unsized array variable can establish the size, so
we set the array type to the 'earlier' declaration and free 'var' as it is
not needed anymore.
However, the same 'var' is referenced later in ast_declarator_list::hir().
This patch fixes it by picking the ir_variable_mode from the proper
ir_variable.
This error was detected by Address Sanitizer.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99677
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Before releasing a shared context, flush the context
with ST_FLUSH_WAIT to make sure all commands are executed.
This ensures that rendering to any shared resources is completed
before they will be referenced by another context.
Fixes an intermittent flickering with Photoshop. (VMware bug# 1779340)
Reviewed-by: Brian Paul <brianp@vmware.com>
When st_context_flush() is called with ST_FLUSH_WAIT,
the function will return after the fence is completed.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes up the clip distance passing between the geometry
shader and the copy shader. It packs the clip and cull distances
into one or two consecutive slots, and avoids wasting space and
make sure the gs output and copy shader input agree on where
things are stored.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This works out the geometry shader clip/cull inputs separately
to the outputs, and uses that information to read from the ES->GS
ring buffer. It stores the clip/cull distances packed into one
or two slots. It fixes the es output emission and gs input
reading to match.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As geom shaders can have different ones on entry and exit.
also move to uint8_t as these are never that big.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For prime support I need to access this, so move it in advance.
[airlied: fix int->uint32_t]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In some configurations the util directory is created when building out
of tree, but not others. This patch ensures that it's created.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-and-Tested-by: Mike Lothian <mike@fireburn.co.uk>
For gpu generations that use LLVM we create a timestamp string
containing both the LLVM and Mesa build times, otherwise we just
use the Mesa build time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will be used to share the sha1 computed by the tgsi load
function with the tgsi write function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We want to use this in the new tgsi shader cache so we move it here
and make it available externally.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If there was more than a single directory in the .cache/mesa dir
then it would only remove one (or none) of the directories.
Apparently Valgrind was also reporting:
Conditional jump or move depends on uninitialised value
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This adds a python generator to produce enum_to_str functions for
Vulkan from the vk.xml API description. It supports extensions as well
as core API features, and the generator works with both python2 and
python3.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
sizeof(struct pipe_draw_info) = 104 -> 88
Also, vertices_per_patch is switched to ubyte, because it can't be more
than 32.
Seemed-reasonable-to: Roland Scheidegger
This fixes:
vdpauinfo: ../lib/CodeGen/TargetPassConfig.cpp:579: virtual void
llvm::TargetPassConfig::addMachinePasses(): Assertion `TPI && IPI &&
"Pass ID not registered!"' failed.
v2: use list_head, switch the call order in destroy
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds a bare-bones backend for the INTEL_performance_query extension
that exposes pipeline statistics.
Although this could be considered redundant given that the same
statistics are already available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go to api to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.
This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is
modelled more on the ARB_query_buffer_object interface (considering the
similarity of the extensions) with the addition of vfuncs for
initializing and enumerating query and counter info.
Compared to the previous backend, some notable differences are:
- The backend is free to represent counters using whatever data
structures are optimal/convenient since queries and counters are
enumerated via an iterator api instead of declaring them using
structures directly shared with the frontend.
This is also done to help us support the full range of data and
semantic types available with INTEL_performance_query which is awkward
while using a structure shared with the AMD_performance_monitor
backend since neither extension's types are a subset of the other.
- The backend must support waiting for a query instead of the frontend
simply using glFinish().
- Objects go through 'Active' and 'Ready' states consistent with the
query object backend (hopefully making them more familiar). There is
no 'Ended' state (which used to show that a query has ended at least
once for a given object). There is a new 'Used' state, set when a
query is first begun which implies that we are expecting to get
results back for the object at some point. There's no equivalent to
the 'EverBound' state since the spec doesn't require there to be a
limbo state between generating IDs and associating them with an object
on query Begin.
The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa main (though a driver could
optionally choose to implement both extensions within a unified backend
if that were convenient for the sake of sharing state/code).
v2: (Samuel Pitoiset)
- init PerfQuery.NumQueries in frontend
- s/return_string/output_clipped_string/
- s/backed/backend/ typo
- remove redundant *bytesWritten = 0
v3:
- Add InitPerfQueryInfo for lazy probing of available queries
v4:
- Clean up some internal usage of GL typedefs (Ken)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To allow the backend interfaces for AMD_performance_monitor and
INTEL_performance_query to evolve independently based on the more
specific requirements of each extension this starts by separating
the frontends of these extensions.
Even though there wasn't much tying these frontends together, this
separation intentionally copies what few helpers/utilities that were
shared between the two extensions, avoiding any re-factoring specific to
INTEL_performance_query so that the evolution will be easier to follow
later.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like it was partly copied from the median filter fragment shader
and unnecessesarily saved a lot of temporary values.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The vdpau state tracker allows multiple threads access to the same gallium
context simultaneously. We can fix this either by locking the same mutex
each time the context is used or by using a different gallium context for
each mutex domain. Here we do the latter, although I'm not sure that's really
the best option.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When looking at the full range matrices, it becomes obvious that the difference
between the standard matrices and the full range matrices is that the full
range matrices are multiplied by 1.164. Together with offsetting the y value
with -16/255, this will scale and offset RGB with the desired quantities.
However, the standard SMPTE 240M matrix seems to differ a bit since the
U and V coefficients are only multiplied with 1.138 to get the full range
matrix. This would actually alter the color somewhat so I figure that's an
error. The full range matrix is consistent with Nvidia's VDPAU implementation.
We can also incorporate the ybias in the brightness simplifying the
calculation somewhat.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The brightness matrix doesn't actually match the procamp matrix and
what's calculated in vl_csc_get_matrix.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
It will cause multiple simultaneous maps of the same vertex buffer and
flushed-while-mapped warnings.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Windows doesn't have dlfcn.h. Protect the code in question
with #if ENABLE_SHADER_CACHE test. And fix indentation.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This extension adds new query types which can be used to detect overflow
of transform feedback buffers. The new query types are also accepted by
conditional rendering commands.
v3:
- s/gen7+/gen6+/ in the relnotes (Jordan Justen)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable the use of a transform feedback overflow query with
glBeginConditionalRender. The render commands will only execute if the
query is true (i.e. if there was an overflow).
Use ARB_conditional_render_inverted to change this behavior.
v4:
- reuse MI_MATH calcs from hsw_queryob (Kenneth)
- fallback to software conditional rendering when MI_MATH is not
available (Kenneth)
v5:
- check query->Target (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable getting the results of a transform feedback overflow query with a
buffer object.
v4:
- hsw_overflow_result_to_gpr0 a public function, so it can be used
by conditional render. (Kenneth)
- fix typo grp0/gpr0 (Kenneth)
- rename load_gen_written_data_to_regs to
load_overflow_data_to_cs_gprs (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When querying for transform feedback overflow on one or all of the
streams, store information about number of generated and written
primitives. Then check whether generated == written.
v2:
- use only SO_PRIM_STORAGE_NEEDED, do not fallback to
CL_INVOCATION_COUNT. (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add some basic types and storage for the queries of this extension.
v2:
- update date of extension (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
WARNING: sphinx.ext.pngmath has been deprecated. Please use
sphinx.ext.imgmath instead.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
src/gallium/docs/source/tgsi.rst:3488: WARNING: Title underline too short.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Without these, mathjax considers these as the continuation of the
previous line.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
src/gallium/docs/source/context.rst:95: ERROR: Unexpected indentation.
Sub lists need to be surrounded by a blank line.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The only feature over and above ES 3.0 is DrawTransformFeedback().
We already have to do the whole SOL_NUM_PRIMS_WRITTEN counter dance in
order to compute the SVBI value for ResumeTransformFeedback(), at which
point our existing GetTransformFeedbackVertexCount() implementation will
do the trick (though with a stall to CPU map the buffer).
Someday, we could probably implement DrawTransformFeedback() more
efficiently, using the "Load Internal Vertex Count" feature of
3DSTATE_SVB_INDEX and the 3DPRIMITIVE indirect vertex count bit.
Rumor has it this allows people to use WebGL 2.0 on Sandybridge.
Note that we don't need pipelined register writes like Gen7+ because
we use the 3DSTATE_SVB_INDEX command rather than MI_LOAD_REGISTER_MEM.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99842
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes Piglit's ARB_transform_feedback2/change-objects-while-paused
GLES 3.0 test. When resuming the transform feedback object, we need to
reset the SVBI counters so we continue writing at the correct point in
the buffer.
Instead of SO_WRITE_OFFSET counters (with a DWord offset), we have the
Streamed Vertex Buffer Index (SVBI) counters, which contain a count of
vertices emitted.
Unfortunately, there's no straightforward way to store the current SVBI
counter values to a buffer. They're not available in a register. You
can use a bit in the 3DSTATE_SVB_INDEX packet to copy them to another
internal counter which 3DPRIMITIVE can use...but there's no good way to
extract that either.
So, once again, we use SO_NUM_PRIMS_WRITTEN to calculate the vertex
numbers. Thankfully, we can reuse most of the existing Gen7+ code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I'm going to need this in a new Resume hook shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Sandybridge and earlier only have a single counter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This way on Sandybridge we'll only do 1 stream worth of math, since
we only have one SO_NUM_PRIMS_WRITTEN counter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I plan to use these functions on Sandybridge soon. I changed the prefix
on a couple of functions to "brw" instead of "gen7" as in theory they
should be usable all the way back to G45.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These driver hooks are not used when MI_MATH and MI_LOAD_REGISTER_REG
are supported, which Gen8+ can always do. So this code is dead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
R600_DEBUG=mono has had no effect since:
commit 1fabb29717
Author: Marek Olšák <marek.olsak@amd.com>
Date: Tue Feb 14 22:08:32 2017 +0100
radeonsi: have separate LS and ES main shader parts in the shader selector
Also, this assertion was failing:
si_state_shaders.c:1307: si_shader_select_with_key: Assertion
`!shader->is_optimized' failed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We were unconditionally storing these outputs, sometimes even one component
at a time, but apps never read them in TES.
Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can
easily disable them on demand.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This removes a lot of useless LDS stores.
A few games read TESSINNER/OUTER, but not any other outputs. Most games
don't read any outputs.
The only app doing LDS output reads is UE4 Lightsroom Interior.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows the helper to check for llc instead of having to do it
manually at all the call sites.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All this cache line address calculation stuff is tricky. Let's not
duplicate it more places than we have to.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This validation was added before the etnaviv drm driver landed in
the linux kernel. Due some pre-merge API changes we had to fix-up
this value but with a mainline kernel this is not a problem anymore.
Lets remove that validation which also gets rid of problem caught
by Coverity, reported to me by imirkin.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Used only within the generated source file.
Fixes: 12301c5418 ("radv: drop the RADV_CALL macro.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Otherwise symbols wont be annotated with C linkage and we'll fail at
link time.
Currently this is worked around by wrapping the header inclusion itself.
The latter in itself fragile and not recommended.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It's a problem waiting to happen. Individual headers should be annotated
if needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Namely, after the include directives. The headers are properly annotated
so keeping things as-is is only asking for trouble.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
compiler.h defines a few mesa specific macros which are not C specific.
This allows us to avoid buggy extern C { #include $system_header }
constructs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
i.e. add extern C {} in program/symbol_table.h
It will allow us remove a workaround we have elsewhere in the code.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.
Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also handle the GL_ARB_indirect_parameters case where the count itself
is in a buffer.
Use transfers rather than mapping the buffers directly. This anticipates
the possibility that the buffers are sparse (once ARB_sparse_buffer is
implemented), in which case they cannot be mapped directly.
Fixes GL45-CTS.gtf43.GL3Tests.multi_draw_indirect.multi_draw_indirect_type
on <= CIK.
v2:
- unmap the indirect buffer correctly
- handle the corner case where we have indirect draws, but all of them
have count 0.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Allocating huge buffers in VRAM is not a problem, but when those buffers
start being migrated, the kernel runs into errors because it cannot split
those buffer up for moving through GTT.
This should fix intermittent failures of
GL45-CTS.texture_buffer.texture_buffer_max_size
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The preamble flushes now and the rest is the responsibility of the app.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This splits out the cache flush bit setting code
dependent on the src/dest access flags.
It then calls it from the subpass barrier code.
It also marks a TODO to remove the aggressive CS/PS
flushes at some point.
This fixes a bunch of the
dEQP-VK.renderpass.attachment_allocation.input_output.*
tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This assert might have made sense before but we no longer use
gl_linked_shader here. Unless the caller has really done something
crazy this assert is fairly useless.
We also do some small tidy ups in this change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The version tag used to nominate has bitten even experienced mesa
developers. Not to mention that it deviates from the one used in the
kernel leading to further confusion.
Simplify things and omit it all together.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reword the section to focus on what is allowed, using a more brief, yet
descriptive wording.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1,
now that LLVM bug 26775 and its corollary, 25503, are fixed.
Amendment: remove extraneous spaces in macro def & invocations.
We would prefer a runtime check, e.g. via an LLVMQueryString
(analogous to glGetString, eglQueryString) or LLVMGetVersion API,
but no such API exists at this time.
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
[Emil Velikov: remove LLVM_VERSION macro]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
If llvm::sys::getHostCPUName() returns "generic", override
it with "pwr8" (on PPC64LE).
This is a work-around for a bug in LLVM: a table entry for "POWER8NVL"
is missing, resulting in (big-endian) "generic" being returned on
little-endian Power8NVL systems. The result is that code that
attempts to load the least significant 32 bits of a 64-bit quantity in
memory loads the wrong half.
This omission should be fixed in the next version of LLVM (4.0),
but this work-around should be left in place in case some
future version of POWER<n> also ends up unrepresented in LLVM's table.
This workaround fixes failures in the Piglit arb_gpu_shader_fp64 conversion
tests on POWER8NVL processors.
(V4: add similar comment in the code.)
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Cc: 12.0 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Define ElfW() and NT_GNU_BUILD_ID if needed as these defines are not
present on at least OpenBSD and FreeBSD. Fixes the build on OpenBSD.
Fixes: d4fa083e11 ("util: Add utility build-id code.")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
generated-sources-dir-for macro replaces intermediates-dir-for
and LOCAL_MODULE_CLASS is defined as required by new macro,
in order to avoid the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_debug.c:29:10: fatal error: 'sid_tables.h' file not found
^
1 error generated.
Fixes: 730574c58e ("android: ac/debug: move sid_tables.h generation and
IB decode to amd/common")
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This adds support to radv_GetPhysicalDeviceXlibPresentationSupportKHR
and radv_GetPhysicalDeviceXcbPresentationSupportKHR to check if the
local device file descriptor is compatible with the descriptor
retrieved from the X server via DRI3.
This will stop radv binding to an X server until we have prime
support in place. Hopefully apps use this API before trying
to render things.
v2: drop unneeded function, don't leak memory. (jekstrand)
v3: also check in surface_get_support callback.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just keeps popping up minor problems and regressions we should
revisit in a more sustainable manner later.
This also reverts:
Revert "radv: query cmds should mark a cmd buffer as having draws."
Revert "radv: also fixup event emission to not get culled."
This reverts commit d1640e7932.
This reverts commit 8b47b97215.
This reverts commit b4b19afebe.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
start can only be non-zero with MultiDrawElements, which is unlikely
to occur with UNSIGNED_BYTE indices.
v2: Also fix the util_shorten_ubyte_elts_to_userptr call.
Tested with the new piglit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This iterates the fast clear flush across the layers in the
specified range.
It also moves the compute resolve flush into the function
and builds the range in there.
This fixes:
dEQP-VK.geometry.layered.* regressions since fast clears.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes:
dEQP-VK.renderpass.formats.a2b10g10r10_unorm_pack32*
regressions.
Fixes:
f22836dbdd radv: Add CPU color packing for VK_FORMAT_A2B10G10R10_UNORM_PACK32.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Even though the preferred stance is not to fix incorrect applications
via the driver, this prevents some nasty GPU hangs.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This lowers lgkm wait cycles by 30% on VI and normal conditions.
The might be a measurable improvement when CE is disabled (radeon)
or under L2 thrashing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
So that we can disable u_vbuf for GL core profiles.
This is a v2 of the previous VI-only patch.
It requires SH_MEM_CONFIG.ALIGNMENT_MODE = UNALIGNED on CIK-VI.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If an unsized declared array is not the last in an SSBO
and an implicit size can not be defined on linking time,
the linker should raise an error instead of reaching
an assertion on GL.
This reverts part of commit 3da08e1664
getting back to the behavior of commit 5b2675093e
The original patch was correct for GLES that should produce
a compile-time error but the linker error is still necessary
in desktop GL.
Fixes the following piglit tests:
tests/spec/arb_shader_storage_buffer_object/non_integral_size_array_member.shader_test
tests/spec/arb_shader_storage_buffer_object/unsized_array_member.shader_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
This one only keeps allocated memory in the list, and list nodes
in the descriptor sets. Thsi doesn't need messing around with
max_sets, and we get automatic merging of free regions.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We only use the freed ones after all free space has been used. If
the app only allocates small descriptor sets, we might go over
max_sets before the memory is full.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79
The optimization in unpack_64 is clearly subsumed with the opt_algebraic
optimizations in the previous commit. The pack optimization may not be
quite handled by opt_algebraic but opt_algebraic should get the really
bad cases. Also, it's been broken since it was merged and we've never
noticed so it must not be doing anything.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR is a typeless IR and the two opcodes, when considered bitwise, do
exactly the same thing. There's no reason to have two versions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to avoid costly fallback recompiles when cache items are
created with an old version of Mesa or for a different gpu on the
same system we want to create directories that look like this:
./{TIMESTAMP}_{LLVM_TIMESTAMP}/{GPU_ID}
Note: The disk cache util will take a single timestamp string, it is
up to the backend to concatenate the llvm string with the mesa string
if applicable.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Here we skip the recreation of uniform storage if we are relinking
after a cache miss. This is improtant because uniform values may
have already been set by the application and we don't want to reset
them.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow us to skip certain things when falling back to
a full recompile on a cache miss such as avoiding reinitialising
uniforms.
In this change we use it to avoid reading the program metadata
from the cache and skipping linking during a fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
These may be lowered constant arrays or uniform values that we set before linking
so we need to cache the actual uniform values.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For now this disables the shader cache when transform feedback is
enabled via the GL API as we don't currently allow for it when
generating the sha for the shader.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: don't store pointers use an enum instead to flag what should be
restored. Also do the work in a helper that we will later use for
the subroutine remap table.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The three additional tables are AttributeBindings, FragDataBindings,
and FragDataIndexBindings.
The first table (AttributeBindings) was identified as missing by
trying to test the shader cache with a program that called
glGetAttribLocation.
Many thanks to Tapani Pälli <tapani.palli@intel.com>, as it was review
of related work that he had done previously that pointed me to the
necessity to also save and restore FragDataBindings and
FragDataIndexBindings.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The scenario is:
glShaderSource
glCompileShader <-- deferred due to cache hit of shader
glShaderSource <-- with new source code
glAttachShader
glLinkProgram <-- no cache hit for program
At this point we need to compile the original source when we
fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The hash key for glsl metadata is a hash of the hashes of each GLSL
source string.
This commit uses the put_key/get_key support in the cache put the SHA-1
hash of the source string for each successfully compiled shader into the
cache. This allows for early, optimistic returns from glCompileShader
(if the identical source string had been successfully compiled in the past),
in the hope that the final, linked shader will be found in the cache.
This is based on the intial patch by Carl.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This uses disk_cache.c to write out a serialization of various
state that's required in order to successfully load and use a
binary written out by a drivers backend, this state is referred to as
"metadata" throughout the implementation.
This initial version is intended to work with all stages beside
compute.
This patch is based on the initial work done by Carl.
V2: extend the file's doxygen comment to cover some of the
design decisions.
V3:
- skip cache for fixed function shaders
- add int64 support
- fix glsl IR program parameter caching/restore and cache the
parameter values which are used by gallium backends.
- use new link status enum
V4:
- add compute program support
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Remove local definition of RADEON_INFO_TILE_CONFIG and use the correct
macro provided by libdrm_radeon RADEON_INFO_TILING_CONFIG.
Latter was present as of libdrm 2.4.22, sirca 2010.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Such that we can remove all the local fall-back definitions and use the
official UABI ones.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The currently used range HEAD..origin/master is far too broad. It looks
for nominations within the already_landed list (branchpoint..HEAD).
Similarly we look for already_landed whiting the [possible] nominations
Rand branchpoint..origin/master.
Improve things by limiting the look ups to the branch point.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently we loop (git log --grep) to check if the fix has landed. We
can simplify and make things faster by storing the already_picked list
and grep ping through it.
Slim down the message while we're here.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Mention the generic channels (PPA, Corp, other) as well as give a couple
of examples. Even if the latter became out of date the former should a
be good guide.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Print only the information needed. Namely:
*info: the DRI module picked and the vendor/renderer strings
*gears: everything but the "...configuration file..." line(s)
v2: (Eric) Use "2>&1 |" over "|&", properly escape &.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We had multiple cases in the past where files used only by the
Scons/MinGW/Windows build were missing.
Avoid such instances and add a step to catch them early.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These are enough for the spir-v generator to handle UConvert
and SConvert operations, and fix the 4 tests in CTS.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the support at the spirv->nir level for the Int64
cap.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is used in DOOM, so provide the fast clear path for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes regressions from c505d6d852.
Switching from using gl_shader_program to gl_program for the pipline
objects CurrentProgram array meant we were freeing gl_shader_programs
immediately after glDeleteProgram was called, but the spec states
the program should only get deleted once it is no longer in use.
To work around this we add a new ReferencedPrograms array to track
gl_shader_programs in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the buffer has been freed by the kernel under memory pressure, it is
invalid to try and access the backing storage for that buffer in the
future - the backing storage is not recreated automatically. As such we
need to mark the GL object as being freed for unretained buffers and so
recreate the object on next use.
Futhermore from the GL_APPLE_object_purgeable:
"In contrast, by calling ObjectUnpurgeableAPPLE with an <option> of
UNDEFINED_APPLE, the application is indicating that it intends to
recreate the contents of the storage from scratch. Further, the
application is is stating that it would like the GL to do only the
minimal amount of work set PURGEABLE_APPLE to FALSE. If
ObjectUnpurgeableAPPLE is called with the <option> set to
UNDEFINED_APPLE, then ObjectUnpurgeableAPPLE will return the value
UNDEFINED_APPLE."
we must always report GL_UNDEFINED_APPLE when called with
glObjectUnpurgeable(GL_UNDEFINED_APPLE).
Testcase: piglit/object_purgeable-api-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This partially reverts commit 97217a40f9.
It leaves ES 2.0 support in place per Ian's suggestion, because ES 2.0
is designed to work on hardware like i915.
Chrome only uses the GPU if you have GL >= 2.0, and using i915 (and
prog_execute) actually hurt performance compared with the software
paths.
The --build-id=... ld flag has been present since binutils-2.18,
released 28 Aug 2007.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Provides the ability to read the .note.gnu.build-id section of ELF
binaries, which is inserted by the --build-id=... flag to ld.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
glGetTextureSubImage() and glGetCompressedTextureSubImage() are currently
returning INVALID_OPERATION error when the passed texture argument does not
correspond to an existing texture object. However, the error should be
INVALID_VALUE instead. From OpenGL 4.5 spec PDF, section '8.11. Texture
Queries', page 236:
"An INVALID_VALUE error is generated if texture is not the name of
an existing texture object."
Same wording applies to the compressed version.
The INVALID_OPERATION error is coming from the call to
_mesa_lookup_texture_err(). This patch uses _mesa_lookup_texture() instead
and emits the correct error in the caller.
Fixes: GL45-CTS.get_texture_sub_image.errors_test
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mesa currently doesn't allow to create 3.1+ compatibility profiles
mainly because various features are unimplemented and bugs can
happen.
However, some buggy apps request a compat profile without using
any old features unimplemented in mesa, and they fail to start.
This option should help some games to run but it's not enough
for all (eg. Dying Light).
v2: - s/force_compat_profile/allow_higher_compat_version
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
vkQueuePresentKHR() takes VkPresentInfoKHR pointer and includes a
pResults fields which must holds the results of all the images
requested to be presented. Currently we're not filling this field.
Also as a side effect we probably want to go through all the images
rather than stopping on the first error.
This commit also makes the QueuePresentKHR() implementation return the
first error encountered.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Commit 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
fixed the sorting of the array initializers in g_glxglvnddispatchfuncs.c
because FindGLXFunction's binary search needs these to be sorted
alphabetically.
That commit also mostly fixed the sorting of the DI_foo defines in
g_glxglvnddispatchindices.h, which is what actually matters as the
arrays are initialized using "[DI_foo] = glXfoo," but a small error
crept in which at least causes glXGetVisualFromFBConfigSGIX to not
resolve, breaking games such as "The Binding of Isaac: Rebirth" and
"Crypt of the NecroDancer" from Steam not working and possible causes
other problems too.
This commit fixes the last of the sorting errors, fixing these mentioned
games not working.
Fixes: 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This is possibly a bad idea, I might have to consider a better one.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a regression with the remove non-draw cmd buffers in
queries.
Fixes: 8b47b97215 radv: detect command buffers that do no work and drop them (v2)
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GS input arrays, we may turn a packed_type of ivec4 into an
array of ivec4s. We still want flat qualification.
Found by inspection. Not known to help anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Unfortunately, this doesn't substantially improve the performance of any
known apps. With Dota 2 on my Sky Lake gt4, it seems help by somewhere
between 0% and 1% but there's enough noise that it's hard to get a clear
picture.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
It's a bit hard to measure because it almost gets lost in the noise,
but this seemed to help Dota 2 by a percent or two on my Broadwell
GT3e desktop.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Vulkan doesn't have a stencilWriteEnable bit like it does for depth.
Instead, you have a stencil mask. Since the stencil mask is handled as
dynamic state, we have to handle it later during command buffer
construction. This, combined with a later commit, seems to help Dota2
on my Broadwell GT3e desktop by a couple percent because it allows the
hardware to move the depth and stencil writes to early in more cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This changes the way anv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need. There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Even though we supported both coherent and non-coherent memory types, we
effectively forced apps to use the coherent types by accident. Found by
inspection, only compile tested.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
I think this only affects radeonsi - VI, because all other drivers using
u_vbuf probably don't support GL_DOUBLE, so they won't be affected by this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Notes:
- make sure the default size is large enough to handle all state trackers
- pipe wrappers don't receive transfer calls from stream_uploader, because
pipe_context::stream_uploader points directly to the underlying driver's
stream_uploader (to keep it simple for now)
v2: add error handling to nv50, nvc0, noop
v3: set const_uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Tested-by: Charmaine Lee <charmainel@vmware.com>
For lower memory usage and more efficient updates of the buffer residency
list. (e.g. if drivers keep seeing the same buffer for many consecutive
"add" calls, the calls can be turned into no-ops trivially)
v2: add const_uploader, add documentation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
This ports the remains of the workarounds from radeonsi for
the non-TESS cases. It should provide equivalent workarounds
for hawaii and bonarie.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This allows shaders to write to storage images declared with unknown
format if they are decorated with NonReadable ("writeonly" in GLSL).
Previously an image view would always use a lowered format for its
surface state, however when a shader declares a write-only image, we
should use the real format. Since we don't know at view creation time
whether it will be used with only write-only images in shaders, create
two surface states using both the original format and the lowered
format. When emitting the binding table, choose between the states
based on whether the image is declared write-only in the shader.
Tested on both Sascha Willems' computeshader sample (with the original
shaders and ones modified to declare images writeonly and omit their
format qualifiers) and on our own shaders for which we need support
for this.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Allow that capability if the driver indicates that it is supported, and
flag whether images are read-only/write-only in the nir_variable (based
on the NonReadable and NonWritable decorations), which drivers may need
to implement this.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As soon as we support shaderStorageImageWriteWithoutFormat we can see
write-only images (sampled == 2) that don't have a format specified.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes our driver robust to changes in spirv_to_nir which would set
this flag on the variable. Right now, our driver relies on spirv_to_nir
*not* setting var->data.image.write_only for correctness. Any patch
which implements the shaderStorageImageWriteWithoutFormat will need to
effectively revert this commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This adds two columns to the format table as well as two helpers for
determining whether or not a given format is supported for typed reads
and writes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Passes the newly added piglit test for this extension on i965.
V2: Fix comments by Ilia.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This uses the common fs interp code to use the new
llvm intrinsics so llvm can drop the old ones.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This puts the common gfx state for the device into an
indirect buffer, and just calls out to it, on CIK and above.
This is taken from what radeonsi does.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for the following patch to use
a common gfx init indirect buffer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a buffer is just full of flushes we flush things on command
buffer submission, so don't bother submitting these.
This will reduce some CPU overhead on dota2, which submits a fair
few command streams that don't end up drawing anything.
v2: reorganise loop to count first then malloc,
rename some vars (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
BLORP is now smart enough to handle any swizzle (even those that contain
ZERO or ONE) in a reasonable manner. Just let BLORP handle it. This
fixes the following Vulkan CTS tests on Haswell:
- dEQP-VK.api.image_clearing.clear_color_image.1d_b4g4r4a4_unorm_pack16
- dEQP-VK.api.image_clearing.clear_color_image.2d_b4g4r4a4_unorm_pack16
- dEQP-VK.api.image_clearing.clear_color_image.3d_b4g4r4a4_unorm_pack16
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
It's trivial to swizzle clear colors on the CPU, easily deals with the
hardware restrictions for render target swizzles, and makes swizzled
clears work on all hardware as opposed to just HSW+.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Released back in 2007 so it should not be an issue for anyone building
from git.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
DRI2DriverPrimeShift was added in dri2proto-2.8, which we now require
as of the previous commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
DRI2DriverPrimeShift was added in dri2proto-2.8, which we now require as
of the previous commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Noticed while skimming through, although admittedly there's many other
dependencies that are not tracked by the scons build.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
dri2proto 2.8 was released 4+ years ago, so it must be of no surprise
for anyone building mesa from git.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The two symbols referenced were introduced with v2.2 and 2.3 of
the dri2proto package and we require dri2proto >= 2.6.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replace with AS_HELP_STRING and AC_MSG_ERROR respectively, as spotted by
autoupdate.
Note that the suggested AC_CANONICAL_SYSTEM > AC_CANONICAL_TARGET change
is not addressed here since that requires very extensive testing.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
I forgot to error check stat() and also I wasn't using the subdir in
is_two_character_sub_directory().
Fixes: d7b3707c61 "util/disk_cache: use stat() to check if entry is a directory"
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Drop all -m*, -W*, -O*, -g* and -f* flags, with the exception of
-fno-rtti, which must be used if it's part of the llvm-config --cxxflags
output. We don't want LLVM to dictate the flags we use, and it can even
cause build failures, e.g. if LLVM and Mesa are built with different
compilers.
While we're at it, eat any whitespace preceding dropped flags as well.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
TCS and TES inputs without an array size are implicitly sized to
gl_MaxPatchVertices. But TCS outputs are apparently not:
"If no size is specified, it will be taken from the output patch size
(gl_VerticesOut) declared in the shader."
Fixes dEQP-GLES31.functional.program_interface_query.program_output.
array_size.separable_tess_ctrl.var.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We were already unwrapping types when the producer was a non-array
stage and the consumer was an arrayed-stage...but we ought to unwrap
both ends for TCS -> TES matching too.
This will allow us to drop the "resize to gl_MaxPatchVertices" check
shortly, which breaks some things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
OpenGL ES actually has spec text to prohibit this. It's just OpenGL
that's confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
ES 3.x requires both TCS and TES to be present. We already checked
the TCS && !TES case above, so we just have to check !TCS && TES here.
Note that this is allowed in OpenGL, just not ES.
This fixes a subcase of:
dEQP-GLES31.functional.debug.negative_coverage.*.tessellation.single_tessellation_stage
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Now that we have OES_tessellation_shader, the same situation can occur
in ES too, not just GL core profile.
Having a TCS but no TES may confuse drivers - i965 crashes, for example.
This prevents regressions in
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
with some SSO pipeline validation changes I'm making.
v2: Add an ES spec citation (suggested by Alejandro)
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The blob uses these, and it fixes a bunch of dEQP stencil sampling tests
involving border colors. Probably the Z-based samplers work somehow
differently wrt border colors when using the stencil swizzle.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The struct have different size, so the arrays have different stride.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
- Use the same instruction area on GC3000 as the Vivante driver.
This allows the same number of instructions on GC3000 as GC2000
instead of half.
- Makes sure that the "PE to FE" stall before updating the shader code
or constants is hit (which is conditional on vs_offset > 0x4000). This
is necessary on GC3000 too, it increases stability.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update from etnaviv repository rnndb. This adds some newly
discovered state for GC3000 (and some GC2000) features.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Just noticed we do a fair bit of unneeded searching here.
Since we know that the buffers in a CS are unique already,
the first time we get any buffers, we can just memcpy those into
place, and when we are searching for subsequent CSes, we only
have to search up until where the previous unique buffers were.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On SM35 there does not appear to be a way to emit a ATOM.EXCH with a
null destination. This should be functionally equivalent to a plain
store however, so just do that.
Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The former logic just plain didn't work at all. We need to write the
subsequent dword to the next buffer location.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We have logic to short-circuit such retrievals to zero. However "zero"
was an immediate, and some logic expected to get registers (to later be
propagated). Fix this by using loadImm.
Fixes GL45-CTS.gpu_shader5.images_array_indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
"If an array is declared as the last member of a shader storage block
and the size is not specified at compile-time, it is sized at run-time.
In all other cases, arrays are sized only at compile-time."
In desktop GLSL it is allowed to have unsized-arrays that are
not last, as long as we can determine that they are implicitly
sized, which is detected at link-time.
With this patch Mesa reports a compilation error as glslang does with
the following shader:
buffer SSBO { vec4 data[]; vec4 moreData;};
void main (void)
{
}
Fixes:
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gallium's blitter expects that it can set the sample mask even when the
rasterizer doesn't have the flag on.
Between this and the previous test, 10 new ext_framebuffer_multisample
tests start passing.
gallium's quad-based blitter for copying MSAA depth textures expects to be
able to do 4 passes updating a sample at a time using glSampleMask, and
there's no color buffer bound when it's doing that.
In the hardware we only get to declare 8 vertex elements (GLES2's
minimum), so we should be exposing that number here. Fixes an assertion
failure in piglit texrect-many, at the expense of various GL 2.0-ish
minmax tests now complaining that our count is too low.
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp rare is enough that it's not worth
making the kernel let us.
Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a tempoary.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Currently we have extra (somewhat questionable) modularity, such that
one could build some parts with LLVM while others w/o.
That is extremely fragile, error prone and requires quite noticable
amount of code throughout.
Thus lets deprecate the gallium toggle in faviour of the generic one.
The former will throw a warning when set, and it will be overwritten by
the latter. This will allow gradual transition w/o breaking people's
scripts.
v2: Rebase, document in release notes.
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de> (v1)
The extra function brings no added benefit as of earlier commit which
made llvm_require_version (as called by radeon_llvm_check) require LLVM
(--enable-gallium-llvm).
Fixes: 5f966a96af7 "configure.ac: Mandate --enable-gallium-llvm when
checking LLVM version"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Earlier refactoring commits changed from one, dare I say it, broken
behaviour to another. Namely:
Before, as you explicitly --enable-gallium-llvm your selection was
ignored when llvm-config was not present/detected.
Today, the "auto" heuristics enables gallium llvm regardless if you have
llvm/llvm-config available or not.
Rework the auto-detection to attribute for llvm's presence.
v2: Set enable_gallium_llvm=no when LLVM is not found.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Already implicitly handled throughout, but keep it clear and disable
gallium-llvm. This change should be a no-op.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Earlier refactoring commits started setting the above regardless if LLVM
is used or not. Move them to the respective section to restore the
original functionality.
Since we require the preprocessor flags (includes in particular) for the
header version parsing keep those as-is. They are not used outside of
configure.ac thus should not cause any side-effects.
As-is adding the C/CXXFLAGS can lead to build issues on when
cross-compiling.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Although it works, it's not the correct thing to do.
v2: Rebase
v3: Rebase
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de> (v1)
LLVM_BINDIR is completely unused while others such as LLVM_LIBDIR are
used only internally. In the latter case there's no need to AC_SUBST it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Set FOUND_LLVM only when LLVM is present (checking for exact version/etc
is deferred) and use enable-gallium-llvm to indicate the global LLVM
status.
Renaming the latter is not appropriate for stable patches, so we'll
address it with a later commit.
Loosely based on work by Tobias.
v2: Check FOUND_LLVM if enable_gallium_llvm is set.
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
... to where it's applicable.
Since we effectively made --enable-gallium-llvm mean --enable-llvm with
earlier commits, we need to move the requirement to guard the compnents
added for the LLVM draw.
Otherwise we'll error (as below) when building RADV w/o gallium drivers.
configure: error: --enable-gallium-llvm is required when building radv
v2: Don't remove but move the dependency (Tobias).
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
With this change we effectively require --enable-gallium-llvm when
building RADV. This should be perfectly safe since the gallium radeonsi
driver already explicitly requires it.
The "gallium" part in --enable-gallium-llvm is about to be removed soon
(not in stable), but until then make sure that things can build.
To reflect the requirement (as opposed to check previously) we rename
llvm_check_version_for to llvm_require_version
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Drop the gallium prefix since we're about it use it throughout the
configure.
Note we do want to check for enable_gallium_llvm check since (as
explicitly requested) the toggle should mean --enable-llvm. Latter of
which to be resolved with later patches.
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
This is actually not needed because the version is checked later.
Around line 2380
if test "x$enable_gallium_llvm" == "xyes"; then
llvm_check_version_for $LLVM_REQUIRED_GALLIUM "gallium"
llvm_add_default_components "gallium"
fi
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
v2: [Emil Velikov: rebase/respin series order]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise we would fail with "implicit declaration of function" geteuid
and getenv respectively.
To trigger (re)move the libdrm.pc file and use the following:
$ ./autogen.sh --disable-egl --disable-gbm --disable-dri \
--with-dri-drivers=swrast --with-gallium-drivers=swrast
$ make
Cc: Vinson Lee <vlee@freedesktop.org>
Fixes: 3f462050c ("loader: Add an environment variable to override driver name choice.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99701
v2: [Emil: handle stdlib.h add commit message]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Not much point in the const qualifier since we provide a copy to the
user. Resolves the following -Wignored-qualifiers warning.
src/intel/blorp/blorp_blit.c:1857:8: warning: 'const' type qualifier on
return type has no effect [-Wignored-qualifiers]
v2: keep const qualifier of local variable.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want cached GTT for all non-persistent read mappings.
Set level = 0 on purpose.
Use dma_copy, because resource_copy_region causes a failure in the PBO
read of piglit/getteximage-luminance.
If Rocket League used the READ flag, it should get cached GTT.
v2: mask out UNSYNCHRONIZED
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
UMR is our new debugging tool. It must have +s set for Mesa to use it
without root privileges:
sudo chmod +s .../umr
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The instruction has an associated label when Instruction.Label == 1,
as can be seen in ureg_emit_label() or tgsi_build_full_instruction().
This fixes dump generating extra :0 labels on conditionals, and virgl
parsing more than the expected tokens and eventually reaching "Illegal
command buffer" (when parsing more than a safety margin of 10 we
currently have).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's legal to submit just semaphores with no command streams,
this patch fixes this case by emitting the empty cs, it also
handles the fence emission for this case better.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Many many many compute shaders only define a 1- or 2-dimensional block,
but then continue to use system values that take the full 3d into
account (like gl_LocalInvocationIndex, etc). So for the special case
that a dimension is exactly 1, we know that the thread id along that
axis will always be 0, so return it as such and allow constant folding
to fix things up.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.
Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.
Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
There's all kinds of logic that doesn't like there being holes in defs
or srcs lists. Avoid them. This also fixes the sched logic for maxwell.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have
to do a bit more work to get the job done.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A few thoughts:
- Some of that LegalizeSSA logic should really live much earlier and be
subject to the likes of DCE and other useful passes
- Some of the "lowering" done in from_tgsi should be done later so that
proper optimization might be done.
However this all works and the above can be improved upon later.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hardware does not support 64-bit integers MAD and MUL operations, so we need
to transform them in 32-bit operations.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
We were never emitting a .X flag for consuming condition code on SET,
and weren't emitting a signed type for SLCT comparison. Discovered while
working on int64 logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
These operations allow you to compute min/max on arbitrary-width
integers, 32 bits at a time.
Note that the low/med ops implicitly set the condition code, and the
med/high ops implicitly consume it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nouveau does not currently have logic to implement this as a library
function. Even though such a library could be written, there's no big
advantage to do it that way for now given that int64 is a very uncommon
use-case. Allow a driver to expose INT64 without supporting division and
modulo operations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously if you used MESA_GL_VERSION_OVERRIDE=3.3COMPAT, Mesa exposed
an OpenGL 3.3 compatibility profile context (with various unimplemented
features and bugs), but still refused to compile shaders with
#version 330 compatibility
This patch simply adds a small bit of plumbing to let that through.
Of course the same caveats apply: compatibility profile is still not
supported (and will not be supported), so there are no guarantees that
anything will work.
Tested-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of them already redirected to https anyway, so we might as well
avoid the redirection and the security implications by linking directly
to the right protocol.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be used to remove cache items created with old versions
of Mesa or other invalid cache items from the cache.
V2: rename stub function (cache_* funtions were renamed disk_cache_*)
in master.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For the on-disk shader cache we want to be able to differentiate
between a program that was linked and one that was loaded from cache.
V2:
- don't return the new enum directly to the application when queried,
instead return GL_TRUE or GL_FALSE as required. Fixes google-chrome
corruptions when using cache.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For allowing fast color clears in the main render targets of dota2.
[airlied: fix clear_vals[1] as suggested by Andres.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Removed temporary scafolding in PA, widended the PA_STATE interface
for SIMD16, and implemented PA_STATE_CUT and PA_TESS for SIMD16.
PA_STATE_CUT and PA_TESS now work in SIMD16.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Make all SimdVectors in LLVM represented as simdscalar[4] rather
than a struct.
Fixes issues with promotion of values from i32 to i64 to match
register width.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
SIMD16 Primitive Assembly (PA) only supports TriList and RectList.
CUT_AWARE_PA, TESS, GS, and SO disabled in the SIMD16 front end.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
File names were wrong, file formats were wrong, bunzip command was
wrong...
I also removed all but the simplest example; people who use pipes already
know how to untar, so let's simplify and remove potential confusion for
non-tech-savvy users.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Release candidates haven't been in a 'beta' subdir in a long time, so let's
replace the dead link with an explanation instead.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to initialize dcc like we do in the subpass path.
v2: fix initial/final layouts
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not wired up (not referenced in any SUBDIR), leading to `make distcheck'
failure.
Fixes: d77fa310ed "ilo: EOL drop unmaintained gallium drv from buildsys"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Properly annotate <li> and keep the note analogous to all the previous
ones - OpenVG, st/egl, etc.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
There were two "libglvnd configuration" section in the squashed commit
that added libglvnd support, while only one in the original libglvnd
branch. A following commit moves one of them downwards. Now remove the
upper "older" one and move GL_LIB name decision downwards after the new
libglvnd configuration section.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
The instance offers 2 cores, so use them to speed things up.
v2: Set MAKEFLAGS instead [Eric]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Note: we need the explicit --enable-freedreno for libdrm since the
latter is 'smart' and disables it if building on !arm platforms.
The radeonsi and swr are explicitly left out since they require
'too-recent' LLVM - 3.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The current regex was tracking only the libdrm_foo packages, while with
recent changed we bumped only (and rightfully so) libdrm.
Fix the regex to track any libdrm package.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's unlikely that any of the additions come as a suprise to anyone
i915, nouveau, radeon, r200. Regardless, state clearly what's
available.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
As per the spec -
"The functions memoryBarrierShared() and groupMemoryBarrier() are
available only in compute shaders; the other functions are available
in all shader types."
Conform to this by adding another delegate to check for compute
shader support instead of only whether the current stage is compute
This allows some fragment shaders in Dirt Rally to compile
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When parsing texture instruction, it doesn't stop if the
'cur' is ',', the loop variable 'i' will also be increased
and be used to index the 'inst.TexOffsets' array. This can lead
an oob access issue. This patch avoid this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Li Qiang <liq3ea@gmail.com>
This reverts commit 0bac2551e4.
Now that we position the guardband correctly (applying translations
in addition to scaling) and made it as large (or larger) than the
render target, this shouldn't be necessary.
Now we leave guardband clipping enabled 100% of the time, like the
Windows driver does.
Fixes GL45-CTS.gtf21.GL2FixedTests.clip.clip. It tries to draw a
16384x64 rectangle, and it appears that some kind of numerical
imprecisions in the clipper result in some edge pixels going missing.
The Windows driver passes this test because of guardband clipping.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously we disabled the guardband when the viewport was smaller than
the framebuffer on Gen6-7.5, to prevent portions of primitives from
being draw outside of the viewport. On Gen8+, we relied on the viewport
extents test to effectively scissor this away for us.
We can simply always enable scissoring instead. We already include the
viewport in the scissor rectangle, so this will effectively do the
viewport extents test for us. (The only difference is that the scissor
rectangle doesn't support sub-pixel values. I think that's okay.)
Given that the viewport extents test is essentially a second scissor,
and is enabled for basically all 3D drawing on Gen8+, it stands to
reason that scissoring is cheap. Enabling the guardband reduces the
cost of clipping, which is expensive.
The Windows driver appears to never disable guardband clipping, and
appears to use scissoring in this case. I don't know if they leave
it on universally though.
This fixes misrendering in Blender, where the "floor plane" grid lines
started rendering at wrong angles after I disabled XY clipping of line
primitives. Enabling the guardband seems to solve the issue.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(Patch co-authored by Jason and Ken.)
We scaled the guardband based on the viewport size, but failed to
take into account the translation portion of the viewport transform.
This meant the guardband was always centered around the origin.
We want it to be centered around the screen-space drawing area,
which is the intersection of the viewport and the render target.
At best, getting this wrong would reduce the guardband's effectiveness
in some cases. At worst, it might break things - objects outside of the
guardband are trivially rejected, so getting the guardband in the wrong
place and leaving guardband clipping enabled could cause problems.
v2: drop clamping of positive maximums.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch will make the guardband calculation dependent on the
transformation matrix. Instead of computing it in both atoms, just
combine them into a single atom.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
CMASK alignment can be greater than image data alignment, so pass
it to the app so that it knows what alignment to backing memory
should have.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Now that there's MESA_LOADER_DRIVER_OVERRIDE for choosing the driver name
we load, we don't need this any more.
v2: Get the junk out of pipe_loader_drm.c, too.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
My vc4 simulator has been implemented so far by having an entrypoint
claiming to be i965, which was a bit gross. The simulator would be a lot
less special if we entered through the vc4 entrypoint like normal, so add
a loader environment variable to allow the i965 fd to probe as vc4.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the replicated prototypes/function bodies obfuscated the interesting
logic of the file: the mapping from driver enable macros to entrypoints we
expose, and the way that the swrast entrypoints are special compared to
the DRM entrypoints.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The GLX specification says about glXDestroyPixmap:
"The storage for the GLX pixmap will be freed when it is not current
to any client."
We're not really following this language to the letter: some of the storage
is freed immediately (in particular, the dri3_drawable, which contains both
GLXDRIdrawable and loader_dri3_drawable). So we NULL out the pointers to
that freed storage; the previous patches added the corresponding NULL-pointer
checks.
This fixes memory corruption in piglit
./bin/glx-visuals-depth/stencil -pixmap -auto
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The GLX specification says about glXDestroyPixmap:
"The storage for the GLX pixmap will be freed when it is not current
to any client."
So arguably, functions like glXSwapIntervalMESA can be called after
glXDestroyPixmap has been called for the currently bound GLXPixmap.
In that case, the GLXDRIDrawable no longer exists, and so we just skip
those calls.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
With a subsequent patch, we might see NULL loaderPrivates, e.g. when
a DRIdrawable is flushed whose corresponding GLXDRIdrawable was destroyed.
This resulted in a crash, since the loader vs. DRI3 drawable structures
have a non-zero offset.
Fixes glx-visuals-{depth,stencil} -pixmap
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Set 3DSTATE_WM/ThreadDispatchEnable bit on/off based on the same
conditions as used in the GL version.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before 4.5, the default framebuffer was not allowed for
GetFramebufferParameter, so it should return INVALID_OPERATION for any
call using the default framebuffer.
4.5 included new pnames, and some of them are allowed for the default
framebuffer. For the rest, INVALID_OPERATION. From OpenGL 4.5 spec,
section 9.2.3 "Framebuffer Object Queries:
"An INVALID_OPERATION error is generated by GetFramebufferParameteriv
if the default framebuffer is bound to target and pname is not one
of the accepted values from table 23.73, other than
SAMPLE_POSITION."
Fixes:
GL45-CTS.direct_state_access.framebuffers_get_parameter_errors
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
4.5 added new pnames allowed for GetFramebufferParameter, and
GetNamedFramebufferParameter.
From OpenGL 4.5 spec, section 9.2.3 "Framebuffer Object Queries" (quoting
the paragraph with only the new pnames, not all the supported):
"pname may also be one of DOUBLEBUFFER,
IMPLEMENTATION_COLOR_READ_FORMAT, IMPLEMENTATION_COLOR_READ_TYPE,
SAMPLES, SAMPLE_BUFFERS, or STEREO, indicating the corresponding
framebuffer-dependent state from table 23.73. Values of
framebuffer-dependent state are identical to those that would be
obtained were the framebuffer object bound and queried using the
simple state queries in that table. These values may be queried
from either a framebuffer object or a default framebuffer."
Fixes:
GL45-CTS.direct_state_access.framebuffers_get_parameters
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Current implementation returns the value for the currently bound read
framebuffer. GetNamedFramebufferParameteriv allows to get it for any
given framebuffer. GetFramebufferParameteriv would be also interested
on that method
It was refactored by allowing to pass a given framebuffer. If NULL is
passed, it used the currently bound framebuffer.
It also adds a call to _mesa_update_state. When used only by
GetIntegerv, this one was called as part of the extra checks defined
at get_hash. But now that the method is used by more methods, and the
update is needed, it makes sense (and it is safer) just calling it on
the method itself, instead of rely on the caller.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If we have an indirect index here we need to scale it by attribute slots
e.g. is this is vec2[256] then we get an indir_index in the 0.255 range
but the vec2 are aligned inside vec4 slots. So scale the indir index,
then extract the channels.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Compressing a render target and decompressing it in the same
single-subpass render pass may waste bandwidth. While this may be
beneficial in some circumstances, it does not help in all. Reclaims
about 1.95% FPS for Dota 2 on some configurations.
v2 (Jason Ekstrand):
- Provide a more thorough comment
- Enable CCS_D for input attachments
v3 (Jason Ekstrand):
- Provide performance numbers
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
dEQP-EGL.functional.create_context.no_config tries to create a context
with no config, then immediately destroys it. The drawbuffer is never
set up, so we can't dereference it asking if it's double buffered, or
we'll crash on a null pointer dereference.
Just bail early.
Applications using EGL_KHR_no_config_context could hit this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
cs can be NULL when it comes from r600_buffer_map_sync_with_rings()
to avoid doing the same checks. It was checked for write mappings
but not for read mappings.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Earlier changes introduced is_ycrcb flag which checks the component
order of u and v components. Condition for setting the flag was
incorrect, with ycrcb we are supposed to have cr before cb.
This patch (together with a fix in our gralloc) fixes corrupted
rendering from 'test-opengl-gl2_yuvtex' native test and corrupted
gallery thumbnail in application switcher on Android-IA.
Fixes: 51727b1cf5
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
The intent of the libdrm_$driver version limits has always been to not
burden the "other" drivers with updating their libdrm unless really
necessary. Unfortunately the configure script erroneously only checked
the driver-specific bit and not the generic bit of libdrm as well. Fix
this.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes
GL45-CTS.tessellation_shader.tessellation_shader_tessellation.max_in_out_attributes
on nouveau. We only support 30 patch varyings (as 2 vec4 slots end up
being used for tess level settings), but were getting 32 exposed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
OpenGL 4.5 spec, section "8.11.4 Texture Image Queries", page 233 of
the PDF states:
"An INVALID_OPERATION error is generated if texture is the name of a buffer
or multisample texture."
This is currently not being checked and e.g a multisample texture image can
be passed down to the driver hook. On i965, it is crashing the driver with an
assertion:
intel_mipmap_tree.c:3125: intel_miptree_map: Assertion `mt->num_samples <= 1' failed.
v2: (Ilia Mirkin) Move the check from gettextimage_error_check() to
GetTextureSubImage() and use the texObj target.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Although this might come from somewhere else require it explicitly.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I enabled CCS for storage images in the Vulkan driver and ran it through
the CTS. It didn't result in any hangs but it demonstrated that the data
port cannot handle CCS.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing uses this yet but it serves as a nice bit of documentation
that's relatively easy to find.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The term "lossless compression" could potentially mean multisample
color compression, single-sample color compression or HiZ because they
are all lossless. The term CCS_E, however, has a very precise meaning;
in ISL and is only used to refer to single-sample color compression.
It's also much shorter which is nice.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Commit 968ffd6c86 stored the last subpass
index of all the attachments but that of the depth-stencil attachment.
This could cause depth buffers used in multiple subpasses not to be in
the requested final layout. Fix this error.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Make the cap consistent with PIPE_CAP_INT64.
Aside from the hypothetical case of using draw for vertex shaders (and
actually caring about doubles...), every implementation supports doubles
either nowhere or everywhere.
Also, st/mesa didn't even check the cap correctly in all supported
shader stages.
While at it, add a missing LLVM version check for 64-bit integers in
radeonsi. This is conservative: judging by the log, LLVM 3.8 might be
sufficient, but there are probably bugs that have been fixed since then.
v2: fix clover (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since we already have the functionality in place and games
like Game of Thrones seem to depend on this extension, I
think it makes sense to enable it by making it part of
the extension string even though it's still a draft:
https://www.khronos.org/registry/gles/extensions/EXT/EXT_compressed_ETC1_RGB8_sub_texture.txt
Note: OES_compressed_ETC1_RGB8_sub_texture seems to be listed
in gl2ext.h, but there's no documentation for it in the KHR
registry
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Up to now on Gen8+ we only allocated a vertex element for
gl_InstanceIndex or gl_VertexIndex when a vertex shader uses
gl_BaseInstanceARB or gl_BaseVertexARB. This is because we would
configure the VF_SGVS packet to make the VF unit write the
gl_InstanceIndex & gl_VertexIndex values right behind the values
computed from the vertex buffers.
In the next commit we will also write the gl_DrawIDARB value. Our
backend expects to pull the gl_DrawIDARB value from the element
following the element containing gl_InstanceIndex, gl_VertexIndex,
gl_BaseInstanceARB and gl_BaseVertexARB (see
vec4_vs_visitor::setup_attributes). Therefore we need to allocate an
element for the SGVS elements as long as at least one of the SGVS
element is read by the shader. Otherwise our shader will use a
gl_DrawIDARB value pulled from the URB one element too far (most
likely garbage).
v2: Fix my english (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For RGB formats in Vulkan, we use the corresponding RGBA format with a
swizzle of RGB1. While this swizzle is exactly what we want for
texturing, it's not allowed for rendering according to the docs. While
we haven't been getting hangs or anything, we should probably obey the
docs. This commit just sanitizes all render swizzles so that the alpha
channel maps to ALPHA.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The CTS tests at least are using this, and we were totally
ignoring it.
This hopefully fixes the bouncing multisample CTS tests.
v2: get family mask in ignored case from command buffer.
v3: only change things in one place, use logic from Bas.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise we were writing these as 4 components, and things went bad.
Fixes (the remaining):
dEQP-VK.clipping.user_defined.*.vert_geom.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes clip distance fetches as they are single item loads
with a const_index like float[1].
Fixes:
dEQP-VK.clipping.user_defined.*.vert_geom.[0-6]
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This code should be used in radv, so move it to a shared location
in advance of doing that.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the past I've gotten this function confused with the one in
ir_to_mesa.cpp of the same name. Now that the affected flag setting
has move into a helper it makes sense just to inline this remaining
code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add assert checking that num_sources is never larger than 3.
This prevents Coverity from concluding that the unhandled
cases of num_sources not being 0-3 are relevant.
Coverity-Id: 1399480-1399489
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
fixes to issues spotted by Emil Velikov:
- set ANV_TIMESTAMP corretly
- fix typo with VULKAN_GEM_FILES
v2: update to use Makefile.sources under vulkan
instead of having own
v3: update to changes to generate from vk.xml
(commit c7fc310)
v4: remove 'hw' relative path
cleanups, remove unnecessary cruft
review from Emil Velikov:
- move to vulkan folder
- remove timestamp gen, no longer necessary
- more cleanups
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
All the other calls to retrieve the attachment have been covered except
this one - return the proper error for attachment points that are valid
enums but out of bound for the driver.
Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is. We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards. The only thing the PRM says is that, on gen8+
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS
but experimentation has indicated that doing so does nothing whatsoever.
Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again. In particular:
1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
really know whether or not it's pipelined.
2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
is a compute shader.
3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
because the state may be getting cached there (we don't really know).
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed. Recently, when trying update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related. This commit fixes them.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
No f16 support as I'm not quite sure about alignment yet.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The addition of Neon assembly breaks on arm64 builds because the assembly
syntax is different. For now, restrict Neon to ARMv7 builds.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
clang throws an error on "%r2" and similar. I couldn't find any
documentation on what "%r?" is supposed to mean and I've never seen any
use like that as far as I remember. The parameter is supposed to be
cpu_stride and just %2/%3 should be sufficient.
There's no need for trailing ";" either, so remove those, too.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This prevents LLVM from using sext instructions for local memory offsets
and allows the backend to fold immediate offsets into the instruction.
This also prevents some incorrect code generation for ptrtoint and
inttoptr instructions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
See "glsl: Rewrite atan2 implementation to fix accuracy and handling
of zero/infinity." for the rationale, but note that the instruction
count benefit discussed there is somewhat less important for the SPIRV
implementation, because the current code already emitted no control
flow instructions -- Still this saves us one hardware instruction per
scalar component on Intel SKL hardware.
Fixes the following Vulkan CTS tests on Intel hardware:
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4
Note that most of the test-cases above expect IEEE-compliant handling
of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so
except for the last two the test-cases above weren't expected to pass
yet. The reason they do is that the i965 back-end implementation of
the NIR fmin and fmax instructions is not quite GLSL-compliant (it
complies with IEEE 754 recommendations though), because fmin/fmax of a
NaN and a non-NaN argument currently always return the non-NaN
argument, which causes atan() to flush NaN to one and return the
expected value. The front-end should probably not be relying on this
behavior for correctness though because other back-ends are likely to
behave differently -- A follow-up patch will handle the atan2(±∞, ±∞)
corner cases explicitly.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This addresses several issues of the current atan2 implementation:
- Negative zero (and negative denorms which end up getting flushed to
zero) isn't handled correctly by the current implementation. The
reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
on which side of the branch cut the argument is, which causes us to
return incorrect results (off by up to 2π) for very small negative
values.
- There is a serious precision problem for x values of large enough
magnitude introduced by the floating point division operation being
implemented as a mul+rcp sequence. This can lead to the quotient
getting flushed to zero in some cases introducing an error of over
8e6 ULP in the result -- Or in the most catastrophic case will
cause us to return NaN instead of the correct value ±π/2 for y=±∞
and x very large. We can fix this easily by scaling down both
arguments when the absolute value of the denominator goes above
certain threshold. The error of this atan2 implementation remains
below 25 ULP in most of its domain except for a neighborhood of y=0
where it reaches a maximum error of about 180 ULP.
- It emits a bunch of instructions including no less than three
if-else branches per scalar component that don't seem to get
optimized out later on. This implementation uses about 13% less
instructions on Intel SKL hardware and doesn't emit any control
flow instructions.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This does point at the front-end emitting silly code that could have
been optimized out, but the current fsign implementation would emit
bogus IR if abs was set for the argument (because it would apply the
abs modifier on an unsigned integer type), and we shouldn't rely on
the upper layer's optimization passes for correctness.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will avoid a regression in a future commit that introduces some
additional rcp operations. According to the GLSL 4.10 specification:
"Dividing by 0 results in the appropriately signed IEEE Inf."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This will be used internally by the GLSL front-end in order to
implement some built-in functions. Plumb it through MESA IR for
back-ends that rely on this translation pass.
v2: Add comment.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This fixes rendering of full-screen quads (and other screen-filling
geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op
on other hardware.
- It looks like SE_CLIP registers were not set at all.
I'm amazed that rendering worked without them. Emit them to
avoid issues on gc3000.
- Define constants
ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119)
ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111)
ETNA_SE_CLIP_MARGIN_RIGHT (0xffff)
ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff)
These demarcate the margin (fixp16) between the computed sizes and the
value sent to the chip. I have set these to the numbers used by the
Vivante driver for gc2000. I am not sure whether any old hardware was
relying on the old numbers, or whether those were just a guess. But if
so, these need to be moved to the _specs structure.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Shaders using sin/cos instructions were not working on GC3000.
The reason for this turns out to be that these chips implement sin/cos
in a different way (but using the same opcodes):
- Need their input scaled by 1/pi instead of 2/pi.
- Output an x and y component, which need to be multiplied to
get the result.
- tex_amode needs to be set to 1.
Add a new bit to the compiler specs and generate these instructions
as necessary.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Commit 2852efcda4 moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.
Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Exposing rb swapped (or other swizzled) formats for rendering would
involve swizzing in the pixel shader. This is not the case at the
moment, so reject requests for creating such surfaces.
(GPUs that need an extra resolve step anyway due to multiple pixel
pipes, such as gc2000, might also do this swap in the resolve operation.
But this would be tricky to keep track of)
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Use of unsigned loop control variable with '>= 0' would lead
to infinite loop.
Reported by clang:
etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression
>= 0 is always true [-Wtautological-compare]
for (unsigned sp = c->frame_sp; sp >= 0; sp--)
~~ ^ ~
v2: Simply use the same datatype as c->frame_sp is using.
CC: <mesa-stable@lists.freedesktop.org>
Reported-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
There are some corner cases where you end up with an esgs ring, but no
gsvs ring, test for both before dereferencing.
Fixes:
dEQP-VK.geometry.emit.points_emit_0_end_0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This emits the compiled geometry shader and other state registers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This uses the scratch infrastructure to handle the esgs
and gsvs rings.
(this replaces the old code that did this with patching).
v2: fix correct ring sizes, reset sizes (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds gs copy shader support to the pipeline cache, and few
geometry related changes.
v2: rebase for spill changes.
v2.1: fix incorrect pipeline destruction.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles geometry shader inputs written by the vertex (es) shader
to the esgs ring.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles emitting things to the gsvs ring, and sending the
correct GS msgs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just places the flag into the shader info so we can use it from
the driver after we create the shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This enables the paths for setting up user ptrs to vs/es and gs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets up the rings and adds the variables
needed to make them work.
v2: rework for sharing ring and scratch
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Prevent Coverity seeing potential errors when src is
no initialized in the switch case.
Coverity-Id: 1396397
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We also add a flag for detecting shaders written to shader cache.
V2: dont leak cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:
MESA_GLSL_CACHE_ENABLE=1
In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.
The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a bunch of buffer related:
dEQP-VK.memory.pipeline_barrier.*
tests, that were crashing in LLVM due to this being missing.
Reviewed-by: Andres Rodriguez<andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Should be r600_common_screen instead of r600_screen.
Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes a bug uncovered by the 17-part patch series, specifically:
"gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"
If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The update frequency is very low.
Difference: Only account for the size when allocating a new one and when
starting a new IB, and check for NULL. (v3)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Commit 7b5878ee04 increased number of
outputs to 64, but left output array intact. This caused stack overflow
when number of outputs is bigger then 32. Found by ASAN.
Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There are even more counters in the CP_STAT register but I think
these ones are enough for now.
v2: only read (and expose) CP_STAT on VI+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For simplicity, GPU-sdma-busy will return 0 on previous gens.
v2: only read SRBM_STATUS2 on Evergreen+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously the z offset of the destination image was being ignored. It
should be taken into account when copying into a 3d target.
Also, img_extent_el.depth was being incorrectly clamped to 1 due to the
source image being VK_IMAGE_TYPE_2D. This would result in the blit
failing to iterate over all the 3d slices. Instead we clamp to the
destination image type.
Fixes failures in CTS tests:
dEQP-VK.api.copy_and_blit.image_to_image.3d_images.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
There is a new error code in Maintenance1 that is more specific to the
situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR
Fixes CTS test case:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is part of the spec and fixes CTS tests:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
At this point, the pitch is in bytes. We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This
means the bit 15 trick won't work.
The caller now has signed integers anyway, so just pass those through
and do the obvious check.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since LLVM revision 293359 DumpModule gets only implemented when
either a debug build or LLVM_ENABLE_DUMP is set.
This patch adds a direct replacement for the function for radv and
radeonsi, However, as I don't know a good place to put common LLVM
code for all three I inlined the implementation for LLVMPipe.
v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.
total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs: 5338 -> 5213 (-2.34%)
shader-db results:
total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs: 27417 -> 26570 (-3.09%)
The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
This has almost no effect on shader-db:
total instructions in shared programs: 92572 -> 92611 (0.04%)
instructions in affected programs: 4486 -> 4525 (0.87%)
Looking at 2 of the 7 different shaders that were hurt (all of which were
in mupen64), they all appear to be just differences in order of
instructions at the NIR level.
The advantage is that this should significantly reduce time in the compiler.
Applications may delete a shader program, create a new one, and bind it
before the next draw. With terrible luck, malloc may randomly return a
chunk of memory for the new gl_program that happened to be the exact
same pointer as our previously bound gl_program. In this case, our
logic to detect new programs in brw_upload_pipeline_state() would break:
if (brw->vertex_program != ctx->VertexProgram._Current) {
brw->vertex_program = ctx->VertexProgram._Current;
brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;
}
Because the pointer is the same, we'd think it was the same program.
But it could be wildly different - a different stage altogether,
different sets of resources, and so on. This causes utter chaos.
As unlikely as this seems, I believe I hit this when running a subset
of the CTS in a loop, in a group of tests that churns through simple
programs, deleting and rebuilding them. Presumably malloc uses a
bucketing cache of sorts, and so freeing up a gl_program and allocating
a new one fairly quickly causes it to reuse that memory.
The result was that brw->vertex_program->info.num_ssbos claimed the
program had SSBOs, while brw->vs.base.prog_data.binding_table claimed
that there were none. This was crazy, because the binding table is
calculated from info.num_ssbos - the shader info appeared to change
between shader compile time and draw time. Careful use of watchpoints
revealed that it was being clobbered by rzalloc's memset when building
an entirely different program...
Fortunately, our 0xd0d0d0d0 canary for unused binding table entries
caused us to crash out of bounds when trying to upload SSBOs, or we
may have never discovered this heisenbug.
Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked
cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5
at 64x64 in a loop with 100 iterations.
Cc: "17.0 13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension was not correctly supported, and it conflicts with the
VK_KHR_MAINTENANCE1 spec.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
this fixes deferred shadows with geom shaders enabled.
but I think this fix is fine by itself.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch implements a new type of struct brw_fence, one that is based
struct sync_file.
This completes support for EGL_ANDROID_native_fence_sync.
* Background
Linux 4.7 added a new file type, struct sync_file. See
commit 460bfc41fd52959311ed0328163f785e023857af
Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date: Thu Apr 28 10:46:57 2016 -0300
Subject: dma-buf/sync_file: de-stage sync_file headers
A sync file is a cross-driver explicit synchronization primitive. In a
sense, sync_file's relation to synchronization is similar to dma_buf's
relation to memory: both are primitives that can be imported and
exported across drivers (at least in theory).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Rename to brw_fence_insert_locked(). This is correct because the fence's
mutex is effectively locked, as all callers are also *creators* of the
fence, and have not yet returned the new fence.
This reduces noise in the next patch, which defines and uses
brw_fence_insert(), an unlocked variant.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Pre-patch, brw_sync.c ignored the return value of
intel_batchbuffer_flush().
When intel_batchbuffer_flush() fails during eglCreateSync
(brw_dri_create_fence), we now give up, cleanup, and return NULL.
When it fails during glFenceSync, however, we blindly continue and hope
for the best because there does not exist yet a way to tell core GL that
sync creation failed.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This a refactor patch; no expected changed in behavior.
Add `enum brw_fence_type` and brw_fence::type. There is only one type
currently, BRW_FENCE_TYPE_BO_WAIT. This patch reduces a lot of noise in
the next, which adds new type BRW_FENCE_TYPE_SYNC_FD.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Required to implement EGL_ANDROID_native_fence_sync on i965.
Specifically, i965 needs drm_intel_gem_bo_exec_fence(),
I915_PARAM_HAS_EXEC_FENCE, and libsync.h.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Analogous to previous commit(s), with a minor detail - here we set the
macros when building both C and C++ sources.
Resolving that is a more challenging task that we'll sort out another
day.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The variable replacement was unused when building w/o
ENABLE_SHADER_CACHE. Since we can mix variable declarations and code,
move it to where its used.
Fixes: 9f8dc3bf03 "utils: build sha1/disk cache only with
Android/Autoconf"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Using return foo() is incorrect even if foo itself returns void.
Spotted by AppVeyor, as below:
teximage.c(3653) : warning C4098: 'copyteximage' : 'void' function returning a value
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Does not match the function definition or how it's used. Triggers the
following warning in AppVeyor
svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration
Cc: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
MSVC warns about different const qualifiers. Add the extra const to
silence it.
nir_phi_builder.c(244) : warning C4090: 'initializing' : different 'const' qualifiers
nir_phi_builder.c(245) : warning C4090: 'initializing' : different 'const' qualifiers
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
MSVC warns about implicit conversion as below. Annotate the literal
appropriately to silence the warning.
nir_gather_info.c(249) : warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
b3119a3 introduced a strict LLVM requirement for r300 on all
architectures and thus configure fails on architectures where LLVM is
not available or buggy.
r300 doesn't strictly require LLVM, but for performance reasons we
highly recommend LLVM usage. So require it at least on x86 and x86_64
architectures as we have done before b3119a3.
Fixes: b3119a3 ("configure.ac: Check gallium LLVM version in gallium_require_llvm")
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
All of these have had support for the TGSI opcodes since before most of
the glsl compiler work landed.
Also update the docs accordingly, including the missing note about i965.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: add conversion opcodes.
v3 (idr): Rebase on replacemtn of TGSI_OPCODE_I2U64 with
TGSI_OPCODE_I2I64.
v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
v5 (nha): add clarifying comment about a subtle assumption
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v1.1: move to using a normal CAP. (Marek)
v2: fill in the cap everywhere
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
which is not applicable for "all slices at each lod". Current
logic makes one to believe it has some purpose. When miptree
layout is calculated brw_miptree_layout_texture_array() sets
the qpitch unconditionally but later on ignores it altogether
for ALL_SLICES_AT_EACH_LOD.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Such as comment states for intel_miptree_hiz_buffer::mt, hiz_mt
only exists for gen6. In addition, intel_hiz_miptree_buf_create()
uses MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD unconditionally.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. The same goes for
intel_miptree_aux_buffer::pitch/qpitch.
This will make following patches simpler to read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. Also intel_miptree_aux_buffer::offset
is initialised to zero (calloc()).
This will make following patches significantly simpler to read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Only caller, brw_workaround_depthstencil_alignment(), returns
early for gen6+.
While at it, reduce scope for brw_get_depthstencil_tile_masks() as
well.
Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There exact same check earlier in brw_miptree_layout() which
intel_miptree_create_layout() in turn calls unconditionally.
Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In addition, let intel_miptree_create_layout() release the
miptree - it is the allocator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We had a lot of memcpy call overhead because gpu_stride wasn't being
inlined. But if you split out the stride==8 and stride==16 cases like
this code does while still using memcpy, you'd no longer have glibc's
NEON memcpy applied at which point we'd be doing 16 uncached reads
instead of 64/(NEON memcpy granularity), for about a 30% performance
hit. By hand writing the assembly, we can get a whole cacheline
loaded at a time.
Unfortunately, NEON intrinsics turned out to be unusable -- they
didn't have the vldm instruction available.
Note that, for now, the NEON code is only enabled when building for ARMv7
(Pi 2+). We may want to do runtime detection for the Raspbian case, in
the future.
Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).
Saves a measly 20 bytes on IA32 and nothing on x64. Depending on
exactly when this is applied, a lot of variation is possible due to
function alignment.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670111 228340 22552 6921003 699b2b lib/i965_dri.so after
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so before
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
By putting the parameters first that match the parameters to the call
site, 4 (of 14) instructions are saved at _mesa_Uniform4fv on x64. On
IA32, the details of the instructions change, but it is the same count
and mix of instructions.
Before:
0000000000000830 <_mesa_Uniform4fv>:
830: 48 83 ec 10 sub $0x10,%rsp
834: 49 89 d0 mov %rdx,%r8
837: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 83e <_mesa_Uniform4fv+0xe>
83e: 89 f8 mov %edi,%eax
840: 89 f1 mov %esi,%ecx
842: 41 b9 02 00 00 00 mov $0x2,%r9d
848: 64 48 8b 3a mov %fs:(%rdx),%rdi
84c: 48 8b 97 c8 01 02 00 mov 0x201c8(%rdi),%rdx
853: 48 8b 72 70 mov 0x70(%rdx),%rsi
857: 6a 04 pushq $0x4
859: 89 c2 mov %eax,%edx
85b: e8 00 00 00 00 callq 860 <_mesa_Uniform4fv+0x30>
860: 48 83 c4 18 add $0x18,%rsp
864: c3 retq
After:
00000000000007f0 <_mesa_Uniform4fv>:
7f0: 48 83 ec 10 sub $0x10,%rsp
7f4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 7fb <_mesa_Uniform4fv+0xb>
7fb: 41 b9 02 00 00 00 mov $0x2,%r9d
801: 64 48 8b 08 mov %fs:(%rax),%rcx
805: 48 8b 81 c8 01 02 00 mov 0x201c8(%rcx),%rax
80c: 6a 04 pushq $0x4
80e: 4c 8b 40 70 mov 0x70(%rax),%r8
812: e8 00 00 00 00 callq 817 <_mesa_Uniform4fv+0x27>
817: 48 83 c4 18 add $0x18,%rsp
81b: c3 retq
Saves a measly 416 bytes of text on x64. Depending on exactly when this
is applied, a lot of variation is possible due to function alignment.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670131 228340 22552 6921023 699b3f lib/i965_dri.so after
6343348 293872 29880 6667100 65bb5c lib64/i965_dri.so before
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so after
There is likely to be no performance change with just this patch.
_mesa_uniform immediately calls validate_uniform_parameters with
parameters in the "wrong" (different from the call site) order.
v2: Rebase on GL_ARB_gpu_shader_fp64.
v3: Rebase on GL_ARB_gpu_shader_int64.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The size of the pool is slightly smaller than the size of the
structure containing the whole pool. We need to take that into account
on when setting up the internals.
Fixes a crash due to out of bound memory access in:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
v2: Drop debug traces (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort. Passing it in will
truncate the stride to 0, which has...surprising results.
The pitch can be up to 32,768 DWords, or 128kB. We measure it in bytes,
but divide by 4 when programming it. So we need to handle values up to
131,072. Switch from GLshort to int32_t to avoid the truncation.
Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.
v2: Use int32_t as negative values can be used (Jason).
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register. With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky. Use a UW type to avoid this.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
According to GL_KHR_vulkan_glsl, the signature of subpassLoad() is:
gvec4 subpassLoad(gsubpassInput subpass);
gvec4 subpassLoad(gsubpassInputMS subpass, int sample);
So the multisampled case always receives an explicit sample index that we
should use. The current implementation was ignoring this parameter
and using gl_SampleID value instead.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong. But it turns out that
there is a legitimate reason to use it, so let's do so.
The alternative would be to chop up 16k clears to multiple 8k clears,
which is pointlessly painful.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Add a helper function, anv_get_queue_family_properties(), which fills the
struct. This patch reduces churn in the following patch that implements
vkGetPhysicalDeviceQueueFamilyProperties2KHR.
Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add a helper function, anv_get_image_format_properties(), which does all
the work and has a VkPhysicalDeviceImageFormatInfo2KHR parameter. This
patch reduces churn in the following patch that implements
vkGetPhysicalDeviceImageFormatProperties2KHR.
Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The struct was deleted by:
commit efe9d1cde3
Author: Edward O'Callaghan <funfunctor@folklore1984.net>
Subject: anv: Clean up some unused variables
Unlike the original anv_common, the new one has a non-const pNext
pointer because we will use it for the output structs of
VK_KHR_get_physical_device_properties2.
v2:
- Retype pNext from void* to struct anv_common*.
Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is a printf-like macro that prints a debug message to stderr when
built with DEBUG. If no DEBUG, then do nothing.
Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All of the functions were passing 1 to _mesa_uniform instead of passing
count.
Fixes 16 unsed parameter warnings like:
main/uniforms.c: In function ‘_mesa_Uniform1i64vARB’:
main/uniforms.c:1692:47: warning: unused parameter ‘count’ [-Wunused-parameter]
_mesa_Uniform1i64vARB(GLint location, GLsizei count, const GLint64 *value)
^~~~~
This is why I build with extra warnings enabled. Unfortunately, there
are so many unused parameter warnings in Mesa that I didn't notice these
added warnings for over 6 months. :(
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The spec section 5.2 says:
"vkAllocateCommandBuffers can be used to create multiple command
buffers. If the creation of any of those command buffers fails, the
implementation must destroy all successfully created command buffer
objects from this command, set all entries of the pCommandBuffers
array to VK_NULL_HANDLE and return the error."
Fixes:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
In swr_update_derived() update texture and sampler state on a new fragment
shader. GALLIUM_HUD can update fs using a previously bound texture and
sampler.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.
v2: - update the commit message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
builtin (SampleMask). The only way to tell which one we are dealing with
is to check if it is an input or an output.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Along the lines of what
3b804819 anv: Default PointSize to 1.0 if not written by the shader
does for anv, program a default point size in the hw of 1.0.
This preempt fixes a bunch of geom shader tests.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If radeonsi starts compiling an optimized shader variant asynchronously
with a GL debug callback set and the application destroys the GL context,
radeonsi crashes when trying to write shader stats into the debug output
of a non-existent context after compilation, because st/mesa was destroyed
before pipe_context.
Firefox with WebGL2 enabled hits this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99456
v2: protect against a double destroy in st_create_context_priv and callers.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The original number was chosen in an attempt to match the limits applied to
GLSL IR.
A look at the git history of the why these limits were chosen for GLSL IR
shows it was more to do with the slow speed of unrolling large loops in
GLSL IR than anything else. The speed of loop unrolling in NIR is not a
problem so we may wish to bump this even higher in future.
No shader-db change, however a furture change will disbale the GLSL IR
optimisation loop in the i965 backend results in 4 loops from The Talos
Principle failing to unroll. Bumping the limit allows them to unroll which
results in the instruction count matching the previous output from when the
GLSL IR opts were still enabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously the constant array would not get copy propagated until the backend
did its GLSL IR opt loop. I plan on removing that from i965 shortly which
caused huge regressions in Deus-ex and Tomb Raider which have large
constant arrays. Moving lowering before the opt loop in the GLSL linker
fixes this and unexpectedly improves some compute shaders also.
shader-db results BDW:
instructions helped: shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 204 -> 194 (-4.90%)
instructions helped: shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1010 -> 741 (-26.63%)
instructions helped: shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 542 -> 385 (-28.97%)
cycles helped: shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1831382 -> 1818492 (-0.70%)
cycles helped: shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 216238 -> 206180 (-4.65%)
cycles helped: shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 18484 -> 16644 (-9.95%)
total instructions in shared programs: 13060313 -> 13059877 (-0.00%)
instructions in affected programs: 1756 -> 1320 (-24.83%)
helped: 3
HURT: 0
total cycles in shared programs: 256586698 -> 256561910 (-0.01%)
cycles in affected programs: 2066104 -> 2041316 (-1.20%)
helped: 3
HURT: 0
V3: only call the opt loop if lowering progressed (Suggested by Eric)
V2: call opts before and after lowering (Suggested by Ken)
Reviewed-by: Eric Anholt <eric@anholt.net>
OpenGL ES implementations are not allowed to ship ARB extensions, and
OpenGL implementations are not allowed to ship OES extensions.
The functionality is also included in GL_ARB_ES2_compatibility. Ever
OpenGL core-profile driver currently exposes both extensions. I don't
know of any applications that explicitly check for GL_OES_read_format,
so removing it seems very unlikely to cause problems. No functionality
is removed.
I have left this extension in place for compatibility profile. There
are still OpenGL 1.x drivers in Mesa, and adding code to check for
compatibility profile and not GL_ARB_ES2_compatibility for
GL_IMPLEMENTATION_COLOR_READ_TYPE and GL_IMPLEMENTATION_COLOR_READ_FORMAT
just feels dumb.
Three other other alternatives considered:
- Remove the string from compatibility profile drivers but leave the
functionality in place.
- Add a flag to expose the extension string, and set it in every OpenGL
driver that does not expose GL_ARB_ES2_compatibility (and those
drivers only). I tried this. You can't have two instances of an
extension in the extension table (one dummy_true for ES1 and one with
a flag for compatibility profile), so the implementation requires a
bit of effort.
- Only expose the extension in compatibility if the version is less
than 2.0. I didn't see an easy way to do this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
As per VK_KHR_maintenance1, clients can render to a slice of a 3D image
by creating a VK_IMAGE_VIEW_TYPE_2D view of it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As of VK_KHR_maintenance1, these are supposed to be reported for any
formats on which we support transfer operations. For us, this is
anything that we can texture from.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Our command buffers already efficiently use a global pool so trimming
doesn't really need to do anything.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As per VK_KHR_maintenance1, setting a negative height in the viewport
can be used to get flipped coordinates. This is, aparently, very useful
when porting D3D apps to Vulkan. All we need to do to support this is
to make sure we actually set the min and max correctly.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
ir_builder_print_visitor.cpp already does).
Otherwise, some mingw build errors out (since
8e7e1ae036 and
bbce1c538d presumably) with:
src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before ‘PRIu64’
case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;
(Note even with that fix I get other format specifier warnings:
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: unknown conversion type character ‘a’ in format [-Wformat=]
fprintf(f, "%a", ir->value.f[i]);
^
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: too many arguments for format [-Wformat-extra-args]
but it still compiles at least)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The use of fast rcp instruction is disabled, and will always fall back
to use a division instead (1 / x). Hence, if we get a division opcode,
it doesn't make much sense trying to split that into rcp/mul.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
we can't use the cpu implementation of fdiv, as this one uses different
lp_build_context, which causes assertion failure.
Just use default fdiv action (there is no fast rcp for doubles which we
could potentially use anyway).
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In brw_blorp_copyteximage, we use the format from the render buffer.
This could be a combined depth/stencil format. In this case, we handle
stencil properly but we give blorp the wrong ISL format. Specifically,
we would give blorp ISL_FORMAT_R32G32B32A32_FLOAT which is the wrong
size was causing GPU hangs.
Fixes: GL45-CTS.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_copyteximage
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
state_tracker/st_glsl_to_tgsi.cpp:302:28: warning: ‘glsl_to_tgsi_instruction::tex_type’
is too small to hold all values of ‘enum glsl_base_type’
glsl_base_type tex_type:4;
Fixes: 8ce53d4a2f ("glsl: Add basic ARB_gpu_shader_int64 types")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Other things are out of order, but I need to add a getter so I'm just
fixing those.
This helps people adding to GBM know where the right place to put things
is.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
This sets the dnz flag on all the relevant multiplication operations. At
emission time, this will only be supported by nvc0+, so nv50 will need a
different solution.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This will be useful for proper D3D9 emulation, where this behavior is
expected by some shaders.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Some CIK-VI docs say this is the default behavior on SI. That doesn't
answer whether it's also the default behavior on CIK-VI.
Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that most shaders can get lower VGPR usage thanks to lazy input loading.
I think this is a more accurate constraint that prevents the black transitions
in Witcher 2.
Affected shaders (7758):
Max Waves: 57437 -> 58231 (1.38 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will allow to expose more queries in order to know which
blocks are busy/idle.
v2: - add new lines after ':'
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some query results struct contents are declared as cache line aligned.
Use aligned malloc, and align the whole struct, to be safe.
Fixes crash when compiling with clang.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CalculateProcessorTopology tries to figure out system topology by
parsing /proc/cpuinfo to determine the number of threads, cores, and
NUMA nodes. There are some architectures where the "physical id" begins
with 1 rather than 0, which was creating and empty "0" node and causing a
crash in CreateThreadPool.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
This allows eglCreateImageKHR to access P010 surfaces created by vaapi
Signed-off-by: Rainer Hochecker <fernetmenta@online.de>
Acked-by: Ben Widawky <ben@bwidawsk.net>
This fixes "st/va: delay calling begin_frame until we have all parameters".
v2: call begin frame after decoder (re)creation as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
It seems clear that trying to multiply two pairs of doubles would result
in the temporary register getting overwritten by the second pair. So
make the code more explicit.
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
There are some line wrapping violations here but those lines will get
deleted in the following patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Now that the i965 backend doesn't depend on this field we can
make it more generic and short circuit a bunch of code paths.
The new field will be used in a following patch for another
clean-up.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This makes much more sense and should be more performant in some
critical paths such as SSO validation which is called at draw time.
Previously the CurrentProgram array could have contained multiple
pointers to the same struct which was confusing and we would often
need to fish out the information we were really after from the
gl_program anyway.
Also it was error prone to depend on the _LinkedShader array for
programs in current use because a failed linking attempt will lose
the infomation about the current program in use which is still
valid.
V2: fix validate_io() to compare linked_stages rather than the
consumer and producer to decide if we are looking at inward
facing shader interfaces which don't need validation.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
To avoid build regressions the following 2 patches were squashed in to
this commit:
mesa/meta: rewrite _mesa_shader_program_use() and _mesa_program_use()
These are rewritten to do what the function name suggests, that is
_mesa_shader_program_use() sets the use of all stage and
_mesa_program_use() sets the use of a single stage.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
mesa: update active relinked program
This likely fixes a subroutine bug were
_mesa_shader_program_init_subroutine_defaults() would never have been
called for the relinked program as we previously just set
_NEW_PROGRAM as dirty and never called the _mesa_use* functions when
linking.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
What a3xx docs call IJPERSPCENTERREGID.. the xy coord passed into
bary.f. We were incorrectly setting both this and gl_FragCoord.xy to
the same register resulting in all sorts of hilarity.
Fixes stk, vdrift, 0ad, probably a bunch others.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
The previous code always compared integers as 64-bit. Due to variations
in sign-extension in the code generated by nir_opt_algebraic.py, this
meant that nir_search doesn't always do what you want. Instead, 32-bit
values should be matched as 32-bit and 64-bit values should be matched
as 64-bit. While we're here we unify the unsigned and signed paths.
Now that we're using the right bit size, they should be the same since
the only difference we had before was sign extension.
This gets the UE4 bitfield_extract optimization working again. It had
stopped working due to the constant 0xff00ff00 getting sign-extended
when it shouldn't have.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
In order to handle CCS_E, we stomp the image format to a UINT format and
then do some bitcasting logic in the shader. This works fine since SKL
render compression only considers the channel layout of the format and
not the format itself. In order for this to work on images that have
been fast-cleared, we need to also convert the clear color so that, when
interpreted as UINT, it provides the same bit value as it would have in
the original format. This fixes a bunch of OpenGL ES CTS tests for
copy_image when we start using CCS more aggressively.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
basetsd.h on Windows defines INT64 and UINT64 typedefs which conflict
with these. Append "_TOK" to avoid conflicts.
Should fix the Windows build.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen8 adds Q/UQ types. We attempted to change the types back to DF in the
generator (commit c95380c40), but an assertion added in the FP64 series
(commit e481dcc3) triggers before that code has a chance to execute.
In fact, using Q/UQ in the IR and then changing to DF in the generator
would not work in the presence of source modifiers, etc.
Fixes: d6fcede6 ("i965: Return Q and UQ types for int64 and uint64")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Integer comparison functions (e.g., nir_op_ilt) are handled in the next
commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2 (idr): Make the "from" type in a cast unsized. This reduces the
number of required cast operations at the expensive slightly more
complex code. However, this will be a dramatic improvement when other
sized integer types are added. Suggested by Connor.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Fixup assertion in brw_reg_type_to_hw_type to allow
BRW_REGISTER_TYPE_{UQ,Q} on Gen8+.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It seems like maybe this should return a different type based on Gen. Q
and UQ only exist on Gen8+, but, based on the old comment, I believe
previous Gens can generate 64-bit moves.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's much easier to do this in the generator rather than while coming
out of NIR. brw_type_for_nir_type doesn't know the Gen, so we'd have to
add a bunch of plumbing. The alternate fix is to not emit int64 moves
for doubles in the first place... but that seems even more difficult.
This change won't catch non-MOV instructions that try to use 64-bit
integer types on Gen < 8. This may convert certain kinds of bugs in to
different kinds of bugs that are more difficult to detect (since the
assertions in the function won't catch them).
NOTE: I don't think anything can emit mixed-type 64-bit moves until the
same platform supports both ARB_gpu_shader_fp64 and
ARB_gpu_shader_int64. When we enable int64 on Gen < 8, we can solve
this problem other ways.
This prevents regressions on HSW in the next patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Don't up-convert the shift count parameter if shift instructions.
Suggested by Connor. Add type_is_singed() function. This will make
adding 8- and 16-bit types easier.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Previously both sources were unsized. This caused problems when the
thing being shifted was 64-bit but the shift count was 32-bit. The
expectation in NIR is that all unsized sources (and destination) will
ultimately have the same size.
The changes in nir_opt_algebraic.py are to prevent errors like:
Failed to parse transformation:
03:12:25 (('extract_i8', 'a', 'b'), ('ishr', ('ishl', 'a', ('imul', ('isub', 3, 'b'), 8)), 24), 'options->lower_extract_byte')
03:12:25 Traceback (most recent call last):
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 610, in __init__
03:12:25 xform = SearchAndReplace(xform)
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 495, in __init__
03:12:25 BitSizeValidator(varset).validate(self.search, self.replace)
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 311, in validate
03:12:25 validate_dst_class = self._validate_bit_class_up(replace)
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 414, in _validate_bit_class_up
03:12:25 src_class = self._validate_bit_class_up(val.sources[i])
03:12:25 File "/home/jenkins/workspace/Leeroy_2/repos/mesa/src/compiler/nir/nir_algebraic.py", line 420, in _validate_bit_class_up
03:12:25 assert src_class == src_type_bits
03:12:25 AssertionError
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
This change makes me wonder whether double packing should be
reimplemented as int64BitsToDouble(packInt2x32(v)). I'm a little on the
fence since not all platforms that support fp64 natively support int64.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
v3 (idr): Make the "from" type in a cast unsized. This reduces the
number of required cast operations at the expensive slightly more
complex code. However, this will be a dramatic improvement when other
sized integer types are added. Suggested by Connor.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The lowering passes 64-bit integer operations will generate a lot of
these.
v2: Modify the HANDLE_PACK_UNPACK_INVERSE so that the breaks apply to
the switch instead of the 'do { } while(true)' loop.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
v2: Use function inlining.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
v2: Use function inlining.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Rename lower_64bit.cpp and lower_64bit_test.cpp to lower_int64.
Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These functions are directly available in shaders. A #define is added
to detect the presence. This allows these functions to be tested using
piglit regardless of whether the driver uses them for lowering. The
GLSL spec says that functions and macros beginning with __ are reserved
for use by the implementation... hey, that's us!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just add operations to the switch statement here.
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If ARB_gpu_shader_int64 is supported, ARB_shader_clock also adds
clockARB() that returns a uint64_t. Rather than add new opcodes and
intrinsics for this, just wrap the existing intrinsic with a
packUint2x32.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These are all the allowed 64-bit functions from ARB_gpu_shader_int64
spec.
v2: restrict int64/double functions better.
v3 (idr): Delete spurious blank lines. Suggested by Matt.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As for the double code, but using the 64-bit integer conversions.
v2 (idr): Remove some spurious u2i() and i2u() operations when packing
and unpacking, respectively, int64_t varyings.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
This adds 64-bit integer support to some AST and IR operations where
it is needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds support to call the new operations on conversions.
v2 (idr): Delete an unnecessary break-statement. Noticed by Matt. Add
a missing blank line. Noticed by Ian.
v3 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This just adds the new operations and add 64-bit integer support to all
the existing cases where it is needed.
v2: fix some issues found in testing.
v2.1: add unreachable (Ian), add missing int/uint pack/unpack (Dave).
v3 (idr): Rebase on top of idr's series to generate
ir_expression_operation_constant.h. In addition, this version:
Adds missing support for ir_unop_bit_not, ir_binop_all_equal,
ir_binop_any_nequal, ir_binop_vector_extract,
ir_triop_vector_insert, and ir_quadop_vector.
Removes support for uint64_t from ir_unop_abs and ir_unop_sign.
v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v3]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds all the conversions in the world, I'm not 100% sure of all of
these are needed, but add all of them and we can cut them down later.
v2: fix issue with packing output types.
v3 (idr): Rebase on top of idr's series to generate
ir_expression_operation_constant.h. Fix transposed ir_validate
assertions for ir_unop_u642i64 and ir_unop_i642u64. Add missing
automatic type setup for ir_unop_u642i64 and ir_unop_i642u64.
v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v3]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds support for 64-bit integer constants to the parser,
ast and ir.
v2: fix a few issues found in testing.
v3: Add missing ir_constant copy contructor support.
v4: Use PRIu64 and PRId64 in printfs in glsl_parser_extras.cpp.
Suggested by Nicolai. Rebase on Marek's linalloc changes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v3]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This hooks up the API to the internals for 64-bit integer uniforms.
v2: update to use non-strict aliased alternatives
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This adds the builtins and the lexer support.
To avoid too many warnings, it adds basic support to the type in a few
other places in mesa, mostly in the trivial places.
It also adds a query to be used later for if a type is an integer 32 or 64.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just add the boilerplate xml code.
v2 (idr): Update dispatch_sanity. Only add extension functions in core
profile.
v3 (idr): Remove comment line from gl_API.xml. Suggested by Matt.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Blorp can deal with depth/stencil surfaces blits/copies without the
render target requirement. Also having both render target and
depth/stencil requirement is incompatible from isl's point of view.
This fixes an image creation issue in the high level quality settings
of the Unity3D player, which requires a depth texture with src/dst
transfer & 4x multisampling.
v2: Simply aspect checking condition (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
src1 must be a descriptor (including the information to determine that
the SEND is doing an extended math operation), but src0 can actually be
null since it serves as the source of the implicit GRF -> MRF move.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
desc will always be non-NULL, because brw_validate_instructions() does
not attempt to validate any instructions that fail the
is_unsupported_inst() check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to rely on brw_opcode_desc() always returning non-NULL in other
validation functions. Other validation functions will be in the else
case of the block added in this patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
inst, whose assignment can be seen in the last line of context pointed
to the correct instruction in the SIMD16 program, but src_offset was the
offset from the beginning of the SIMD16 program.
So if an instruction at offset 0x100 in the SIMD16 program was illegal,
we would mark an error on the instruction at offset 0x100 (which is
likely in the SIMD8 program).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using a UD-typed operand makes the execution size D, and if the size of
the execution type is greater than the size of the destination type, the
destination must be appropriately strided.
We actually just want UW-types all around.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We change the immediate source type to VF to allow instruction
compaction, but there are no entires in the compaction table for DF, so
there's no point in doing this.
Additionally, I mixing floating-point types is now allowed except for
F and VF.
Use the resource_changed callback to invalidate internal resources
derived from external textures when they are (re-)bound. This is needed
to comply with the requirement from the GL_OES_EGL_image_external
extension that a call to glBindTexture guarantees that all further
sampling will return values that correspond to the values in the
external texture at or after the time that glBindTexture was called.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
To comply with the requirement from the GL_OES_EGL_image_external
extension that a call to glBindTexture guarantees that all further
sampling will return values that correspond to the values in the
external texture at or after the time that glBindTexture was called,
do not bail out early from mesa_BindTextures if the target is
external.
This will later allow the state tracker to instruct the pipe driver
to invalidate internal resources derived from the external texture.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Implement the resource_changed pipe callback to invalidate internal
resources derived from imported buffers. This is needed to update the
texture for re-imported renderables.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Imported resources already have contents that we want to be copied to
texture resources derived from them. Set initial seqno of imported
resources to 1, just as if it had already been rendered to.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
For imported buffers that can't be used directly as a source to the
texture samplers, the pipe driver might need to create an internal
copy, for example in a different tiling layout. When buffers are
reimported they may contain new image data, so the driver internal
copies need to be recreated.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Add a hook to tell drivers that an imported resource may have changed
and they need to update their internal derived resources.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
We were including it once per value, so probably around 10k times.
Let's not cause the compiler any more work than we have to.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It's harmless to use ALIGN_NPOT() for uncompressed formats
because they have block width/height = 1.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes crash in piglit
`egl_khr_gl_renderbuffer_image-clear-shared-image GL_DEPTH_COMPONENT24`
on Skylake.
The crash happened because blorp attempted to execute a pending hiz
clear after the hiz buffer was deleted. Deleting the pending hiz ops
when the hiz buffer gets deleted fixes the crash.
For good measure, this patch also deletes all pending CCS/MCS ops when
the CCS/MCS buffer gets deleted. I'm now aware of any bugs
caused by the dangling ops, but deleting them is clearly the right thing
to do.
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99265
In situations where libdrm_amdgpu and mesa are installed to the same
location, the mesa installed headers will take precedence over the git
source headers.
This is due to the AMDGPU_CFLAGS containing the install directory.
This situation can cause build errors if the git version of a header is
newer than the currently installed version of a header (e.g. git pull
updates vulkan.h)
Note: using the same install prefix for mesa and libdrm is probably a
common occurrence since it is described in the radeonBuildHowTo wiki:
https://www.x.org/wiki/radeonBuildHowTo/
v2: added sign-off
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
PresentPixmap only works if the pixmap depth matches with the
window depth, otherwise it returns a BadMatch protocol error.
Even if the depths match, the result won't look correctly
if the VDPAU RGB component order doesn't match the X11 one so
we only allow the X11 format.
For other buffers we copy them to a buffer which is send to X.
v2: only send buffers with format VDP_RGBA_FORMAT_B8G8R8A8
v3: reword commit message
v4: add comment explaining the code
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
At least on VI, texture gather doesn't work with a 24_8 data format, so
use 8_8_8_8 and a modified swizzle instead.
A bit of background: When creating a GL_STENCIL_INDEX8 texture, we select
the X24S8 pipe format because we don't support stencil-only render targets
properly. With mip-mapping this can lead to a setup where the tiling is
incompatible with stencil texturing, and a flushed stencil texture is
used. For the flushed stencil, a literal X24S8 is used because there were
issues with an 8bpp DB->CB copy.
Longer term, it would be good if we could get away from these workarounds,
i.e. properly support an S8 format for stencil-only rendering and flushed
stencil. Since stencil texturing is somewhat rare, it's not a high
priority.
Fixes GL45-CTS.texture_cube_map_array.sampling.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
When the attachment type is NONE (att->Type),
FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE should be NONE always.
Note that technically, the current behaviour follows the spec. From
OpenGL 4.5 spec, Section 9.2.3 "Framebuffer Object Queries":
"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
either no framebuffer is bound to target; or the default
framebuffer is bound, attachment is DEPTH or STENCIL, and the
number of depth or stencil bits, respectively, is zero."
Reading literally this paragraph, for the default framebuffer, NONE
should be only returned if attachment is DEPTH and STENCIL without
being allocated.
But it doesn't makes too much sense to return DEFAULT_FRAMEBUFFER if
the attachment type is NONE. For example, this can happens if the
attachment is FRONT_RIGHT run on monoscopic mode, as that attachment
is only available on stereo mode.
With the current behaviour, defensive querying of the object type
would not work properly. So you could query the object type checking
for NONE, get DEFAULT_FRAMEBUFFER, and then get and INVALID_OPERATION
when requesting other pnames (like RED_SIZE), as the real attachment
type is NONE.
This fixes:
GL45-CTS.direct_state_access.framebuffers_get_attachment_parameters
v2: don't change the behaviour for att->Type != GL_NONE, as caused
some ES CTS regressions
v3: simplify condition (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Here we remove the single use of this field in gl_linked_shader
which allows us to move the field out of gl_shader_info
While we are at it we rewrite link_xfb_stride_layout_qualifiers()
to be more clear.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There is no reason for this to be in the shared gl_shader_info or
to copy it to gl_program at the end of linking (its already there).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is only used by gl_linked_shader as a temp during linking
so use a temp there instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is only used by gl_linked_shader as a temp during linking
so use a temp there instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is only used by gl_linked_shader as a temp during linking
so use a temp there instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is never used in gl_linked_shader other than as a temp
during linking so just use a temp instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We also move EarlyFragmentTests out of the gl_shader_info struct
as it is now only used by gl_shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There is no need to go via the pointer in nir_shader. This change
is required for the shader cache as we don't create a nir_shader.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We only need to set it when linking was successful and the program
being linked is currently active.
The programs_in_use mask is just used as a flag for now but in
a future change we will use it to update the CurrentProgram array.
V2: make sure to flush vertices before linking (suggested by Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A later patch will result in SSO programs calling this helper
per gl_program rather than per gl_shader_program.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We also move NumProgramResourceList at the same time.
GLES does interface validation on SSO at runtime so we need to move
this to be able to switch to storing gl_program pointers in
CurrentProgram.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This allows us to cleanup the functions that pass this count around,
but more importantly we will be able to call the uniform linking
functions from that backends linker without having to pass this
information to the backend directly via Driver.LinkShader().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently this just breaks up the linking code a bit but in the
future i965 will call this from the backend via Driver.LinkShader()
so that we can do NIR optimisations before assigning uniform
locations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blits do not need any special treatment as the target buffer
object is added to render cache just as one does for normal draw.
Color clears and resolves in turn require explicit "end of pipe
synchronization". It is not clear what this means exactly but the
assumption is that render cache flush with command stream stall
should be sufficient.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Current blorp logic issues unconditional "flush everything"
(see brw_emit_mi_flush()) after each render. For example, all
blits issue this unconditionally which shouldn't be needed if
they set render cache properly so that subsequent renders do
necessary flushing before drawing.
In case of piglit:
ext_framebuffer_multisample-accuracy all_samples depth_draw small
intel_hiz_exec() is always preceded by blorb blit and the
unconditional flush looks to hide the lack of stall and flushes
in depth clears. By removing the brw_emit_mi_flush() I get gpu
hangs.
This patch adds the stalls and flushes mandated by the spec
and gets rid of those hangs.
v2 (Jason, Ken): Document the rational for separating
depth cache flush and stall on Gen7.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Earlier commit imported a SHA1 implementation and relaxed the SHA1 and
disk cache handling, broking the Windows builds.
Restrict things for now until we get to a proper fix.
Fixes: d1efa09d34 "util: import sha1 implementation from OpenBSD"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-01-18 20:09:01 +00:00
1960 changed files with 150063 additions and 115273 deletions
# Install python wheels, necessary to install SCons via pip
- python -m pip install wheel
# Install SCons
- python -m pip install --egg scons==2.4.1
- python -m pip install scons==2.5.1
- scons --version
# Install flex/bison
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"
@@ -213,7 +213,7 @@ If you don't already have GLUT installed, you should grab
<h2>2.4 Where is the GLw library?</h2>
<p>
GLw (OpenGL widget library) is now available from a separate <ahref="http://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
GLw (OpenGL widget library) is now available from a separate <ahref="https://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
</p>
@@ -276,7 +276,7 @@ If you're using a hardware accelerated driver you want <code>direct rendering: Y
</p>
<p>
If your DRI-based driver isn't working, go to the
<ahref="http://dri.freedesktop.org/">DRI website</a> for trouble-shooting information.
<ahref="https://dri.freedesktop.org/">DRI website</a> for trouble-shooting information.
</p>
@@ -284,7 +284,7 @@ If your DRI-based driver isn't working, go to the
<p>
Make sure the ratio of the far to near clipping planes isn't too great.
<ahref="relnotes/17.0.5.html">Mesa 17.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>April 17, 2017</h2>
<p>
<ahref="relnotes/17.0.4.html">Mesa 17.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>April 1, 2017</h2>
<p>
<ahref="relnotes/17.0.3.html">Mesa 17.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>March 20, 2017</h2>
<p>
<ahref="relnotes/13.0.6.html">Mesa 13.0.6</a> and
<ahref="relnotes/17.0.2.html">Mesa 17.0.2</a> are released.
These are bug-fix releases from the 13.0 and 17.0 branches, respectively.
<br>
NOTE: It is anticipated that 13.0.6 will be the final release in the 13.0
series. Users of 13.0 are encouraged to migrate to the 17.0 series in order
to obtain future fixes.
</p>
<h2>March 4, 2017</h2>
<p>
<ahref="relnotes/17.0.1.html">Mesa 17.0.1</a> is released.
This is a bug-fix release.
</p>
<h2>February 20, 2017</h2>
<p>
<ahref="relnotes/13.0.5.html">Mesa 13.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>February 13, 2017</h2>
<p>
<ahref="relnotes/17.0.0.html">Mesa 17.0.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>February 1, 2017</h2>
<p>
<ahref="relnotes/13.0.4.html">Mesa 13.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>January 23, 2017</h2>
<p>
<ahref="relnotes/12.0.6.html">Mesa 12.0.6</a> is released.
This is a bug-fix release.
<br>
NOTE: This is an extra release for the 12.0 stable branch, as per developers'
feedback. It is anticipated that 12.0.6 will be the final release in the 12.0
series. Users of 12.0 are encouraged to migrate to the 13.0 series in order
to obtain future fixes.
</p>
<h2>January 5, 2017</h2>
<p>
<ahref="relnotes/13.0.3.html">Mesa 13.0.3</a> is released.
@@ -151,7 +217,7 @@ This is a bug-fix release.
</p>
<p>
Mesa demos 8.3.0 is also released.
See the <ahref="http://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
See the <ahref="https://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
You can download it from <ahref="ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/">ftp.freedesktop.org/pub/mesa/demos/8.3.0/</a>.
</p>
@@ -466,7 +532,7 @@ This is a bug-fix release.
<p>
Mesa demos 8.2.0 is released.
See the <ahref="http://lists.freedesktop.org/archives/mesa-announce/2014-July/000100.html">announcement</a> for more information about the release.
See the <ahref="https://lists.freedesktop.org/archives/mesa-announce/2014-July/000100.html">announcement</a> for more information about the release.
You can download it from <ahref="ftp://ftp.freedesktop.org/pub/mesa/demos/8.2.0/">ftp.freedesktop.org/pub/mesa/demos/8.2.0/</a>.
</p>
@@ -645,7 +711,7 @@ This is a bug fix release.
<p>
Mesa demos 8.1.0 is released.
See the <ahref="http://lists.freedesktop.org/archives/mesa-dev/2013-February/035180.html">announcement</a> for more information about the release.
See the <ahref="https://lists.freedesktop.org/archives/mesa-dev/2013-February/035180.html">announcement</a> for more information about the release.
You can download it from <ahref="ftp://ftp.freedesktop.org/pub/mesa/demos/8.1.0/">ftp.freedesktop.org/pub/mesa/demos/8.1.0/</a>.
</p>
@@ -1341,7 +1407,7 @@ and primarily just incorporates bug fixes.
<h2>December 28, 2003</h2>
<p>
The Mesa CVS server has been moved to <ahref="http://www.freedesktop.org">
The Mesa CVS server has been moved to <ahref="https://www.freedesktop.org">
freedesktop.org</a> because of problems with SourceForge's anonymous
CVS service.
</p>
@@ -1913,7 +1979,7 @@ Here's what's new:</p>
</pre>
<h2>March 23, 2000</h2>
<p>I've just upload the Mesa 3.2 beta 1 files to SourceForge at <ahref="http://sourceforge.net/project/showfiles.php?group_id=3">http://sourceforge.net/project/filelist.php?group_id=3</a></p>
<p>I've just upload the Mesa 3.2 beta 1 files to SourceForge at <ahref="https://sourceforge.net/project/showfiles.php?group_id=3">https://sourceforge.net/project/filelist.php?group_id=3</a></p>
<p>3.2 (note even number) is a stabilization release of Mesa 3.1 meaning it's mainly
just bug fixes.</p>
<p>Here's what's changed:</p>
@@ -1961,7 +2027,7 @@ After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>
<h2>December 17, 1999</h2>
<p>A Slashdot interview with Brian about Mesa (questions submitted by Slashdot readers)
can be found at <ahref="http://slashdot.org/interviews/99/12/17/0927212.shtml">http://slashdot.org/interviews/99/12/17/0927212.shtml</a>.</p>
can be found at <ahref="https://slashdot.org/interviews/99/12/17/0927212.shtml">https://slashdot.org/interviews/99/12/17/0927212.shtml</a>.</p>
<h2>December 14, 1999</h2>
<p>Mesa 3.1 is released!</p>
@@ -1995,7 +2061,7 @@ BOF meeting is now available.</p>
<p>-Brian</p>
<h2>August 14, 1999</h2>
<p><ahref="http://www.mesa3d.org">www.mesa3d.org</a> is having
<p><ahref="https://www.mesa3d.org">www.mesa3d.org</a> is having
technical problems due to hardware failures at VA Linux systems. The Mac pages,
ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>-Brian</p>
@@ -2004,9 +2070,9 @@ ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>RPMS of the nVidia RIVA server can be found at <code>ftp://ftp.mesa3d.org/mesa/misc/nVidia/</code>.</p>
<h2>June 2, 1999</h2>
<p><ahref="http://www.nvidia.com/">nVidia</a> has released some Linux binaries for
<p><ahref="https://www.nvidia.com/">nVidia</a> has released some Linux binaries for
xfree86 3.3.3.1, along with the <b>full source</b>, which includes GLX acceleration
based on Mesa 3.0. They can be downloaded from <code>http://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
based on Mesa 3.0. They can be downloaded from <code>https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
<h2>May 24, 1999</h2>
<p>Beta 2 of Mesa 3.1 has been make available at <code>ftp://ftp.mesa3d.org/mesa/beta/</code>.
@@ -2054,11 +2120,11 @@ grateful.
<p>The new webpages are now online. Enjoy, and let me know if you find any errors.
<h2>February 16, 1999</h2>
<p><ahref="http://www.sgi.com/">SGI</a> releases its
this stand-alone example</a>. See the llvm-c/Core.h file for reference.
</li>
</ul>
@@ -264,18 +264,18 @@ for posterior analysis, e.g.:
<li>
<p>Rasterization</p>
<ul>
<li><ahref="http://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><ahref="https://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><ahref="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<ahref="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
<li><ahref="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94512">Bug 94512</a> - X segfaults with glx-tls enabled in a x32 environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94900">Bug 94900</a> - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with "Fatal error: Cannot set display mode."</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98914">Bug 98914</a> - mesa-vdpau-drivers: breaks vdpau for mpeg2video</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99144">Bug 99144</a> - Incorrect rendering using glDrawArraysInstancedBaseInstance and first != 0 on Skylake</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99154">Bug 99154</a> - Link time error when using multiple builtin functions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99158">Bug 99158</a> - vdpau segfaults and gpu locks with kodi on R9285</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98421">Bug 98421</a> - src/loader/loader.c:111:40: error: unknown type name ‘drmDevicePtr’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99532">Bug 99532</a> - Compute shader doesn't give right result under some circumstances</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: ‘const struct API_STATE’ has no member named ‘linkageCount’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99692">Bug 99692</a> - [radv] Mostly broken on Hawaii PRO/CIK ASICs</li>
</ul>
<h2>Changes</h2>
<p>Bartosz Tomczyk (2):</p>
<ul>
<li>r600: Fix stack overflow</li>
<li>r600/sb: Fix memory leak</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: [rasterizer core] Remove dead code Clipper::ClipScalar()</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>i965/mt: Disable HiZ when sharing depth buffer externally (v2)</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv: change base aligmment for allocated memory.</li>
<li>radv: fix cik macroModeIndex.</li>
<li>radv: adopt some init config workarounds from radeonsi.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/dri2: add image_loader_extension back into loader extensions for wayland</li>
</ul>
<p>Emil Velikov (26):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.4</li>
<li>configure.ac: list radeon in --with-vulkan-drivers help string</li>
<li>i965: automake: correctly set MKDIR_GEN</li>
<li>freedreno: automake: correctly set MKDIR_GEN</li>
<li>i965: automake: include builddir prior to srcdir</li>
<li>i915: automake: include builddir prior to srcdir</li>
<li>egl: automake: include builddir prior to srcdir</li>
<li>clover: automake: include builddir prior to srcdir</li>
<li>st/dri: automake: include builddir prior to srcdir</li>
<li>d3dadapter9: automake: include builddir prior to srcdir</li>
<li>glx: automake: include builddir prior to srcdir</li>
<li>glx/apple: automake: include builddir prior to srcdir</li>
<li>glx/windows: automake: include builddir prior to srcdir</li>
<li>loader: automake: include builddir prior to srcdir</li>
<li>mapi: automake: include builddir prior to srcdir</li>
<li>radeon, r200: automake: include builddir prior to srcdir</li>
<li>dri/swrast: automake: include builddir prior to srcdir</li>
<li>dri/osmesa: automake: include builddir prior to srcdir</li>
<li>mesa/tests: automake: include builddir prior to srcdir</li>
<li>bin/get-extra-pick-list: use git merge-base to get the branchpoint</li>
<li>bin/get-extra-pick-list: rework to use already_picked list</li>
<li>bin/get-typod-pick-list.sh: limit `git grep ...' to only as needed</li>
<li>bin/get-pick-list.sh: limit `git grep ...' only as needed</li>
<li>bin/get-pick-list.sh: remove ancient way of nominating patches</li>
<li>bin/get-fixes-pick-list.sh: add new script</li>
<li>Update version to 13.0.5</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Avoid emitting small immediates for UBO indirect load address guards.</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glx/glvnd: Fix GLXdispatchIndex sorting</li>
</ul>
<p>Ian Romanick (11):</p>
<ul>
<li>linker: Slight code rearrange to prevent duplication in the next commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99677">Bug 99677</a> - heap-use-after-free in glsl</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: "Note: Buggy applications may crash, if they do please report to vendor"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99850">Bug 99850</a> - Tessellation bug on Carrizo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - "ralloc: Make sure ralloc() allocations match malloc()'s alignment." causes seg fault in 32bit build</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (2):</p>
<ul>
<li>radv: Emit pending flushes before executing a secondary command buffer</li>
<li>radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer</li>
</ul>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>glsl: fix heap-buffer-overflow</li>
</ul>
<p>Bas Nieuwenhuizen (8):</p>
<ul>
<li>radv: Pass CMASK alignment to application.</li>
<li>radv: Pass DCC alignment to application.</li>
<li>radv: Never try to create more than max_sets descriptor sets.</li>
<li>radv: Reset emitted compute pipeline when calling secondary cmd buffer.</li>
<li>radv: Only use PKT3_OCCLUSION_QUERY when it doesn't hang.</li>
<li>radv: Use correct size for availability flag.</li>
<li>radv: Disable HTILE for textures with multiple layers/levels.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91281">Bug 91281</a> - Tonga VCE 2160p encode fails with BO to small for addr</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92760">Bug 92760</a> - Add FP64 support to the i965 shader backends</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92925">Bug 92925</a> - Incorrect GEN for ASTC in Surface Format Table</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93551">Bug 93551</a> - Divinity: Original Sin Enhanced Edition(Native) crash on start</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94512">Bug 94512</a> - X segfaults with glx-tls enabled in a x32 environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94900">Bug 94900</a> - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95460">Bug 95460</a> - Please add more drivers (freedreno, virgl) to features.txt status document</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96959">Bug 96959</a> - nop.sat generated by pow workaround?</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97804">Bug 97804</a> - Later precision statement isn't overriding earlier one</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97952">Bug 97952</a> - /usr/include/string.h:518:12: error: exception specification in declaration does not match previous declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98005">Bug 98005</a> - VCE dual instance encoding inconsistent since st/va: enable dual instances encode by sync surface</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98012">Bug 98012</a> - [IVB] Segfault when running Dolphin twice with Vulkan</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98134">Bug 98134</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.draw_buffers wants a different GL error code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98172">Bug 98172</a> - Concurrent call to glClientWaitSync results in segfault in one of the waiters.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98238">Bug 98238</a> - witcher 2: objects are black when changing lod</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98245">Bug 98245</a> - GLES3.1 link negative dEQP "expected linking to fail, but passed."</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with "Fatal error: Cannot set display mode."</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98297">Bug 98297</a> - Can't configure a desktop with 3x4k monitors in one row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=98421">Bug 98421</a> - src/loader/loader.c:111:40: error: unknown type name ‘drmDevicePtr’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99119">Bug 99119</a> - swr_fence_work.cpp(42): error: argument of type "std::nullptr_t" is incompatible with parameter of type "unsigned long"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99144">Bug 99144</a> - Incorrect rendering using glDrawArraysInstancedBaseInstance and first != 0 on Skylake</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99154">Bug 99154</a> - Link time error when using multiple builtin functions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99158">Bug 99158</a> - vdpau segfaults and gpu locks with kodi on R9285</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99214">Bug 99214</a> - Crash in library libswrAVX.so when assigning vertex buffer object pointers with elements of type GL_DOUBLE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99219">Bug 99219</a> - The Stanley Parable GPU hang when starting a new game</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99229">Bug 99229</a> - [G33] thousands of tests crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99231">Bug 99231</a> - [HSW][i965] Crash in upload_3dstate_streamout()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99303">Bug 99303</a> - [REGRESSION][BISECTED] DMs are crashing on start with "radeon"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99419">Bug 99419</a> - Crash(Segmentation fault) si_shader_select in Master Of Orion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99450">Bug 99450</a> - [amdgpu] Payday 2 visual glitches on some models</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99451">Bug 99451</a> - polygon offset use after free</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: ‘const struct API_STATE’ has no member named ‘linkageCount’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99637">Bug 99637</a> - VLC video has corrupted colors when using VDPAU output on Radeon SI</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97988">Bug 97988</a> - [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasing issues not visible with VA-API</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99484">Bug 99484</a> - Crusader Kings 2 - Loading bars, siege bars, morale bars, etc. do not render correctly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: "Note: Buggy applications may crash, if they do please report to vendor"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - "ralloc: Make sure ralloc() allocations match malloc()'s alignment." causes seg fault in 32bit build</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (3):</p>
<ul>
<li>radv: Emit pending flushes before executing a secondary command buffer</li>
<li>radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99246">Bug 99246</a> - [d3dadapter+radeonsi & bisect] EVE-Online : hang on wormhole sight</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100061">Bug 100061</a> - LODQ instruction generated with invalid dst mask</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100182">Bug 100182</a> - Flickering in The Talos Principle on Sky Lake GT4.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100201">Bug 100201</a> - Windows scons build with MSVC toolchain and LLVM 4.0 fails</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new polaris12 pci id</li>
</ul>
<p>Andres Gomez (5):</p>
<ul>
<li>glsl: on UBO/SSBOs link error reset the number of active blocks to 0</li>
<li>cherry-ignore: add the Invalidate L2 for TRANSFER_WRITE barriers fix</li>
<li>cherry-ignore: add the Flush after unmap in gbm/dri fix</li>
<li>cherry-ignore: corrected typo in the Flush after unmap in gbm/dri fix</li>
<li>Update version to 17.0.3</li>
</ul>
<p>Axel Davy (2):</p>
<ul>
<li>st/nine: Resolve deadlock in surface/volume dtors when using csmt</li>
<li>st/nine: Use atomics for available_texture_mem</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: flush DB cache before and after HTILE decompress.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>radv: fix primitive reset index emission</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.2</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>st/mesa: set result writemask based on ir type</li>
</ul>
<p>Jan Vesely (1):</p>
<ul>
<li>clover: use pipe_resource references</li>
</ul>
<p>Jason Ekstrand (9):</p>
<ul>
<li>anv/query: Invalidate the correct range</li>
<li>anv/GetQueryPoolResults: Actually implement the spec</li>
<li>anv/image: Return early when unbinding an image</li>
<li>anv/query: Fix the location of timestamp availability</li>
<li>anv: Make anv_get_layerCount a macro</li>
<li>anv/blorp: Use anv_get_layerCount everywhere</li>
<li>anv/cmd_buffer: Apply flush operations prior to executing secondaries</li>
<li>anv/cmd_buffer: Fix bad indentation</li>
<li>anv: Flush caches prior to PIPELINE_SELECT on all gens</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>c11/threads: Include thr/xtimec.h for xtime definition when building with MSVC.</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>tests/cache_test: allow crossing mount points</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nvc0/ir: treat FMA like MAD for operand propagation</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: don't hang on shader compile failure</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>i965/fs: Don't emit SEL instructions for type-converting MOVs.</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>intel: Correct the BDW surface state size</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>mesa/main: fix MultiDrawElements[BaseVertex] validation of primcount</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (16):</p>
<ul>
<li>cherry-ignore: Add the pci_id into the shader cache UUID</li>
<li>cherry-ignore: fix crash if ctx torn down with no rendering</li>
<li>cherry-ignore: Fix typos.</li>
<li>cherry-ignore: Revert "etnaviv: Cannot render to rb-swapped formats"</li>
<li>cherry-ignore: Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."</li>
<li>cherry-ignore: fix typo in a2b10g10r10 fast clear calculation</li>
@@ -226,7 +226,7 @@ did not exist in the 7.10 release series at all.</p>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36086">Bug 36086</a> - [wine] Segfault r300_resource_copy_region with some wine apps and RADEON_HYPERZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36182">Bug 36182</a> - Game Trine from http://www.humblebundle.com/ needs ATI_draw_buffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36182">Bug 36182</a> - Game Trine from https://www.humblebundle.com/ needs ATI_draw_buffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36268">Bug 36268</a> - [r300g, bisected] minor flickering in Unigine Sanctuary</li>
AddrLibClassm_class;///< Store class type (HWL type)
LibClassm_class;///< Store class type (HWL type)
AddrChipFamilym_chipFamily;///< Chip family translated from the one in atiid.h
ChipFamilym_chipFamily;///< Chip family translated from the one in atiid.h
UINT_32m_chipRevision;///< Revision id from xxx_id.h
UINT_32m_chipRevision;///< Revision id from xxx_id.h
UINT_32m_version;///< Current version
UINT_32m_version;///< Current version
//
// Global parameters
//
ADDR_CONFIG_FLAGSm_configFlags;///< Global configuration flags. Note this is setup by
ConfigFlagsm_configFlags;///< Global configuration flags. Note this is setup by
/// AddrLib instead of Client except forceLinearAligned
UINT_32m_pipes;///< Number of pipes
UINT_32m_banks;///< Number of banks
UINT_32m_pipes;///< Number of pipes
UINT_32m_banks;///< Number of banks
/// For r800 this is MC_ARB_RAMCFG.NOOFBANK
/// Keep it here to do default parameter calculation
UINT_32m_pipeInterleaveBytes;
UINT_32m_pipeInterleaveBytes;
///< Specifies the size of contiguous address space
/// within each tiling pipe when making linear
/// accesses. (Formerly Group Size)
UINT_32m_rowSize;///< DRAM row size, in bytes
UINT_32m_rowSize;///< DRAM row size, in bytes
UINT_32m_minPitchAlignPixels;///< Minimum pitch alignment in pixels
UINT_32m_maxSamples;///< Max numSamples
UINT_32m_minPitchAlignPixels;///< Minimum pitch alignment in pixels
UINT_32m_maxSamples;///< Max numSamples
private:
AddrElemLib*m_pElemLib;///< Element Lib pointer
ElemLib*m_pElemLib;///< Element Lib pointer
};
AddrLib*AddrSIHwlInit(constAddrClient*pClient);
AddrLib*AddrCIHwlInit(constAddrClient*pClient);
Lib*SiHwlInit(constClient*pClient);
Lib*CiHwlInit(constClient*pClient);
Lib*Gfx9HwlInit(constClient*pClient);
}// Addr
#endif
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.