This is optional (and no CAP).
Implemented by radeonsi, ddebug, rbug, trace.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Some "standard" (_S) swizzle modes are displayable on Raven,
even though the micro tile mode says it's not displayable.
Expose the addrlib function to the driver.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The i965 driver has become dependent on x86 specific compiler builtin
functions, so ensure it's disabled for non-x86 builds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Herring <robh@kernel.org>
If an RS blit is done with source exactly the same as destination, and
the hardware supports this, do an in-place resolve. This only fills in
tiles that have not been rendered to using information from the TS.
This is the same as the blob does and potentially saves significant
bandwidth when doing i.MX6qp scanout using PRE, and when rendering to
textures (though here using sampler TS would be even better).
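As a rough illustration of the in-place resolve described above, the sketch below models a surface as a list of tiles plus one TS (tile status) flag per tile. The names `in_place_resolve`, `ts_cleared` and `clear_value` are hypothetical and do not come from the driver; the real operation is done by the RS hardware.

```python
def in_place_resolve(tiles, ts_cleared, clear_value):
    """Fill only the tiles the TS marks as never rendered to.

    Returns the number of tiles written, a stand-in for bandwidth spent.
    """
    written = 0
    for index, cleared in enumerate(ts_cleared):
        if cleared:
            tiles[index] = clear_value
            ts_cleared[index] = False  # the tile now holds valid data
            written += 1
    return written


tiles = ["stale", "rendered", "stale", "rendered"]
ts = [True, False, True, False]
tiles_written = in_place_resolve(tiles, ts, "clear")
```

Only two of the four tiles are touched here, which is where the bandwidth saving would come from.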
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
`_EGLDriver *drv` is a freshly calloc()'ed object, memset(0)'ing some of
it is a no-op.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If you set MESA_GLSL_CACHE_DISABLE, radv crashed.
Fixes: fd24be134f (radv: make use of on-disk cache)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will allow us to emit the CLEAR_STATE packet instead
of a bunch of useless packets when doing CS initialization.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If the app-provided in-memory pipeline cache doesn't yet contain
what we are looking for, or the app doesn't provide one at all, then we
fall back to the on-disk cache.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is the driver's on-disk cache, intended to be used as a
fallback, as opposed to the pipeline cache provided by apps.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's not used -- DFRACEXP gets array indexes of its exponent out-parameter
lowered earlier -- and it wouldn't have worked correctly anyway when both
dst and dst1 use relative addressing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Replace the undefined destination by a new temporary register.
Cleanup merge_two_dsts while we're at it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Make sure we actually allocate two adjacent TGSI temporaries. The
current code fails e.g. when an arithmetic operation has two
operands with indirect accesses.
I will send out a new piglit test
(arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's not used, and the assignment for the TGSI case was incorrect
for sampler arrays.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
See the comment for the relevant spec quote.
Fixes dEQP-GLES31.functional.srgb_texture_decode.skip_decode.srgba8.texel_fetch
v2: note the interaction between ARB_bindless_texture and EXT_texture_sRGB_decode
as a TODO
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes sequences like:
1. Context 1 samples from texture with sRGB decode enabled
2. Context 2 samples from texture with sRGB decode disabled
3. Context 1 samples from texture with sRGB decode disabled
Previously, step 3 would see the prev_sRGBDecode value from context 2
and would incorrectly use the old sampler view with sRGB decode enabled.
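The sequence above can be modelled with a small sketch. It assumes, for illustration only, that the old scheme kept a single `prev_srgb_decode` value on the shared texture object while sampler views were cached per context; the class and function names below are hypothetical and not taken from the state tracker.

```python
class SharedTexture:
    """Texture object visible to every context (hypothetical model)."""

    def __init__(self):
        self.prev_srgb_decode = None  # old scheme: one value for all contexts
        self.views = {}               # context id -> cached entry


def make_view(ctx, decode):
    return f"view(ctx={ctx}, srgb_decode={decode})"


def get_view_old(tex, ctx, decode):
    """Old behaviour: compare against the value last seen by ANY context."""
    view = tex.views.get(ctx)
    if view is None or tex.prev_srgb_decode != decode:
        view = make_view(ctx, decode)
        tex.views[ctx] = view
    tex.prev_srgb_decode = decode
    return view


def get_view_new(tex, ctx, decode):
    """Fixed behaviour: remember the decode setting next to each context's view."""
    entry = tex.views.get(ctx)
    if entry is None or entry[1] != decode:
        entry = (make_view(ctx, decode), decode)
        tex.views[ctx] = entry
    return entry[0]


# Steps 1-3 from the message: (context, sRGB decode enabled)
SEQUENCE = [(1, True), (2, False), (1, False)]
```

Replaying `SEQUENCE` through `get_view_old` reuses context 1's decode-enabled view in step 3, while `get_view_new` creates a fresh one.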
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Start building vertex shaders as simd16.
Disabled by default, set USE_SIMD16_SHADERS in knobs.h to experiment.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Increase the max allowed vector size from 256 to 512.
No piglit llvmpipe regressions running on avx2.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The original implementation allocated a new BO here, but we decided to
switch to intel_upload_space, which returns a reference to the current
upload BO. We accidentally kept the brw_bo_alloc, even though it's no
longer necessary - intel_upload_space will immediately unreference it,
causing us to allocate and immediately free a buffer.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Section 6.3.2 of the GL 4.5 spec says:
"Any GL command which attempts to read from, write to, or change
the state of a buffer object may generate an INVALID_OPERATION error
if all or part of the buffer object is mapped ... However, only
commands which explicitly describe this error are required to do so.
If an error is not generated, such commands will have undefined
results and may result in GL interruption or termination."
Setting this flag allows us to skip walking over the buffer bindings
for every enabled vertex attribute (_mesa_all_buffers_are_unmapped).
Improves performance in GFXBench4's gl_driver2_off microbenchmark by
3.05797% +/- 0.709031% (n=33) on Apollolake.
This breaks KHR-*.draw_elements_base_vertex_tests.invalid_mapped_bos,
but that test is invalid and has been removed from the upstream CTS.
Reviewed-by: Eric Anholt <eric@anholt.net>
That requires a generated header that was rolled into a loop.
fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously buffer offsets were passed in explicitly as an offset, which
had to be added to the resource address. Now they are passed in via an
increased 'start' parameter. As a result, we were double-adding the
start offset in this kind of situation.
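A small arithmetic sketch of the double-add, with made-up numbers. The function names and parameters are hypothetical and only model the convention change described above, not the driver's code.

```python
def fold_offset_into_start(start, byte_offset, index_size):
    """New convention: the buffer offset arrives as an increased 'start'."""
    return start + byte_offset // index_size


def index_address(base, start, index_size):
    """Correct under the new convention: 'start' already covers the offset."""
    return base + start * index_size


def index_address_buggy(base, start, index_size, byte_offset):
    """Still adds the byte offset, so it is counted twice."""
    return base + byte_offset + start * index_size


base, byte_offset, index_size = 0x1000, 64, 2
start = fold_offset_into_start(3, byte_offset, index_size)
good = index_address(base, start, index_size)
bad = index_address_buggy(base, start, index_size, byte_offset)
```

The buggy address ends up exactly `byte_offset` bytes past the correct one.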
This condition was triggered by piglit's draw-elements test which has a
requisite glMultiDrawElements in combination with a small enough number
of vertices to go through the immediate push path.
Fixes: 330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
Reported-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Commit 06bfb2d28f ("r600: fork and import gallium/radeon") broke the
Android build:
external/mesa3d/src/gallium/drivers/radeon/r600_pipe_common.c:43:10: fatal error: 'llvm-c/TargetMachine.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~~
Update the Android makefiles so that drivers/radeon is only built when
radeonsi (and therefore LLVM) is enabled.
Fixes: 06bfb2d28f (r600: fork and import gallium/radeon)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
As part of Treble project in Android O, all the device specific files have
to be located in a separate vendor partition. This is done by setting
LOCAL_PROPRIETARY_MODULE (the name is misleading). This change will not
break existing platforms without a vendor partition as it will just move
files to /system/vendor.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
ARB_enhanced_layouts allows multiple output variables to share the same
location - and these variables may not have the same sizes. For
example, consider these output variables:
// consume X/Y/Z components of 6 vectors
layout(location = 0) out vec3 a[6];
// consumes W component of the first vector
layout(location = 0, component = 3) out float b;
Looking at the first declaration, we see that VARYING_SLOT_VAR0 needs 24
components worth of space (vec3 padded out to a vec4, 4 * 6 = 24). But
looking at the second declaration, we would think that VARYING_SLOT_VAR0
needs only 4 components of space (a single float padded out to a vec4).
nir_setup_outputs() only considered the space requirements of the first
declaration it happened to see, so if 'float b' came first, it would
underallocate the output register space, causing brw_fs_validator.cpp
to assert fail about inst->dst.offset exceeding the register size.
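The sizing problem can be sketched as follows. This is a simplified model, not the nir_setup_outputs() code: each declaration is reduced to a hypothetical `(location, array_length)` pair, and every element is assumed to be padded out to a vec4.

```python
def size_first_seen(decls):
    """Old behaviour: the first declaration seen for a location wins."""
    sizes = {}
    for location, array_length in decls:
        sizes.setdefault(location, 4 * array_length)
    return sizes


def size_max(decls):
    """Take the largest requirement among all declarations sharing a location."""
    sizes = {}
    for location, array_length in decls:
        sizes[location] = max(sizes.get(location, 0), 4 * array_length)
    return sizes


A = (0, 6)  # layout(location = 0) out vec3 a[6];
B = (0, 1)  # layout(location = 0, component = 3) out float b;
```

With `b` declared first, `size_first_seen([B, A])` reserves only 4 components, while `size_max` gives 24 in either order.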
Fixes Piglit's tests/spec/arb_enhanced_layouts/execution/component-layout/
vs-to-fs-array-interleave-single-location.shader_test.
Thanks to Tim Arceri for finding this bug and writing a test!
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With the ssao demo from Vulkan demos:
radv/rx480: 440->440fps
anv/haswell: 24->34 fps
The demo does a 0->32 loop across a ubo with 32 members.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This assert was firing just running demos.
Jason said it should be this.
Fixes: 6c7720ed78 (anv/wsi: Allocate enough memory for the entire image)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds automatic size support to the atomic buffer code,
but also realigns the code to act like the ubo/ssbo code.
v1.1:
add missing blank lines.
reindent one block properly.
check for NullBufferObj.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks has a MOV that hits
this validation path. MOVs don't have a src1 file, but calling
brw_inst_src1_type() was tripping on src1.file being BRW_IMMEDIATE_VALUE
and the hw_type being something invalid for immediates.
To work around this, just pretend src1 is src0 if there isn't a src1.
Fixes: 2572c2771d (i965: Validate "Special
Requirements for Handling Double Precision Data Types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes 'KHR-GL45.copy_image.functional' on Nouveau and i965.
v2: (by Kenneth Graunke)
Rewrite patch according to Jason Ekstrand's review feedback.
This makes it handle differing strides, which i965 needed.
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Older kernels fail the va_op with this flag set. If the kernel
supports GFX9 usefully, it will also support this flag.
Fixes: e8d57802fe "radv/gfx9: allocate events from uncached VA space"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason and I investigated several OpenGL CTS failures where the tests
bind the same texture for rendering and texturing, at the same time.
This has defined results as long as the reads happen before writes,
or the regions are non-overlapping. Normally, this just works out.
However, CCS can cause problems. If the shader is reading one set of
pixels, and writing to different pixels that are adjacent, they may end
up being covered by the same CCS block. So rendering may be writing a
CCS block, while the sampler is trying to read it. Corruption ensues.
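To illustrate why non-overlapping pixels can still collide, the sketch below maps pixels to CCS blocks. The block footprint used here (8x4 pixels) is purely hypothetical; the real footprint depends on the hardware generation and surface format.

```python
BLOCK_W, BLOCK_H = 8, 4  # hypothetical CCS block footprint in pixels


def ccs_block(x, y):
    """Return the coordinates of the CCS block covering pixel (x, y)."""
    return (x // BLOCK_W, y // BLOCK_H)


def shares_ccs_block(read_px, write_px):
    """True if two distinct pixels are covered by the same CCS block."""
    return read_px != write_px and ccs_block(*read_px) == ccs_block(*write_px)
```

For example, `shares_ccs_block((0, 0), (1, 0))` is true even though the read and the write touch different pixels.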
Disabling CCS is unfortunate, but safe.
Fixes several KHR-GL45.texture_barrier.* subtests.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lowers ffma to a * b + c.
This seems like it should keep Marek happiest, so
we'd never get to the fma instruction emission code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So it appears the Vulkan SPIR-V fma opcode can be equivalent to a
mad operation, and the fma hw opcode on AMD hw is issued like a double
opcode so is slower. Also the radeonsi stack does this.
This appears to improve performance on a number of games from Feral,
and thanks to Feral for noticing the problem.
I'm reposting this one as Marek indicated he thinks this is what
we should be doing on AMD hw.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The HW will halt when you hit a HALT packet, or when you hit the end
address. Tell CLIF if there's an end address so that it can stop
correctly. (There was usually a 0 byte after the CL, so it would stop
anyway).
In order to keep early-Z from writing early in a discard shader, you need
to set the "modifies Z" bit in the shader state (which the new
prog_data.discards will indicate). Then, in the shader we do a TLB write
to make Z passthrough happen (the QPU result is ignored, so we use a NULL
source).
I had base_vertex hacked into the shader state setup like in vc4, but it's
not correct for big offsets. Using the proper packet is easier and
hopefully means we can re-emit shader state setup less frequently.
These existed so I could unpack just the sub-id field to switch on in the
old manual CLIF dumper. The new codegen handles sub-id automatically, but
only if these stub packets aren't there with an implicit sub-id=0.
V3D 3.3 is a continuation of the 3D implementation in VC4 (v2.1 and v2.6).
V3D 3.3 introduces an MMU (no more CMA allocations) and support for
GLES3.1. This driver is not currently conformant, though that will be a
target as soon as possible.
V3D 3.x parts use a new texture tiling layout common across many Broadcom
graphics parts, including the HVS scanout engine. It also massively
changes the QPU instructions, introducing a common physical register file
(no more A/B split) and half-float instructions, while removing the 4x8
unorm instructions in favor of half-float for talking to fixed function
interfaces. Because so much has changed, vc5 is implemented in a separate
gallium driver, using only the XML code-generation support from vc4.
v2: Fix tile layout for 64bpp textures. Fix texture swizzling for 32-bit
returns. Fix up a bit of MRT setup. Sync the simulator to kernel
behavior a bit more. Improve uniform debugging code. Rebase on
QIR->VIR rename. Move texture state mostly to the CSOs. Improve
cache flushing on the simulator. Fix program deletion
use-after-frees.
Acked-by: Dave Airlie <airlied@gmail.com> (uabi plan)
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> (uabi plan)
This is a pretty straightforward fork of VC4's NIR compiler to VC5. The
condition codes, registers, and I/O have all changed, making the backend
hard to share, though their heritage is still recognizable.
v2: Move to src/broadcom/compiler to match intel's layout, rename more
"vc5" to "v3d", rename QIR to VIR ("V3D IR") to avoid symbol conflicts
with vc4, use new v3d_debug header, add compiler init/free functions,
do texture swizzling in NIR to allow optimization.
This will be usable with "VC5_DEBUG=cl" on the vc5 driver to stream a CLIF
file (the Broadcom equivalent of i965's AUB) to stderr. I haven't tested
that this is actually usable with the internal CLIF-consuming tools, but
is close enough as a baseline and is useful for visually inspecting the
command stream.
Unlike VC4, I've defined an unpacked instruction format with pack/unpack
functions to convert to 64-bit encoded instructions. This will let us
incrementally put together our instructions and validate them in a more
natural way than the QPU_GET_FIELD/QPU_SET_FIELD used to.
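The general shape of an unpacked format with pack/unpack helpers might look like the sketch below. The field names, positions and widths are entirely hypothetical and are not the V3D QPU encoding; the point is only the round trip between a structured instruction and a 64-bit word.

```python
from dataclasses import dataclass

# (field name, bit shift, bit width) -- a made-up layout for illustration
_FIELDS = (("op", 56, 8), ("dst", 48, 6), ("src_a", 40, 6), ("src_b", 32, 6))


@dataclass
class Instr:
    op: int
    dst: int
    src_a: int
    src_b: int


def pack(instr):
    """Encode an unpacked instruction into one 64-bit word, validating ranges."""
    word = 0
    for name, shift, width in _FIELDS:
        value = getattr(instr, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack(word):
    """Decode a 64-bit word back into its unpacked form."""
    return Instr(**{
        name: (word >> shift) & ((1 << width) - 1)
        for name, shift, width in _FIELDS
    })
```

Validation lives in one place (`pack`), so an instruction can be assembled field by field and checked once before encoding.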
The pack/unpack functions are unfortunately written by hand. While I could define
genxml for parts of it, there are many special cases (like operand order
of commutative binops choosing which binop is being performed!) and it
probably wouldn't come out much cleaner.
The disasm unit test ensures that we have the same assembly format as
Broadcom's internal tools, other than whitespace changes.
v2: Fix automake variable redefinition complaints, add test to .gitignore
Unlike vc4, where the compiler and gallium driver live together, for vc5
the compiler will live up in the shared broadcom directory, and need
access to the debug flags. Define a set of debug flags and helpers there,
so it can be shared between compiler, vc5, and vulkan.
My intent is to develop the vc5 driver in-tree for some time to build the
CL generation and shader compiler code, and keep out-of-tree patches for
talking to an actual kernel driver until the kernel driver can be
stabilized on the hardware.
v2: Define a HAVE_BROADCOM_DRIVERS, like HAVE_INTEL or HAVE_AMD.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
I've been doing this inside of vc4, but vc5 wants it as well and it may be
useful for other drivers (Intel has a related path for pre-gen6 with MRT,
and freedreno had a TGSI path for it at one point).
This required defining a common enum for the standard comparison
functions, but other lowering passes are likely to also want that enum.
v2: Add to meson.build as well.
Acked-by: Rob Clark <robdclark@gmail.com>
The intent is to use this extension on vc4 to allow X11 to do overlapping
CopyArea() within a pixmap without first blitting the pixmap to a
temporary. With associated glamor patches, improves x11perf
-copywinwin100 performance on a Raspberry Pi 3 from ~4700/sec to
~5130/sec, and is an even larger boost to uncomposited window movement
performance (most copywinwin100 copies don't overlap).
v2: Fix glIsEnabled() on the new enums.
v3: Drop the local spec since I'm upstreaming the spec.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Because vc4 can control the order that tiles are rasterized in, we can use
it to implement overlapping blits using normal drawing and
GL_ARB_texture_barrier, as long as we can tell the kernel what order to
render the tiles in.
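The ordering idea is the same one memmove() relies on. The sketch below is a simplified model, not the vc4 implementation: it works at whole-tile granularity, and `tile_order` and `blit_row_in_place` are hypothetical helpers.

```python
def tile_order(tiles_x, tiles_y, dx, dy):
    """Tile coordinates in an order that is safe for a copy moving by (dx, dy).

    A destination tile must be written only after every tile that still
    needs its old contents as a source has been processed.
    """
    xs = range(tiles_x) if dx <= 0 else range(tiles_x - 1, -1, -1)
    ys = range(tiles_y) if dy <= 0 else range(tiles_y - 1, -1, -1)
    return [(x, y) for y in ys for x in xs]


def blit_row_in_place(row, dx):
    """Shift a 1D row of tiles by dx tiles in place, using the safe order."""
    n = len(row)
    for x, _ in tile_order(n, 1, dx, 0):
        src = x - dx
        if 0 <= src < n:
            row[x] = row[src]
    return row
```

Shifting `[0, 1, 2, 3]` right by one tile yields `[0, 0, 1, 2]` with no temporary copy, because the rightmost tiles are written first.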
v2: Fix on the simulator.
v3: Add the cap (disabled) to other drivers, add rst docs for the cap.
v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS
v5: Split from the core gallium commit, drop some unnecessary code related
to glBlitFramebuffer(), fix a crash with clears before state has been
bound.
Because vc4 can control the order that tiles are rasterized in, we can use
it to implement overlapping blits using normal drawing and
GL_ARB_texture_barrier, as long as we can tell the kernel what order to
render the tiles in.
This commit introduces the core gallium support, vc4 changes will follow.
v2: Fix on the simulator.
v3: Add the cap (disabled) to other drivers, add rst docs for the cap.
v4: Rebase on PIPE_CAP_TGSI_ANY_REG_AS_ADDRESS
v5: Drop vc4 changes from this commit, for clarity.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v3)
Noticed that we had two 0x8bb4 in the spec while grepping to find an open
slot in the MESA enums set. gl.xml had the right value.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If one uses a parent build script to download/build Mesa we may not
have a full git repository (maybe a tar archive) so the 'git rev-parse'
command will fail.
This updates the script to look for a MESA_GIT_SHA1_OVERRIDE env var.
If it's set, use that sha1 instead of using git rev-parse. With this
change we can put a git hash in the GL_VERSION string even when we
don't have a git repo.
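A minimal sketch of the lookup order described above. This is not the script from the tree: the function name, the optional `env` parameter and the exact `git rev-parse` arguments are assumptions made for illustration.

```python
import os
import subprocess


def get_git_sha1(env=None):
    """Return the sha1 to embed, preferring MESA_GIT_SHA1_OVERRIDE when set."""
    env = os.environ if env is None else env
    override = env.get("MESA_GIT_SHA1_OVERRIDE")
    if override:
        return override
    try:
        output = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL
        )
    except (subprocess.CalledProcessError, OSError):
        return ""  # not a git checkout (e.g. a tar archive) or git is missing
    return output.decode().strip()
```

A parent build script could then export `MESA_GIT_SHA1_OVERRIDE` before invoking the build, and the override would be used without touching git at all.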
v2: incorporate Dylan's suggestions to simplify the code
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
v2: use !! in the function to be explicit about type conversion. Though,
gcc generates the same code with or without the logical !!.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Try to start removing things from the cluttered imports.h file.
v2: add new header to Makefile.sources
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before we were doing RGBA4 on GLES3 only, but as of GLES2 2.0.22 it should
be RGBA4 as well. Fixes DEQP
functional.state_query.rbo.renderbuffer_internal_format.
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This extension is effectively a backport of GLES3's internalformat
handling to GLES 1/2. It guarantees that sized internalformats specified
for textures and renderbuffers have at least the specified size stored.
That's a pretty minimal requirement, so I think it can be dummy_true and
exposed as a standard in Mesa.
As a side effect, it also allows GL_RGB565 to be specified as a texture
format, not just as a renderbuffer. Mesa had previously been allowing 565
textures, which angered DEQP in the absence of this extension being
exposed.
v2: Allow 2101010rev with sized internalformats even on GLES3, citing the
extension spec. Extend extension checks for GLES2 contexts exposing
with texture_float, texture_half_float, and texture_rg.
v3: Fix ALPHA/LUMINANCE/LUMINANCE_ALPHA error checking (GLES3 CTS
failures)
v4: Mark GL_RGB10 non-color-renderable on ES, fix A/L/LA errors on GLES2
with float formats.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously, we were downconverting to 8888 automatically if the hardware
didn't support it. However, with the advent of
GL_OES_required_internalformat, we have to actually store the
internalformats we advertise support for. And, it seems rather
disingenuous to advertise the extension if we don't actually support it.
v2: Throw an error when using the format on ES2 without the extension present.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is how VC4 stores 5551 textures, which we need to support for
GL_OES_required_internalformat.
v2: Extend commit message, fix svga driver build, add BE ordering from
Roland.
v3: Rebase on PIPE_FORMAT_R10G10B10X2_UNORM addition.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
For supporting RGB5 in hardware with A in the low bit (vc4), we need this
format as well.
v2: Add proper _mesa_format_matches_format_and_type() support (from
Nicolai).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
We ought to be able to distinguish between allocation errors and bad
parameters (non-existent renderbuffer object).
Bumps the version of the DRI Image extension to 17.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Applications might pass in a buffer that is sized too large and rely
on the extra space of the buffer not being overwritten.
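The intended behaviour can be sketched as below. `write_sample_counts` is a hypothetical helper, not a Mesa function; it only shows that entries past the number of available values are left alone.

```python
def write_sample_counts(out_buffer, sample_counts):
    """Copy the available values and leave the rest of the buffer untouched."""
    count = min(len(out_buffer), len(sample_counts))
    out_buffer[:count] = sample_counts[:count]
    return count


buf = [-1] * 5  # application buffer, larger than needed
written = write_sample_counts(buf, [8, 4])
```

After the call, the first two entries hold the sample counts and the remaining three still contain the application's sentinel value.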
Fixes dEQP-GLES31.functional.state_query.internal_format.partial_query.num_sample_counts
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The uploaders can own transfers which need to be unmapped. Destroy them
before the final sync (they're not used from the driver thread anyway)
so that the transfer_unmap call is processed by the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In GL state, textures created from EGL images look like plain 2D textures
with a single level, so we use the existing layer_override facility and
add an analogous level_override one.
Fixes dEQP-EGL.functional.image.create.gles2_cubemap_{positive,negative}_{x,y,z}_rgba_texture
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This can happen with surface-based texture objects derived from EGL
images, since those aren't immutable.
Fixes tests in dEQP-EGL.functional.sharing.gles2.multithread.random.images.teximage2d.* and others
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Unlike uniforms, the limit on shared memory size is not called out
explicitly in the list of things that cause linker errors, but presumably
that's just an oversight in the spec.
Fixes dEQP-GLES31.functional.debug.negative_coverage.{callbacks,get_error,log}.compute.exceed_shared_memory_size_limit
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that the real meaning of the 2 bits in PA_SYSTEM_MODE is known,
we can set them according to the rasterizer state, which fixes uses
that are setting provoking vertex first.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It turned out not to be a hardware bug, but the shader compiler
emitting wrong varying component use information. With that fixed
we can turn flat shading back on.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
It seems that newer cores don't use the PA_ATTRIBUTES to decide if the
varying should bypass the flat shading, but derive this from the component
use. This fixes flat shading on GC880+.
VARYING_COMPONENT_USE_POINTCOORD is a bit of a misnomer now, as it isn't
only used for pointcoords, but missing a better name I left it as-is.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The logic to decide if we need to flush the GPU command stream was broken
and hard to reason about. Fix and clarify this.
Fixes the data sync subtests from piglit arb_vertex_buffer_object.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
When computing the total size of the URB for tessellation evaluation
inputs we were not accounting for this, and instead we were always
assuming that each input would take a single vec4 slot, which could
lead to computing a smaller read size than required. Specifically, this
is a problem when the last input is a dvec3/4 such that its XY components
are stored in the second half of a payload register (which can happen
if the offset for the input in the URB is not 64-bit aligned because
there are 32-bit inputs mixed in) and the ZW components in the
first half of the next, as in this case we would fail to account for the
extra slot required for the ZW components.
Fixes (requires another fix in CTS currently in review):
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TGSI was adjusted to always pass in 64-bit integers but nouveau was left
with the old semantics. Update to the new thing.
Fixes: d10fbe5159 (st/glsl_to_tgsi: fix 64-bit integer bit shifts)
Reported-by: Karol Herbst <karolherbst@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
c7affbf687 enabled GLSLOptimizeConservatively on some
drivers. The idea was to speed up compile times by running
the GLSL IR passes only once each time do_common_optimization()
is called. However loop unrolling can create a big mess and
with large loops can actually cause compile times to increase
significantly due to a bunch of redundant if statements being
propagated to other IRs.
Here we make sure to clean things up before moving on.
There was no measurable difference in shader-db compile times,
but it makes compile times of some piglit tests go from a couple
of seconds to basically instant.
The shader-db results seemed positive also:
Totals:
SGPRS: 2829456 -> 2828376 (-0.04 %)
VGPRS: 1720793 -> 1721457 (0.04 %)
Spilled SGPRs: 7707 -> 7707 (0.00 %)
Spilled VGPRs: 33 -> 33 (0.00 %)
Private memory VGPRs: 3140 -> 2060 (-34.39 %)
Scratch size: 3308 -> 2180 (-34.10 %) dwords per thread
Code Size: 79441464 -> 79214616 (-0.29 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 558670 -> 558571 (-0.02 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The old code assumed that loop terminators will always be at
the start of the loop, resulting in otherwise unrollable
loops not being unrolled at all. For example the current
code would unroll:
int j = 0;
do {
if (j > 5)
break;
... do stuff ...
j++;
} while (j < 4);
But would fail to unroll the following as no iteration limit was
calculated because it failed to find the terminator:
int j = 0;
do {
... do stuff ...
j++;
} while (j < 4);
Also we would fail to unroll the following as we ended up
calculating the iteration limit as 6 rather than 4. The unroll
code then assumed we had 3 terminators rather than 2 as it
wasn't able to determine that "if (j > 5)" was redundant.
int j = 0;
do {
if (j > 5)
break;
... do stuff ...
if (bool(i))
break;
j++;
} while (j < 4);
This patch changes this pass to be more like the NIR unrolling pass.
With this change we handle loop terminators correctly and also
handle cases where the terminators have instructions in their
branches other than a break.
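To see why the limit in the last example is 4 and why "if (j > 5)" is redundant, the loop can simply be executed. The sketch below is a direct transcription of that example into a simulation; `run_loop` and its `cond_flag` parameter (standing in for `bool(i)`) are names made up here, and this is not the unrolling pass itself.

```python
def run_loop(cond_flag=False):
    """Run the third example; return (body executions, terminator that fired)."""
    j = 0
    body = 0
    while True:
        if j > 5:
            return body, "if (j > 5)"
        body += 1  # ... do stuff ...
        if cond_flag:
            return body, "if (bool(i))"
        j += 1
        if not (j < 4):
            return body, "while (j < 4)"
```

With `cond_flag` false the body runs four times and the loop always ends at `while (j < 4)`, so only two terminators can ever fire.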
V2:
- fixed regression where loops with a break in else were never
unrolled in v1.
- fixed confusing/wrong naming of bools in complex unrolling.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
do-while loops can increment the starting value before the
condition is checked. e.g.
do {
ndx++;
} while (ndx < 3);
This commit changes the code to detect this and reduces the
iteration count by 1 if found.
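The off-by-one can be checked by counting how many times the loop condition holds in each form. This is a plain simulation under the assumption of a unit increment, not the loop analysis code, and both function names are hypothetical.

```python
def passes_checked_before_body(start, limit):
    """for/while form: the condition sees the starting value first."""
    ndx, passes = start, 0
    while ndx < limit:
        passes += 1
        ndx += 1
    return passes


def passes_checked_after_increment(start, limit):
    """do-while form: ndx is incremented before the first check."""
    ndx, passes = start, 0
    while True:
        ndx += 1
        if not (ndx < limit):
            return passes
        passes += 1
```

For the example above (start 0, limit 3) the condition holds three times in the first form but only twice in the do-while form.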
V2: fix terminator spelling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
These instructions will be executed on every iteration of the loop,
so we cannot drop them.
V2:
- move removal of unreachable terminators from the terminator list
to the same place they are removed from the IR as suggested by
Nicolai.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This pulls in tons of extra dependencies because the tests are not
properly guarded.
v2: - Put this patch before the one that adds a loader/dri test for
meson
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This adds support for building the classic swrast implementation. This
driver has been tested with glxinfo and glxgears.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This doesn't include egl support, just dri support.
v2: - when gbm is set to 'auto', only build if a dri driver is also
enabled
- Fix conditional to check for x11 modules with vulkan as well as
with dri drivers
v3: - Set pkgconfig libraries.private value
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: - drop with_ from dri_drivers_path variable (Eric A)
v3: - Move HAVE_X11_PLATFORM to the proper patch (Eric A)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This gets GLX and the loader building. The resulting GLX and i965 have
been tested on piglit and seem to work fine. This patch leaves a lot of
TODOs in its wake: GLX is quite complicated, the build options
involved are many, and the goal at the moment is to get dri and gallium
drivers building.
v2: - fix typo "vaule" -> "value"
- put the not on the correct element of the conditional
- Put correct description of dri3 option in this patch not the next
one (Eric A)
- fix non glvnd version (Eric A)
- build glx tests
- move loader include variables to this patch (Eric A)
v3: - set the version correctly for GL_LIB_NAME in libglx
v4: - set pkgconfig private fields
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This gets pretty much the entire classic tree building, as well as
i965, including the various glapis. There are some workarounds for bugs
that are fixed in meson 0.43.0, which is due out on October 8th.
I have tested this with piglit using glx.
v2: - fix typo "vaule" -> "value"
- use gtest dep instead of linking to libgtest (rebase error)
- use gtest dep instead of linking against libgtest (rebase error)
- copy the megadriver, then create hard links from that, then delete
the megadriver. This matches the behavior of the autotools build.
(Eric A)
- Use host_machine instead of target_machine (Eric A)
- Put a comment in the right place (Eric A)
- Don't have two variables for the same information (Eric A)
- Put pre_args at top of file in this patch (Eric A)
- Fix glx generators in this patch instead of next (Eric A)
- Remove -DMESON hack (Eric A)
- add sha1_h to mesa in this patch (Eric A)
- Put generators in loops when possible to reduce code in
mapi/glapi/gen (Eric A)
v3: - put HAVE_X11_PLATFORM in this patch
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This ends up being unworkable as more options get added, and with
description wrapped onto a new line it doesn't improve readability
anyway.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
And add a todo about clover, r600, and radeonsi, which also need libelf.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was missed in a rebase, and doesn't affect radv or anv, only i965.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This has the same problem as the previous commit, generated headers and
hardcoded paths.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Traversing back through includes is a bad idea and should be avoided.
In the case here - indirect_size.h is located in the build directory
$(top_builddir)/src/glx/.
v3: - Update commit message with message provided by Emil
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Saves us from calling util_query_clear_result(..) in every query
type implementation.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
We want the same active handling for every query type. So let's
handle it in the generic layer.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Now that Marek has split the two drivers apart, drop a bunch
of unnecessary code from the r600 half. There is probably a bunch
more hiding in the video code.
No piglit regressions on caicos.
v2: fix HAVE_LLVM protected code
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Our driver implementation is known to decrease performance for some tests,
but we don't know if any apps and benchmarks (e.g. those tested by Phoronix)
are affected. This disables the feature just to be safe.
Set this to enable partial primitive binning:
R600_DEBUG=dpbb
Set this to enable full primitive binning:
R600_DEBUG=dpbb,dfsm
v2: add new debug options
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
These registers don't change during the lifetime of the
command buffer, there is no need to re-emit them when
binding a new pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously, we allocated memory for image->plane[0].surface.isl.size
which is great if there is no compression. However, on BDW, we can do
CCS_D on X-tiled images so we also have to allocate space for the
auxiliary buffer. This fixes hangs in some of the WSI CTS tests and
should also reduce hangs in real applications. In particular, it fixes
the dEQP-VK.wsi.*.incremental_present.* test group.
When we hand the image off to X11 or Wayland, it will ignore the CCS
entirely which is ok because we do a resolve when it's transitioned to
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
All over mesa we include "nir/nir.h", we should probably do the same
here. This fixes the meson build that was broken by the ycbcr series.
Thanks to Dylan for finding the issue.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f3e91e78a3 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a texture is immutable, we can't tack on extra levels
after-the-fact like we could with glTexImage. So check against that
level limit and return an error if it's surpassed.
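A minimal sketch of the check described above. The struct and field
names are hypothetical stand-ins, not the actual Mesa texture object,
and other level limits that apply to mutable textures are omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a texture object; field names are assumptions. */
struct tex_sketch {
    bool immutable;        /* storage was fixed at allocation time */
    int  immutable_levels; /* number of mipmap levels fixed at that time */
};

/* Returns true if 'level' may be attached to a framebuffer.
 * Mutable textures can gain levels later, so only immutable ones are
 * checked against their fixed level count. */
bool level_is_attachable(const struct tex_sketch *tex, int level)
{
    assert(tex != NULL);
    if (level < 0)
        return false;
    if (tex->immutable && level >= tex->immutable_levels)
        return false; /* the caller would report a GL error here */
    return true;
}
```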
This fixes:
KHR-GL45.geometry_shader.layered_fbo.fb_texture_invalid_level_number
(Based on a patch by Ilia Mirkin.)
Reviewed-by: Antia Puentes <apuentes@igalia.com> [imirkin v2]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This approach allows drivers to set their own vertex shader and skip
compilation of u_blitter vertex shaders.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is a new interface in libva2 to support wider use-cases of passing
surfaces to external APIs. In particular, this allows export of NV12 and
P010 surfaces.
v2: Convert surfaces to progressive before exporting them (Christian).
v3: Set destination rectangle to match source when converting (Leo).
Add guards to allow building with libva1.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-and-Tested-by: Leo Liu <leo.liu@amd.com>
The intrinsic is gone, causing shader compilation to crash.
While here, also change the fallback code to match what llvm's auto-updater
of these intrinsics would do (except that there will still be zext/trunc
instructions in there), which should ensure that the sequence gets recognized
and fused back into a pabs in the end (I didn't test this, and it's possible
even the old sequence would get recognized, but I don't see a reason why we
shouldn't use the same sequence in any case).
Tested-by: Vinson Lee <vlee@freedesktop.org>
This was causing a crash in the ParaView waveletcontour.py test when
_DEBUG is defined, due to a vector aligned copy with an unaligned
address.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This ensures that everything gets cleaned up properly. In particular,
it fixes a memory leak where we were leaking the push constants
structs.
Valgrind stats on
dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128 :
Before:
HEAP SUMMARY:
in use at exit: 2,467,513 bytes in 1,305 blocks
total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 1,068 bytes in 11 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
After:
HEAP SUMMARY:
in use at exit: 2,467,381 bytes in 1,304 blocks
total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 936 bytes in 10 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
When writing to set > 0, we were just wrongly writing to set 0. This
commit fixes this by lazily allocating each set as we write to them.
We didn't go for having them directly into the command buffer as this
would require an additional ~45Kb per command buffer.
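The lazy allocation can be sketched roughly as follows. All names, the
set limit, and the payload size are assumptions made for illustration;
the real anv structures differ:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SETS_SKETCH 8 /* assumed limit, for illustration only */

struct push_set_sketch {
    unsigned char data[64]; /* placeholder payload; the real struct is much larger */
};

struct cmd_buffer_sketch {
    /* One pointer per set; NULL until that set is first written. */
    struct push_set_sketch *push_sets[MAX_SETS_SKETCH];
};

/* Return the push descriptor set for index 'set', allocating it from
 * system memory on first use. Returns NULL if the allocation fails. */
struct push_set_sketch *
get_push_set(struct cmd_buffer_sketch *cmd, unsigned set)
{
    assert(set < MAX_SETS_SKETCH);
    if (cmd->push_sets[set] == NULL)
        cmd->push_sets[set] = calloc(1, sizeof(*cmd->push_sets[set]));
    return cmd->push_sets[set];
}

/* Free every set that was actually allocated. */
void free_push_sets(struct cmd_buffer_sketch *cmd)
{
    for (unsigned i = 0; i < MAX_SETS_SKETCH; i++) {
        free(cmd->push_sets[i]);
        cmd->push_sets[i] = NULL;
    }
}
```

The command buffer only carries one pointer per set, and the larger
structs are paid for only by applications that use them.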
v2: Allocate push descriptors from system memory rather than in BO
streams. (Lionel)
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Fixes: 9f60ed98e5 ("anv: add VK_KHR_push_descriptor support")
Reported-by: Daniel Ribeiro Maciel <daniel.maciel@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This change introduces the concept of planes for images & views. It
matches the planes available in new formats.
We also refactor depth & stencil support through the usage of planes
for the sake of uniformity. In the backend (genX_cmd_buffer.c) we have
to take some care though with regard to auxiliary surfaces.
Multiplanar color buffers can have multiple auxiliary surfaces but
depth & stencil share the same HiZ one (only stored in the depth
plane).
v2: by Jason
Remove unused aspect parameters from anv_blorp.c
Assert when attempting to resolve YUV images
Drop redundant logic for plane offset in make_surface()
Rework anv_foreach_plane_aspect_bit()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A good chunk of anv_blorp just wants the aux usage from the image. This
magic aux_usage value means just that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This pass implements all the implicit conversions required by the
VK_KHR_sampler_ycbcr_conversion specification.
It also inserts plane sources onto sampling instructions that we then
let the pipeline layout pass deal with, when mapping things correctly
to descriptors.
v2: Add new file to meson build (Lionel)
Use nir_frcp() rather than (1.0f / x) (Jason)
Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason)
Return progress (Jason)
Account for array of samplers (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
New settings from the KHR_sampler_ycbcr_conversion specifications
might require different sampler settings for luma and chroma planes.
This change makes the sampler table emission ready to handle multiple
planes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A given Vulkan format can now be decomposed into a set of planes. We
now use 'struct anv_format_plane' to represent the format of those
planes.
v2: by Jason
Rename anv_get_plane_format() to anv_get_format_plane()
Don't rename anv_get_isl_format()
Replace ds_fmt() by fmt2()
Introduce fmt_unsupported()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Newer format enums start at offset 1000000000, making it impossible to
have them all in one table. This change splits the formats into sets
that we then access through indirection.
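A rough sketch of the indirection, assuming the usual Vulkan numbering
where an extension enum is 1000000000 + (extension number - 1) * 1000 +
offset. The helper and struct names below are hypothetical, not the ones
used in anv:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_ENUM_BASE 1000000000u

/* 0 for core enums, otherwise the 1-based extension number. */
uint32_t enum_extension(uint32_t e)
{
    return e >= EXT_ENUM_BASE ? (e - EXT_ENUM_BASE) / 1000u + 1u : 0u;
}

/* Index of the enum within its own (core or extension) range. */
uint32_t enum_offset(uint32_t e)
{
    return e >= EXT_ENUM_BASE ? e % 1000u : e;
}

/* One small table per extension instead of one huge sparse table. */
struct format_set_sketch {
    const int *formats; /* placeholder per-format data */
    uint32_t n_formats;
};

/* Look up a format through the per-extension indirection.
 * Returns NULL if the format is not present in any set. */
const int *
lookup_format(const struct format_set_sketch *sets, uint32_t n_sets,
              uint32_t vk_format)
{
    assert(sets != NULL);
    uint32_t ext = enum_extension(vk_format);
    uint32_t off = enum_offset(vk_format);
    if (ext >= n_sets || off >= sets[ext].n_formats)
        return NULL;
    return &sets[ext].formats[off];
}
```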
v2: rename format_extract to vk_to_anv_format (Chad/Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
And merge radv_meta_save_novertex() with
radv_meta_save_graphics_reset_vport_scissor_novertex().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will allow us to save/restore the different states on-demand
based on the meta operation. For now, this saves/restores all
states. Compute will follow once the graphics part is done.
The main idea is to merge all save/restore helpers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Change b3a44ae7a4 caused regressions on Android where DRI and renderbuffer
can disagree on the format being used. This patch removes the colorspace
parameter and instead we pass renderbuffer format. For non-winsys images we
still do srgb/linear modification in same manner as change b3a44ae7a4 wanted
but take format from renderbuffer instead of DRI image.
This patch fixes regressions seen with following test sets:
dEQP-EGL.functional.color_clears*
dEQP-EGL.functional.render*
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102999
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Found while trying to optimize an application.
Not observed to help performance on i965, but should at least reduce
the memory usage of such textures a bit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
If the DRM_VC4_GET_TILING ioctl isn't present then we can't tell
if a dmabuf bo is tiled or linear, so will always assume it's
linear.
By not advertising tiled formats in this situation we ensure the
assumption is correct.
This fixes a bug where most attempts to render a gl wayland client
under weston will result in a client side abort.
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Daniel Stone <daniels@collabora.com> (on irc)
"Driver" isn't a great word for what this layer is, it's effectively a
build-time choice about what OS you're targeting. Despite that both of
the extant backends totally ignore the display argument, the old code
would only set up the backend relative to a display.
That causes problems! One problem is it means eglGetProcAddress can
generate X or Wayland protocol when it tries to connect to a default
display so it can call into the backend, which is, you know, completely
bonkers. Any other EGL API that doesn't reference a display, like
EGL_EXT_device_query, would have the same issue.
Fortunately this is a problem that can be solved with the delete key.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Avoid freeing buffers holding new back content
(with GLX_SWAP_COPY_OML and GLX_SWAP_EXCHANGE_OML)
Previously that would have resulted in back buffer content becoming
incorrect after a swap, although I haven't managed to trigger such a
situation yet.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Resize only in loader_dri3_get_buffers(),
where the dri driver has a chance to immediately update the viewport.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
When a drawable is resized and we fill the resized buffers with data
from the old buffers, use a local blit if there is a local buffer (back or
fake front) and we have local blitting capability.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
In check_os_altivec_support(), allow control of Altivec (first PPC vector
instruction set) code generation via a new environmental control,
GALLIVM_ALTIVEC, which is expected to take on a value of 1 or 0.
The default is to enable Altivec code generation.
This environmental control of Altivec code generation is initially
available only #ifdef DEBUG.
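A minimal sketch of such a control, with the #ifdef DEBUG guard left
out. The function names are hypothetical, and treating any value other
than "0" as "enabled" is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Interpret the value of the control. Unset means "enabled" (the
 * default), "0" disables, and any other value enables (assumption). */
bool altivec_enabled_from_value(const char *value)
{
    if (value == NULL)
        return true;
    return strcmp(value, "0") != 0;
}

/* Read the control from the environment. */
bool altivec_enabled(void)
{
    return altivec_enabled_from_value(getenv("GALLIVM_ALTIVEC"));
}
```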
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
In init_native_targets, allow the passing of additional options to
the LLC compiler via new GALLIVM_LLC_OPTIONS environmental control.
This option is available only #ifdef DEBUG, initially.
At top, add #include <llvm-c/Support.h> for LLVMParseCommandLineOptions()
declaration.
v2: Fix compile error with old llvm versions (sroland)
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We only need to dirty the descriptors when the pipeline is
a new one, because user SGPRs can be potentially different.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I did not implement:
CNL's restriction on 64-bit int + align16, because I don't think
we'll ever use this combination regardless of hardware generation.
The restriction on immediate DF -> F conversions, because there's no
reason to ever generate that, and I don't even know how DF -> F
conversions are supposed to work in Align16 since (1) the dst stride
must be 1, but (2) the dst stride would have to be 2 for src and dst
strides to be aligned.
Some restrictions require something like strides to match between src
and dest. For multi-source instructions, I'd rather encapsulate the
logic for not inserting already present errors in ERROR_IF than
open-coding it in multiple places.
The type suffixes were wrong, and the 16 was missing the 0 prefix.
Fixes: 92f787ff86 ("i965: Add support for disassembling 64-bit integer immediates")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
... without the float -> double conversion. Low power parts have
additional restrictions when it comes to operating on 64-bit types, and
the instruction used to do the conversion violates one of them:
specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword".
Previously we generated a float and then converted, but we can avoid the
conversion by using the same extract-the-sign-bit + or-in-1.0 algorithm
by directly operating on the high four bytes of each double-precision
component in the result.
In SIMD8 and SIMD16 this cuts one instruction from the implementation,
and more importantly that instruction is the one which violated the
regioning restriction.
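The idea can be illustrated on the CPU as below. This is only a sketch
of the bit manipulation, not the generated EU code; the function name is
hypothetical, and returning zero for a zero input is an assumption about
the intended sign() semantics:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compute sign(x) for a double by touching only its high 32 bits:
 * keep the sign bit and OR in the high dword of 1.0 (0x3ff00000). */
double sign_f64_sketch(double x)
{
    assert(sizeof(double) == sizeof(uint64_t));

    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t hi = (uint32_t)(bits >> 32);

    /* Zero (of either sign) yields 0.0; NaN is not treated specially. */
    uint32_t result_hi = 0;
    if (x != 0.0)
        result_hi = (hi & 0x80000000u) | 0x3ff00000u;

    /* The low 32 bits of +/-1.0 and of 0.0 are all zero. */
    uint64_t result_bits = (uint64_t)result_hi << 32;
    double result;
    memcpy(&result, &result_bits, sizeof result);
    return result;
}
```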
Along the way I removed some comments that I did not think helped, and
some code about double comparisons which does not seem to be necessary
today.
This prevents validation failures caught by the new EU validation code
added in later patches.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
64-bit operations on Atom parts have additional restrictions over their
big-core counterparts (validated by later patches).
Specifically, the restriction that "Source and Destination horizontal
stride must be aligned to the same qword" is violated by most shift
operations since NIR uses a 32-bit value as the shift count argument,
and this causes instructions like
shl(8) g19<1>Q g5<4,4,1>Q g23<4,4,1>UD
where src1 has a 32-bit stride, but the dest and src0 have a 64-bit
stride.
This caused ~4 pixels in the ARB_shader_ballot piglit test
fs-readInvocation-uint.shader_test to be incorrect. Unfortunately no
ARB_gpu_shader_int64 test hit this case because they operate on
uniforms, and their scalar regions are an exception to the restriction.
We work around this by effectively unpacking the shift count, so that we
can read it with a 64-bit stride in the shift instruction. Unfortunately
the unpack (a MOV with a dst stride of 2) is a partial write, and cannot
be copy-propagated or CSE'd.
Bugzilla: https://bugs.freedesktop.org/101984
A typo caused us to copy src0's reg file to src1 rather than reading
src1's as intended. This caused us to fail to compact instructions like
mov(8) g4<1>D 0D { align1 1Q };
because src1 was set to immediate rather than architecture file. Fixing
this reenables compaction (after the precompact() pass changes the data
types):
mov(8) g4<1>UD 0x00000000UD { align1 1Q compacted };
Fixes: 1cb0a7941b ("i965: Switch to using the logical register types")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This enables tc compatible htile for stencil surfaces as well.
This gives a 3-5fps boost on Mad Max on high@4k.
It also depends on Bas's tc-compat htile patch.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This looks a bit ugly to me, but the existing codepath
is not terribly elegant either.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The dmabuf interface requires a valid modifier to be sent. If we don't
explicitly get a modifier from the driver, we can't know what to send;
it must be inferred from legacy side-channels (or assumed to be linear, if
none exists).
If we have no modifier, then we can only have a single-plane format
anyway, so fall back to the old wl_drm buffer import path.
Fixes: a65db0ad1c ("st/dri: don't expose modifiers in EGL if the driver doesn't implement them")
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
When creating a wl_buffer from a DRIImage, we extract all the DRIImage
information via queryImage. Check whether or not it actually succeeds,
either bailing out if the query was critical, or providing sensible
fallbacks for information which was not available in older DRIImage
versions.
Fixes: a65db0ad1c ("st/dri: don't expose modifiers in EGL if the driver doesn't implement them")
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Starting with commit ab0589c6ed ("wayland-egl: remove no longer needed
wayland-client dependency") the wayland-egl.h include was missing leading to a
build failure:
CC wayland-egl.lo
wayland-egl.c:33:10: fatal error: wayland-egl.h: No such file or directory
#include "wayland-egl.h"
^~~~~~~~~~~~~~~
Strictly speaking we should be checking for wayland-egl in configure and
propagating its CFLAGS here.
Yet again, the current wayland-egl split is bonkers as the Wayland repo
provides a single header, no pkg-config file or library.
That will be resolved at a later stage, but in the meanwhile fix the
build.
Fixes: ab0589c6ed ("wayland-egl: remove no longer needed wayland-client
dependency")
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: add some text about CFLAGS and current wayland-egl situation]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise it will be missing from the tarball.
Fixes: f7daa737d1 ("mesa: Combine libtxc_dxtn sources into
texcompress_s3tc_tmp.h")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The situations where we enable it are quite limited, but it works,
even for madmax, so let's just enable it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
For Vulkan SPIR-V the spec states
fma() Inherited from OpFMul followed by OpFAdd.
Matt says the backend will do the right thing depending on the
hardware being compiled for, if you use the fmuladd intrinsic.
Using the Mad Max pts test, on high settings at 4K:
CHP: 55->60
HGDD: 46->50
LM: 55->60
No change on Stronghold.
Thanks to Feral for spending the time to track this down.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just legacy cruft. We don't push these values; we pass them in
as vertex attributes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
To fix MinGW compiler warning about missing strlen() prototype.
Not sure how I missed this when fixing the malloc() / stdlib.h issue.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It seems there are no perfect x/y biases for line drawing to satisfy all
applications. Depending on the biases, either real apps produce results
similar to VGPU10 while Piglit's gl-1.0-ortho-pos fails, or vice versa.
Let's lean toward real applications (Solidworks, SolidEdge, Google Earth)
over Piglit.
Using (-0.5, -0.5) for points, lines and triangles seems to generally
work well.
We don't seem to have these issues with VGPU10.
Tested with Piglit and CAD-oriented apitraces. See VMware bugs 1775498
and 1905053.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We need to be more careful not to treat nr_samples=1 as an msaa surface.
This patch prevents us from errantly declaring an MSAA shader resource
with 1 sample.
No Piglit regressions, fixes the above-described errors.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
It is possible to have holes in the shader emitter's sampler_target array.
A sampler_target of 0 does not necessarily mean there is no sampler view
specified, since the texture buffer target has the value 0.
With this patch, a sampler_view array is added to the shader emitter structure
to specify if there is a sampler view for each texture unit. Only if there
is a sampler view will we emit a constant for the texcoord scale factor or
texture buffer size for that sampler view.
Fixes a rendering issue with Turbine after commit 1020e960440.
Reviewed-by: Brian Paul <brianp@vmware.com>
For the case of SVGA3D_X32_G8X24_UINT we incorrectly returned
SVGA3D_R32_FLOAT_X8X24. We should return SVGA3D_R32G8X24_TYPELESS.
Note that we never actually use SVGA3D_X32_G8X24_UINT so this has
no impact.
No Piglit regressions.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We sometimes pass typeless formats to this function. By adding switch
cases we avoid the "Unexpected format XXX in svga_typeless_format"
warning messages. No functional change.
No Piglit regressions, no above-mentioned warning messages.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch allows using sRGB formats for DISPLAY_TARGET on vgpu10.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Set up new states that the blob started setting for GC3000 consistently.
This makes sure that when another test or driver leaves the GPU in
unpredictable state, these states are set up correctly for our
rendering.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Setting PA_VIEWPORT_UNK state correctly is necessary to make point sprite
rendering on GC3000 work.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
A two-component dot product instruction is supported with HALTI2, use it
on hardware that supports it.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Support opcodes with bit 6 set in assembler, and assert that only ops
0x00..0x7f are used.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
In truth gtest is an external dependency that upstream expects you to
"vendor" into your own tree. As such, it makes sense to treat it more
like a dependency than an internal library, and collect its
requirements together in a dependency object.
v2: - include with -isystem instead of setting compiler args (Eric)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
RadeonSI requires C++11, clover requires C++11, LLVM requires it, so
llvmpipe may require it, and that covers most of the C++ code in mesa.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Required for older versions of libelf that don't have a pkgconfig file.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The kms_swrast extension is an actively developed software fallback,
and platform_surfaceless can use it if there are no available
hardware drivers.
v2: Split into 2 patches, use booleans, check LIBGL_ALWAYS_SOFTWARE,
and modify the eglLog level (Emil, Eric, Tomasz).
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These shouldn't matter for non-cubes, and we always enable them all
for cubes, so we may as well set them all the time.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I decided to use the one-boolean-per-cube-face approach because it's
clearer which bits correspond to which cube face.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When this file is included by Gallium, the fprintf causes it to fail to
compile. This is an unreachable error case, and we shouldn't be calling
fprintf directly.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use st_egl_image instead. radeonsi doesn't like it when we create
a pipe_surface with PIPE_FORMAT_NV12.
This fixes NV12 texturing on radeonsi using kmscube.
Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This should be sufficient for testing all kernel/libdrm/radeonsi codepaths
that are used by radeonsi.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We can implement ARB_indirect_parameters for i965 by
taking advantage of the conditional rendering mechanism.
This works by issuing maxdrawcount draw calls and using
conditional rendering to predicate each of them with
"drawcount > gl_DrawID"
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
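The predication scheme described above ("drawcount > gl_DrawID") can be
emulated on the CPU as a rough sketch. On hardware the comparison is
evaluated by the GPU, because drawcount lives in a buffer; the function
name and the output array are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Issue max_draw_count draws, each predicated on draw_count > draw_id.
 * The IDs of the draws that pass the predicate are written to
 * executed_ids (which must hold max_draw_count entries), and their
 * number is returned. */
uint32_t predicated_draws(uint32_t draw_count, uint32_t max_draw_count,
                          uint32_t *executed_ids)
{
    assert(executed_ids != NULL);

    uint32_t executed = 0;
    for (uint32_t draw_id = 0; draw_id < max_draw_count; draw_id++) {
        /* Draws whose predicate fails become no-ops. */
        if (draw_count > draw_id)
            executed_ids[executed++] = draw_id;
    }
    return executed;
}
```

With this scheme the number of draws actually executed is
min(drawcount, maxdrawcount), without the CPU ever reading drawcount.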
In order to add our ARB_indirect_parameters implementation we
need to refactor brw_try_draw_prims so that it operates on a
per primitive basis and move the loop into brw_draw_prims.
This commit refactors the brw_try_draw_prims function and
renames it to brw_draw_single_prim.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to add our ARB_indirect_parameters implementation we
need to refactor brw_try_draw_prims so that it operates on a
per primitive basis and move the loop into brw_draw_prims.
This commit introduces the brw_finish_drawing function where
we move the code that executes once after the loop.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to add our ARB_indirect_parameters implementation we
need to refactor brw_try_draw_prims so that it operates on a
per primitive basis and move the loop into brw_draw_prims.
This commit introduces the brw_prepare_drawing function where
we move the code that executes once before the loop.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's inside an if-statement that already checks that the variables are
not NULL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The optimization as done in opt_copy_propagation would have to be
removed in the next patch. If we just eliminate that optimization
altogether, shader-db results, even on platforms that use NIR, are hurt
quite substantially. I have not investigated why NIR isn't picking up
the slack here.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Instead of generating a sequence like:
run_default = true;
if (i == 3) // some label that appears after default
run_default = false;
if (i == 4) // some label that appears after default
run_default = false;
...
if (run_default) {
...
}
generate something like:
run_default = !((i == 3) || (i == 4) || ...);
if (run_default) {
...
}
This eliminates one use of conditional assignment, and it enables the
elimination of another.
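As a sanity check of the equivalence, both forms can be written as plain
C. This is only an illustration of the logic, not the GLSL IR that the
compiler generates; the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Old form: start with true and clear the flag with one conditional
 * assignment per label that appears after default. */
bool run_default_sequential(int i, const int *labels, int n)
{
    assert(labels != NULL || n == 0);
    bool run_default = true;
    for (int k = 0; k < n; k++)
        if (i == labels[k])
            run_default = false;
    return run_default;
}

/* New form: a single expression, !((i == a) || (i == b) || ...). */
bool run_default_combined(int i, const int *labels, int n)
{
    assert(labels != NULL || n == 0);
    bool any_match = false;
    for (int k = 0; k < n; k++)
        any_match = any_match || (i == labels[k]);
    return !any_match;
}
```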
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Previously the instruction stream was walked looking for comparisons
with case-label values. This should generate nearly identical code.
For at least fs-default-notlast-fallthrough.shader_test, the code is
identical.
This change will make later changes possible.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The values being compared are scalars, so these are the same. While
I'm here, simplify the run_default condition to just deref the flag
(instead of comparing a scalar bool with true).
There is a bit of extra change in this patch. When constructing an
ir_binop_equal ir_expression, there is an assertion that the types are
the same. There is no such assertion for ir_binop_all_equal, so
passing glsl_type::uint_type with glsl_type::int_type was previously
fine. A bunch of the code motion is to deal with that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This happens to work now because ir_binop_all_equal is used. This
causes vector typed init-expressions to produce scalar Boolean values
after comparison.
The next commit changes ir_binop_all_equal to ir_binop_equal. Vector
typed init-expressions will then produce vector Boolean values, and, in
debug builds, the ir_assignment constructor will fail an assertion.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Per the SPIR-V spec 2.11 Structured Control Flow:
"The only blocks in a construct that can branch outside the construct are
...
- a break block for the innermost loop it is inside of.
..."
With
"Break block: A block containing a branch to the Merge Block of a loop header's merge instruction."
Note that it puts no restriction on not being in an if or switch within the innermost loop.
This passes the loop_break block to the switch body so it can properly detect loop breaks.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Just to be consistent.
v2: - update meson.build too
v3: - remove unrelated whitespace change
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Use calloc instead of malloc + explicitly zeroing the different fields.
We need special handling for the version field which is of type
const intptr_t.
While we're here, document why keeping the constness is a good idea.
The wl_egl_window_resize() call is replaced with an explicit set of the
width/height.
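A minimal sketch of the calloc pattern described above, assuming a hypothetical struct example_window with a const intptr_t version field (the real struct and its version value are not shown here). The const member cannot be assigned normally, so the sketch writes it once through a cast right after allocation:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real window struct. */
struct example_window {
   const intptr_t version; /* const so that users cannot change it later */
   int width;
   int height;
};

#define EXAMPLE_WINDOW_VERSION 3 /* illustrative value only */

static struct example_window *
example_window_create(int width, int height)
{
   /* calloc zeroes every field, so no per-field zeroing is needed. */
   struct example_window *win = calloc(1, sizeof(*win));
   if (!win)
      return NULL;

   /* One-time write to the const field of freshly allocated storage. */
   *(intptr_t *)&win->version = EXAMPLE_WINDOW_VERSION;

   /* Explicit set of width/height instead of calling a resize helper. */
   win->width = width;
   win->height = height;
   return win;
}
```

Keeping the field const means any later accidental write fails to compile, which is the main argument for the special handling.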
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Miguel A. Vico <mvicomoya@nvidia.com>
It makes the header self-contained, and with a later commit we'll remove
the unnecessary wayland-client.h include.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Miguel A. Vico <mvicomoya@nvidia.com>
Shared glapi (libglapi.so) has been a requirement for years, in order
to build EGL.
Remove the no longer necessary dlopen/dlsym dance and link to the
library directly.
This allows us to remove a handful of platform specific workarounds, due
to the different name of the library.
v2:
- Android: export the include dir (RobH)
- Drop unused local variable (Eric)
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jon Turney <jon.turney@dronecode.org.uk>
Cc: Julien Isorce <julien.isorce@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Tomasz Figa <tfiga@chromium.org> (v1)
Tested-by: Rob Herring <robh@kernel.org>
The current convenience function GetEnv feeds the results of getenv
directly into std::string(). That is a bad idea, since the variable
may be unset, thus we feed NULL into the C++ construct.
The latter of which is not allowed and leads to a crash.
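The crash itself is specific to constructing a C++ std::string from NULL, but the guard is the same in any language. A rough sketch in C, where get_env_or_empty is a hypothetical helper name and not the function touched by this commit:

```c
#include <stdlib.h>

/* Hypothetical helper: never hands NULL to the caller. */
static const char *
get_env_or_empty(const char *name)
{
   const char *value = getenv(name);

   /* getenv() returns NULL when the variable is unset. */
   return value ? value : "";
}
```

Callers can then treat "unset" and "empty" the same way without a NULL check at every call site.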
v2: Better variable name, implicit char* -> std::string conversion (Eric)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101832
Fixes: a25093de71 ("swr/rast: Implement JIT shader caching to disk")
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Laurent Carlier <lordheavym@gmail.com>
Cc: Bernhard Rosenkraenzer <bero@lindev.ch>
[Emil Velikov: make an actual commit from the misc diff]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Laurent Carlier <lordheavym@gmail.com> (v1)
Similar to the way width/pitch alignment works, it seems like we need to
do similar for height. Otherwise the BLIT from system memory to GMEM
can over-fetch beyond the end of the buffer, triggering a fault.
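As a sketch of why padding the height helps, assuming hypothetical alignment values and helper names (the real alignment used by the driver is not stated here): sizing the buffer from the aligned height means a BLIT that fetches whole aligned rows stays inside the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to a power-of-two alignment. */
static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Buffer size in bytes with both pitch and height padded. */
static uint64_t
padded_buffer_size(uint32_t width, uint32_t height, uint32_t cpp,
                   uint32_t pitch_align, uint32_t height_align)
{
   uint32_t pitch = align_up(width, pitch_align) * cpp;

   return (uint64_t)pitch * align_up(height, height_align);
}
```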
I'm not sure if there is a better solution yet. Possibly we could fall
back to pre-a5xx style DRAW packets for cases where BLIT might over-
fetch. (We in theory have that problem already with rendering to higher
mipmap levels, although fortunately those tend to use GMEM bypass.)
This fixes issues reported with glamor.
Reported-by: don.harbin@linaro.org
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is purely a file-move + #include fixup + build system changes.
Other cleanups will follow in subsequent commits.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The hardware registers store the half-size/width in 12.4 fixed point
format, so 8192 is the maximum.
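A worked sketch of the limit, assuming the register field is unsigned with 12 integer and 4 fractional bits (the exact field layout is an assumption): the largest encodable half-size is 0xffff / 16 = 4095.9375, so any full size of 8192 or more saturates.

```c
#include <stdint.h>

#define FIXED_12_4_MAX 0xffffu /* assumed: 12 integer + 4 fractional bits */

/* Hypothetical helper: encode size / 2 as 12.4 fixed point, saturating. */
static uint16_t
half_size_to_fixed_12_4(float size)
{
   float scaled = size * 0.5f * 16.0f;

   if (scaled < 0.0f)
      scaled = 0.0f;
   if (scaled > (float)FIXED_12_4_MAX)
      scaled = (float)FIXED_12_4_MAX;
   return (uint16_t)scaled;
}
```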
Fixes dEQP-GLES3.functional.rasterization.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a bit conservative, but a more precise solution requires access
to the rasterizer state. This is something to tackle after the fork between
r600 and radeonsi.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We'll use it in the scissors / clip / guardband state.
v2: avoid a performance regression on r600 when applied to
(pre-fork) stable branches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the last step of fixing
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
for radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The EXT_texture_type_2_10_10_10_REV (ES only) states the following issue:
"1. Should textures specified with this type be renderable?
UNRESOLVED: No. A separate extension could provide this functionality."
This partially fixes
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.{rgb,rgba}_unsigned_int_2_10_10_10_rev
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ES requires it. This is a partial fix for
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It leads to surprising states with integer inputs and outputs on
vertex processing stages (e.g. geometry stages). Instead, rely on the
driver to choose smooth interpolation by default.
We still allow varyings to match when one stage declares it as smooth
and the other declares it without interpolation qualifiers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Fixes an assert in fd_acc_query_register_provider() about query provider
not already registered.
Fixes: 3f6b3d9d ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE")
Signed-off-by: Rob Clark <robdclark@gmail.com>
A recent commit fixed the case of 8888 integer cube maps, which need the
workaround of replacing the data format with USCALED/SSCALED. However,
this broke the case of non-8888 integer cube maps; those still need the
fix of shifting the texture coordinates.
Fixes KHR-GL45.texture_gather.plain-gather-int-cube-array and similar.
Fixes: 6fb0c1013b ("radeonsi: workaround for gather4 on integer cube maps")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We shouldn't reach this point because HTILE is only enabled
when the number of levels is 1.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We can start reading the URB at the first offset that contains varyings
that are actually read in the URB. We still need to make sure that we
read at least one varying to honor hardware requirements.
This helps alleviate a problem introduced with 99df02ca26 for
separate shader objects: without separate shader objects we assign
locations sequentially, however, since that commit we have changed the
method for SSO so that the VUE slot assigned depends on the number of
builtin slots plus the location assigned to the varying. This fixed
layout is intended to help SSO programs by avoiding on-the-fly recompiles
when swapping out shaders, however, it also means that if a varying uses
a large location number close to the maximum allowed by the SF/FS units
(31), then the offset introduced by the number of builtin slots can push
the location outside the range and trigger an assertion.
This problem is affecting at least the following CTS tests for
enhanced layouts:
KHR-GL45.enhanced_layouts.varying_array_components
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_components
KHR-GL45.enhanced_layouts.varying_locations
which use SSO and the location layout qualifier to select such
location numbers explicitly.
This change helps these tests because for SSO we always have to include
things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is
very unlikely to read them, so by doing this we free builtin slots from
the fixed VUE layout and we keep the tests from crashing in this scenario.
Of course, this is not a proper fix, we'd still run into problems if someone
tries to use an explicit max location and read gl_ViewportIndex, gl_LayerID or
gl_CullDistance in the FS, but that would be a much less common bug and we
can probably wait to see if anyone actually runs into that situation in a real
world scenario before making the decision that more aggressive changes are
required to support this without reverting 99df02ca26.
v2:
- Add a debug message when we skip clip distances (Ilia)
- we also need to account for this when we compute the urb setup
for the fragment shader stage, so add a compiler util to compute
the first slot that we need to read from the URB instead of
replicating the logic in both places.
v3:
- Make the util more generic so it can account for all unused slots
at the beginning of the URB, that will make it more useful (Ken).
- Drop the debug message, it was not what Ilia was asking for.
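The idea behind the util can be sketched roughly as follows. All names here are hypothetical, the slot map is a plain array rather than the driver's VUE map, and a real implementation would also have to respect the URB read granularity:

```c
#include <stdint.h>

/*
 * Hypothetical sketch: slot_to_varying[i] is the varying stored in VUE
 * slot i, or -1 for an empty slot. inputs_read has one bit per varying
 * (varying indices are assumed to be below 64).
 */
static int
first_read_slot(const int *slot_to_varying, int num_slots,
                uint64_t inputs_read)
{
   for (int slot = 0; slot < num_slots; slot++) {
      int varying = slot_to_varying[slot];

      if (varying >= 0 && (inputs_read & (UINT64_C(1) << varying)))
         return slot;
   }

   /* Nothing is read: start at slot 0 so at least one slot is fetched. */
   return 0;
}
```

Both the URB read offset and the fragment shader's URB setup can then be derived from the same return value, which is the point of sharing the helper.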
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Allows the instructions to be compacted. The documentation claims that
some of these only accept UD types, even though the type doesn't change
the operation performed. Just normalize the types to ensure we get
instruction compaction.
The only functional changes are for FBL and CBIT (always use UD types)
and FBH (always use the same types).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The operation performed is all the same as LODQ, but with the usual
differences between dx10 and GL texture opcodes, that is separate resource
and sampler indices (plus result swizzling, and setting z/w channels
to zero).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For 1080p video transcode, the height will be scaled to 1088 when
deinterlacing to a progressive buffer. Set the dst rect to make sure no scaling happens.
Fixes: 3ad8687 "st/va: use new vl_compositor_yuv_deint_full() to deint"
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andy Furniss <adf.lists@gmail.com>
Note: this causes spurious regressions in some current piglit tests,
because the tests incorrectly assume that there is no denorm support for
doubles. I'm going to send out a fix for those tests as well.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The LLVM intrinsic has existed for a long time. The current name was
established in LLVM 3.9.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The status quo is quite the mess:
1. tgsi_exec will do a per-channel computation, and store the dst[0]
result (significand) correctly for each channel. The dst[1] result
(exponent) will be written to the first bit set in the writemask.
So per-component calculation only works partially.
2. r600 will only do a single computation. It will replicate the
exponent but not the significand.
3. The docs pretend that there's per-component calculation, but they even
get dst[0] and dst[1] confused.
4. Luckily, st_glsl_to_tgsi only ever emits single-component instructions,
and kind-of assumes that everything is replicated, generating this for
the dvec4 case:
DFRACEXP TEMP[0].xy, TEMP[1].x, CONST[0][0].xyxy
DFRACEXP TEMP[0].zw, TEMP[1].y, CONST[0][0].zwzw
DFRACEXP TEMP[2].xy, TEMP[1].z, CONST[0][1].xyxy
DFRACEXP TEMP[2].zw, TEMP[1].w, CONST[0][1].zwzw
Settle on the simplest behavior, which is single-component calculation
with replication, document it, and adjust tgsi_exec and r600.
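A simplified sketch of "single-component calculation with replication", built on the C library's frexp(). It ignores writemasks and the way a double occupies a pair of 32-bit channels; the function name and array shapes are illustrative only:

```c
#include <math.h>

/*
 * Hypothetical model: one frexp() on the first source component, with
 * the significand replicated to both double slots of dst[0] and the
 * exponent replicated to all four channels of dst[1].
 */
static void
example_dfracexp(double src_x, double dst0[2], int dst1[4])
{
   int exponent;
   double significand = frexp(src_x, &exponent);

   for (int i = 0; i < 2; i++)
      dst0[i] = significand;
   for (int i = 0; i < 4; i++)
      dst1[i] = exponent;
}
```

With this behavior, the four-instruction dvec4 sequence shown above yields the right result regardless of which destination channels each instruction writes.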
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Sourcing the exponent for the zw destination pair from Z is consistent
with both tgsi_exec and gallivm. In practice, st_glsl_to_tgsi always
generates per-channel instructions anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
GLSL ES requires both, and while GLSL explicitly doesn't require correct
overflow handling, it does appear to require handling input inf/denorms
correctly.
Fixes dEQP-GLES31.functional.shaders.builtin_functions.precision.ldexp.*
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
A tempting alternative fix would be adding a lock/unlock pair in
util_queue_fence_is_signalled. However, that wouldn't actually
improve anything in the semantics of util_queue_fence_is_signalled,
while making that test much more heavy-weight. So this lock/unlock
pair in util_queue_fence_destroy for "flushing out" other threads
that may still be in util_queue_fence_signal looks like the better
fix.
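A rough pthread-based sketch of the approach, with hypothetical example_fence names rather than the real util_queue code. The signalled check stays lock-free and cheap, while destroy takes and drops the mutex once so that a thread still inside signal has left the critical section before the mutex is torn down:

```c
#include <pthread.h>
#include <stdbool.h>

struct example_fence {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   bool signalled;
};

static void
example_fence_init(struct example_fence *fence)
{
   pthread_mutex_init(&fence->mutex, NULL);
   pthread_cond_init(&fence->cond, NULL);
   fence->signalled = false;
}

static void
example_fence_signal(struct example_fence *fence)
{
   pthread_mutex_lock(&fence->mutex);
   fence->signalled = true;
   pthread_cond_broadcast(&fence->cond);
   pthread_mutex_unlock(&fence->mutex);
}

/* Unlocked read, kept light-weight on purpose (a strictly conforming
 * C11 version would use an atomic here). */
static bool
example_fence_is_signalled(const struct example_fence *fence)
{
   return fence->signalled;
}

static void
example_fence_destroy(struct example_fence *fence)
{
   /* Flush out a signalling thread that may still hold the mutex. */
   pthread_mutex_lock(&fence->mutex);
   pthread_mutex_unlock(&fence->mutex);

   pthread_cond_destroy(&fence->cond);
   pthread_mutex_destroy(&fence->mutex);
}
```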
v2: rephrase the comment
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
This fixes a warning caused by the fork (note the change in the function
signature):
../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c: In function ‘r600_init_common_state_functions’:
../../../../../mesa-src/src/gallium/drivers/r600/r600_state_common.c:2974:36: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
rctx->b.set_occlusion_query_state = r600_set_occlusion_query_state;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the extremely unlikely case that an application uses
0x80000000 or 0x3f800000 as border color for an integer texture and
helps in the also, but perhaps slightly less, unlikely case that 1 is
used as a border color.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The hardware does this automatically for unorm formats, but we need to
do it manually for unorm depth formats that have been upgraded to
Z32_FLOAT.
Fixes dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth
and others.
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The hardware usually does this automatically. However, we upgrade
depth to Z32_FLOAT to enable TC-compatible HTILE, which means the
hardware no longer clamps the comparison value for us.
The only way to tell in the shader whether a clamp is required
seems to be to communicate an additional bit in the descriptor
table. While VI has some unused bits in the resource descriptor,
those bits have unfortunately all been used in gfx9. So we use
an unused bit in the sampler state instead.
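What the shader-side fixup amounts to can be sketched in plain C, assuming a hypothetical needs_clamp flag that stands in for the extra sampler-state bit (the actual bit position and shader code are not shown here):

```c
#include <stdbool.h>

/*
 * Hypothetical model of the manual clamp: when a unorm depth format was
 * upgraded to Z32_FLOAT, clamp the shadow comparison value to [0, 1]
 * ourselves, since the hardware no longer does it.
 */
static float
example_shadow_ref(float ref, bool needs_clamp)
{
   if (!needs_clamp)
      return ref;
   if (ref < 0.0f)
      return 0.0f;
   if (ref > 1.0f)
      return 1.0f;
   return ref;
}
```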
Fixes dEQP-GLES3.functional.texture.shadow.2d.linear.equal_depth_component32f
and many other tests in dEQP-GLES3.functional.texture.shadow.*
Fixes: d4d9ec55c5 ("radeonsi: implement TC-compatible HTILE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Avoid a v_cndmask: the absolute value is free due to input modifiers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Fix the custom cube coord selection sequence to be identical to
the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling
with user-provided derivatives.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Overriding the default (no-op) swizzle is clearly counter-productive,
since the whole point is putting the destination register as one of
the source operands so that it remains unmodified when the assignment
condition is false.
Fragment depth and stencil outputs are a special case due to how their
source swizzles are manipulated in translate_src when compiling to
TGSI.
Fixes dEQP-GLES2.functional.shaders.conditionals.if.*_vertex
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Found by address sanitizer.
The loop here tries to be safe, but in doing so, it ends up doing
exactly the wrong thing: the safe foreach is for when the loop
variable (inst) could be deleted and nothing else. However, this
particular loop can delete inst's successor, but not inst itself.
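The pitfall can be reproduced with a small singly linked list. This is a generic sketch with hypothetical names, not the state tracker's instruction list: because the body may free the successor, the loop must re-read n->next after the body instead of caching it up front the way a "safe" foreach does.

```c
#include <stdlib.h>

struct node {
   struct node *next;
   int value;
};

/* Remove the node that follows each node whose value equals trigger. */
static int
remove_successors(struct node *head, int trigger)
{
   int removed = 0;

   /*
    * Plain iteration: n->next is read only after the body has run. A
    * "safe" loop would have saved the old successor before the body and
    * would then step onto freed memory.
    */
   for (struct node *n = head; n != NULL; n = n->next) {
      if (n->value == trigger && n->next != NULL) {
         struct node *victim = n->next;

         n->next = victim->next;
         free(victim);
         removed++;
      }
   }
   return removed;
}
```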
Fixes: 8c6a0ebaad ("st/mesa: add st fp64 support (v7.1)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
It has to happen after descriptor uploads since otherwise we'll print out
the wrong GPU list / incorrectly claim descriptor corruption.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Triggering the push model when 64-bit inputs are involved is not easy due to
the constraints on the maximum number of registers that we allow for this mode,
however, for GS with 'points' primitive type and just a couple of double
varyings we can trigger this and it just doesn't work because the
implementation is not 64-bit aware at all. For now, let's make sure that we
don't attempt this model with 64-bit inputs and we always fall back to pull
model for them.
Also, don't enable the VUE handles in the thread payload on the fly when we
find an input for which we need the pull model, this is not safe: if we need
to resort to the pull model we need to account for that when we setup the
thread payload so we compute the first non-payload register properly. If we
didn't do that correctly and we enable it on-the-fly here then we will end up
with VUE handles on the first non-payload register which will probably lead to
GPU hangs. Instead, always enable the VUE handles for the pull model so we
can safely use them when needed. The GS is going to resort to pull model
almost in every situation anyway, so this shouldn't make a significant
difference and it makes things easier and safer.
v2: Always enable the VUE handles for pull model, this is easier and safer
and the GS is going to fallback to pull model almost always anyway (Ken)
v3: Only clamp the URB read length if we are over the maximum reserved for
push inputs as we were doing in the original code (Ken).
v4: No need to clamp the urb read length if invocations > 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This way, when NIR_PASS_V makes a clone of the shader (for testing
nir_clone), the new and lowered version gets re-assigned to prog->nir.
[jordan.l.justen@intel.com: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The way NIR_PASS works (and, by extension, nir_optimize) is that they
may clone the shader and throw the old one away. (We use this for
testing nir_clone.) It's better if we just make a temporary variable,
use it for everything, and re-assign to the gl_program at the end.
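The pattern can be sketched like this, with hypothetical example_* types standing in for nir_shader and gl_program, and a fake pass that always replaces the shader with a clone:

```c
#include <stdlib.h>

struct example_shader {
   int instr_count;
};

struct example_program {
   struct example_shader *shader;
};

/* Hypothetical pass: frees the old shader and hands back a clone. */
static void
example_pass(struct example_shader **shader)
{
   struct example_shader *clone = malloc(sizeof(*clone));

   if (!clone)
      return;
   clone->instr_count = (*shader)->instr_count - 1; /* pretend to optimize */
   free(*shader);
   *shader = clone;
}

static void
example_optimize(struct example_program *prog)
{
   /* Work on a temporary; prog->shader may go stale during the passes. */
   struct example_shader *shader = prog->shader;

   example_pass(&shader);
   example_pass(&shader);

   /* Re-assign once at the end so the program owns the live shader. */
   prog->shader = shader;
}
```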
[jordan.l.justen@intel.com: Tested NIR_TEST_CLONE=1 with valgrind]
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fix a compile error with G++ 4.4
string_buffer_test.cpp:43: error: ISO C++ forbids initialization of
member ‘str1’
string_buffer_test.cpp:43: error: making ‘str1’ static
string_buffer_test.cpp:43: error: invalid in-class initialization of
static data member of non-integral type ‘const char*’
Tested-by: Vinson Lee <vlee at freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103002
We don't have vasprintf() on Windows so we need to implement it ourselves.
v2: compute actual length of output string, per Nicolai Hähnle.
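One common way to write such a fallback is sketched below. It assumes a C99-conforming vsnprintf() that returns the required length when given a NULL buffer, which older MSVC runtimes do not guarantee; the example_* names are hypothetical and this is not necessarily how the commit does it.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int
example_vasprintf(char **out, const char *fmt, va_list args)
{
   va_list copy;
   int len;
   char *buf;

   *out = NULL;

   /* First pass on a copy of args: compute the actual output length. */
   va_copy(copy, args);
   len = vsnprintf(NULL, 0, fmt, copy);
   va_end(copy);
   if (len < 0)
      return -1;

   buf = malloc((size_t)len + 1);
   if (!buf)
      return -1;

   /* Second pass: format into the exactly sized buffer. */
   vsnprintf(buf, (size_t)len + 1, fmt, args);
   *out = buf;
   return len;
}

static int
example_asprintf(char **out, const char *fmt, ...)
{
   va_list args;
   int len;

   va_start(args, fmt);
   len = example_vasprintf(out, fmt, args);
   va_end(args);
   return len;
}
```

The caller owns the returned string and releases it with free().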
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Since our driver supports arb_provoking_vertex, we can start
advertising PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION
Fixes ./clipflat & ./arb-provoking-vertex-render piglit tests
Tested piglit, glretrace on Hw 11 and Hw 13
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently we are blitting the whole resource when the RS is used to
de-/tile a resource. This can be very inefficient for large resources
where the transfer is only changing a small part of the resource
(happens a lot with glTexSubImage2D).
Optimize this by only blitting the tile aligned subregion of the
resource, which the transfer is going to change.
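Computing the tile-aligned subregion could look roughly like this. The box type, helper name and tile dimensions are hypothetical, and clamping to the (padded) resource size is left out:

```c
#include <assert.h>

struct example_box {
   unsigned x, y, width, height;
};

/* Grow box outwards so that all four edges land on tile boundaries. */
static struct example_box
align_box_to_tiles(struct example_box box, unsigned tile_w, unsigned tile_h)
{
   assert(tile_w > 0 && tile_h > 0);

   unsigned x0 = box.x / tile_w * tile_w;
   unsigned y0 = box.y / tile_h * tile_h;
   unsigned x1 = (box.x + box.width + tile_w - 1) / tile_w * tile_w;
   unsigned y1 = (box.y + box.height + tile_h - 1) / tile_h * tile_h;

   struct example_box out = { x0, y0, x1 - x0, y1 - y0 };
   return out;
}
```

For a small glTexSubImage2D update, the aligned box is usually a tiny fraction of the resource, which is where the savings come from.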
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
This is useful if we only need to copy part of a larger resource, mostly
when using the RS engine to de-/tile on pipe transfers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
The RS can blit arbitrary tile aligned subregions of a resource by
adjusting the buffer offset.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Add reference links for the usage and bind arguments of resource_create().
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, get_paramf linked to the same target as get_param. Change its
reference link to PIPE_CAPF_*.
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We have been exposing only 16 since 1e3e72e305 with arguments
based on register pressure and the number of available GRFs, however,
our scalar backend will always limit the number of push registers
for GS threads to 24 and fallback to pull model for anything else,
so there is really no reason to lower the number under those arguments.
By bumping this up to 32 we make it the same as all the other stages,
which is a nice feature to have that can help applications in some
cases (I recently fixed a bug in CTS that assumed that the number
of input locations in a stage matches the number of output locations
in the previous stage for example).
Pre-gen8, we use the vector backend and push model, so in that case
the arguments in 1e3e72e305 are still valid.
v2: check if we have scalar GS instead of the hw gen to enable this (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I remember thinking "gosh, it would be nice if I could do a kernel-style
'if (!IS_ENABLED(DEBUG))' instead of using an #ifdef, so the code was
compiled on both builds", and then forgot to test a release build anyway.
Fixes: a8fd58eae5 ("vc4: Add labels to BOs for debug builds or with VC4_DEBUG=surf set.")
Reported-by: Derek Foreman <derekf@osg.samsung.com>
This has proven to be incredibly useful for debugging CMA allocation
failures and driving memory management improvements. However, we don't
want to burden entry and exit from the BO cache with the labeling ioctl's
overhead on release builds.
This builds, installs, and has been tested on a r290x (Hawaii) with the Vulkan
CTS. It dies horribly in a fire at the same point for the meson build as the
autotools build.
v2: - enable radv by default
- add shader cache support and enforce that it's built for radv
v3: - Fix typo in meson_options (Nicholas)
- strip trailing 'svn' from llvm version before setting the version
preprocessor flag (Bas)
- Check for LLVM module requirements
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows building and installing the Intel "anv" Vulkan driver using
meson and ninja. The driver has been tested against the CTS and
seems to pass the same series of tests (they both segfault when the CTS
tries to run wayland wsi tests).
There are still a mess of TODO, XXX, and FIXME comments in here. Those
are mostly for meson bugs I'm trying to fix, or for additional things to
implement for other drivers/features.
I have configured all intermediate libraries and optional tools to not
build by default, meaning they will only be built if they're pulled in
as a dependency of a target that will actually be installed. This allows
us to avoid massive if chains, while ensuring that only the bits that
need to be built are.
v2: - enable anv, x11, and wayland by default
- add configure option to disable valgrind
v3: - fix typo in meson_options (Nicholas)
v4: - Remove dead code (Eric)
- Remove change to generator that was from v0 (Eric)
- replace if chain with loop (Eric)
- Fix typos (Eric)
- define HAVE_DLOPEN for both libdl and builtin dl cases (Eric)
v5: - rebase on util string buffer implementation
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v4)
It is possible to have DEBUG disabled but asserts on (NDEBUG not defined), which
cannot build because these asserts work on members that are only present
when DEBUG is on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Meson doesn't allow setting environment variables for custom targets, so
we either need to not pass this as an environment variable or use a
shell script to wrap the invocation. The chosen solution has the
advantage of working for both autotools and meson.
v2: - put rules back in top scope (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This unbreaks waffle/gbm (piglit/gbm) which fails initialization.
v2: also don't set queryDmaBufFormats
Reviewed-by: Daniel Stone <daniel@fooishbar.org>
We originally implemented caching to avoid unneeded round-trips to the
compositor when querying surface capabilities etc. to set up the
swapchain. Unfortunately, this doesn't work if vkDestroyInstance is
called after the Wayland connection has been dropped. In this case, we
end up trying to clean up already destroyed wl_proxy objects which leads
to crashes. In particular most of dEQP-VK.wsi.wayland is crashing
thanks to this problem.
This commit gets rid of the cache and simply embeds the wsi_wl_display
struct in the swapchain. While we're at it, we can get rid of the
wl_event_queue that we were storing in the swapchain because we can just
use the one in the embedded wsi_wl_display.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Bugzilla: https://bugs.freedesktop.org/102578
Cc: mesa-stable@lists.freedesktop.org
Ugh the GLX code. __GLX_MAX_CONTEXT_PROPS is 3 because glxproto.h is
just a pile of ancient runes, so when the server begins sending more
than 3 context properties this code refuses to work _at all_. Which is
all just silly. If _XReply succeeds, it will have buffered the whole
reply, we can just walk through each property one at a time.
v2: Now with no arbitrary limits. (Eric Anholt)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
dri2_fallback_swap_interval(), currently used to stub out swap interval
support in the Android backend, does nothing besides returning EGL_FALSE.
This causes at least one known application (Android Snapchat) to fail
due to an unexpected error and my loose interpretation of the EGL 1.5
specification justifies it. Relevant quote below:
The function
EGLBoolean eglSwapInterval(EGLDisplay dpy, EGLint interval);
specifies the minimum number of video frame periods per buffer swap
for the draw surface of the current context, for the current rendering
API. [...]
The parameter interval specifies the minimum number of video frames
that are displayed before a buffer swap will occur. The interval
specified by the function applies to the draw surface bound to the
context that is current on the calling thread. [...] interval is
silently clamped to minimum and maximum implementation dependent
values before being stored; these values are defined by EGLConfig
attributes EGL_MIN_SWAP_INTERVAL and EGL_MAX_SWAP_INTERVAL
respectively.
The default swap interval is 1.
Even though it does not specify the exact behavior if the platform does
not support changing the swap interval, the default assumed state is the
swap interval of 1, which I interpret as a value that eglSwapInterval()
should succeed if called with, even if there is no ability to change the
interval (but there is no change requested). Moreover, since the
behavior is defined to clamp the requested value to minimum and maximum
and at least the default value of 1 must be present in the range, the
implementation might be expected to have a valid range, which in case of
the feature being unsupported, would correspond to {1} and any request
might be expected to be clamped to this value.
Fix this by defaulting dri2_dpy's min_swap_interval, max_swap_interval
and default_swap_interval to 1 in dri2_setup_screen() and let platforms
that support this functionality set their own values after this
function returns. Thanks to patches merged earlier, we can also remove
the dri2_fallback_swap_interval() completely, as with a singular range
it would not be called anyway.
v2: Remove dri2_fallback_swap_interval() completely thanks to higher
layer already clamping the requested interval and not calling the
driver layer if the clamped value is the same as current.
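The resulting behavior can be modelled with a short sketch. The example_* structs and the driver_calls counter are hypothetical stand-ins for the display, the surface and the driver hook:

```c
#include <stdbool.h>

struct example_display {
   int min_swap_interval;
   int max_swap_interval;
};

struct example_surface {
   int swap_interval;
   int driver_calls; /* counts how often the driver layer is reached */
};

static bool
example_swap_interval(const struct example_display *dpy,
                      struct example_surface *surf, int interval)
{
   /* Silently clamp to the implementation-defined range. */
   if (interval < dpy->min_swap_interval)
      interval = dpy->min_swap_interval;
   if (interval > dpy->max_swap_interval)
      interval = dpy->max_swap_interval;

   /* Same as the current value: succeed without calling the driver. */
   if (interval == surf->swap_interval)
      return true;

   surf->driver_calls++;
   surf->swap_interval = interval;
   return true;
}
```

With the default range of {1} and a current interval of 1, every request clamps to 1 and returns success, so no fallback in the driver layer is needed.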
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
It's unnecessary to double-check that dcc_offset is not 0
because all callers already check that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Swr caches fb contents in tiles. Those tiles are stored on a per-context
basis.
When switching contexts that share resources we need to make sure that
the tiles of the old context are being stored and the tiles of the new
context are being invalidated (marked as invalid, hence contents need
to be reloaded).
The context does not get any dirty bits to identify this case. This has
to be, then, coordinated by the resources that are being shared between
the contexts.
Add a "curr_pipe" hook in swr_resource that will allow us to identify a
MakeCurrent of the above form during swr_update_derived(). At that time,
we invalidate the tiles of the new context. The old context will need to
have already stored its tiles by that time, which happens during glFlush().
glFlush() is being called at the beginning of MakeCurrent.
So, the sequence of operations is:
- At the beginning of glXMakeCurrent(), glFlush() will store the tiles
of all bound surfaces of the old context.
- After the store, a fence will guarantee that all tile stores make
it to the surface.
- During swr_update_derived(), when we validate the new context, we check
all resources to see what changed, and if so, we invalidate the
current tiles.
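The last step of the sequence above can be sketched as follows, with hypothetical example_* types in place of the real swr context and resource, and a single flag standing in for the per-context tile cache:

```c
#include <stdbool.h>
#include <stddef.h>

struct example_context {
   bool tiles_valid; /* stand-in for the per-context tile cache state */
};

struct example_resource {
   struct example_context *curr_pipe; /* context that last used it */
};

/* Called while validating state for ctx. */
static void
example_update_derived(struct example_context *ctx,
                       struct example_resource *res)
{
   if (res->curr_pipe != ctx) {
      /* Another context used the resource last: reload tile contents. */
      ctx->tiles_valid = false;
      res->curr_pipe = ctx;
   }
}
```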
Fixes rendering problems with CEI/Ensight.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
In the vec4 backend, SHADER_OPCODE_UNTYPED_ATOMIC's src[1] is the
surface index. We want to copy propagate so we can use an immediate
message descriptor, rather than an indirect send.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Atomic operation sources are scalar values, but we were failing to
select the .x component of the second operand. For example,
atomicCounterCompSwapARB(counter, 5u, 10u)
would generate
mov(8) vgrf4.x:D, 5D
mov(8) vgrf5.x:D, 10D
mov(8) vgrf9.x:UD, vgrf4.xyzw:D
mov(8) vgrf9.y:UD, vgrf5.xyzw:D
which wrongly selects the .y component of vgrf5, so the actual 10u value
would get dead code eliminated. The swizzle works for the other source,
but both of them ought to be .xxxx.
Fixes the compare and swap CTS tests in:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Embarrassingly, someone enabled the ARB_shader_atomic_counter_ops
extension for Gen7+ but never added the intrinsics to the switch
statement in the vec4 backend, so they just hit an unreachable()
call and died.
Fixes: 40dd45d0c6 (i965: Enable ARB_shader_atomic_counter_ops)
Cc: "17.2 17.1 17.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
vc5 MMU mappings are access-controlled at a 128kb boundary, so the 4kb
here was too small for that purpose. Allowing any valid align2 value that
u_mm's 32-bit addressing can represent will still catch most cases of
people passing in a byte alignment.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I was implementing the same enum support in broadcom's gen_pack_header.py,
and did this same simplification there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
cleared_and_retried is always reset to false when jumping to the retry
label, thus leading to an infinite retry loop.
Fix that by moving the cleared_and_retried variable definition to the
beginning of the function. While we're at it, move the create variable
with the other local variables and explicitly reset its content in the
retry path.
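The control-flow problem can be sketched in Python. Python has no goto, so a
loop stands in for the jump back to the retry label. This is only an
illustration: create_with_retry, try_create and clear_cache are hypothetical
names, not the simulator's functions.

```python
def create_with_retry(try_create, clear_cache, max_passes=10):
    """Try once, clear the cache and retry once, then give up."""
    # Defined before the retry point, so a retry does not reset the flag.
    cleared_and_retried = False
    for _ in range(max_passes):  # each pass models one "goto retry"
        if try_create():
            return True
        if cleared_and_retried:
            return False  # second failure: stop instead of looping forever
        clear_cache()
        cleared_and_retried = True
    raise RuntimeError("retry loop did not terminate")
```

If the flag were assigned False inside the loop body instead, the second
failure would never be detected and only max_passes would stop the loop.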
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: 78087676c9 "vc4: Restructure the simulator mode."
I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).
To avoid that, move the shadow resource to the vc4_sampler_view struct.
Fixes: f0ecd36ef8 ("st/mesa: add an entirely separate codepath for setting up buffer views")
The wayland-drm callback struct is referenced, rather than duplicated,
inside wayland-drm. Constifying this struct involved moving it on to the
stack; as a result, starting any EGL client on Wayland called into
random stack memory, and killed the compositor.
This reverts commit 1d0be5b3fe and
39d539e321.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Krzysztof Sobiecki <sobkas@gmail.com>
Fixes: 1d0be5b3fe ("wayland-drm: constify the callbacks struct")
Length of the token was already calculated by flex and stored in yyleng,
no need to implicitly call strlen() via linear_strdup().
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
V2: Also convert this pattern in glsl_lexer.ll
V3: Remove a misplaced comment
V4: Use a temporary char to avoid type change
Remove bogus +1 on length check of identifier
Migrate removal of line continuations to string_buffer. Before this,
it used ralloc_strncat() to append strings, which internally
calculates strlen() of its argument on every call. Its argument is the
entire shader, so it scans the whole shader text multiple times.
Signed-off-by: Vladislav Egorov <vegorov180@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
V2: Adapt to different API of string buffer (Thomas Helland)
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
V2: Pointed out by Timothy
- Fix pp.c reralloc size issue and comment
V3 - Use vprintf instead of printf where we should
- Fixes failing make-check tests
V4 - Use buffer_append_char in a couple places
- Use append_char in even more places
More tests could probably be added, but this should cover
concatenation, resizing, clearing, formatted printing,
and checking the length, so it should be quite complete.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
V2: Address review feedback from Timothy, plus fixes
- Use a large enough char array
- Actually test the formatted appending
- Test that clear function resets string length
V3: Port to gtest
V4: Fix test makefile
Fix copyright header
Fix missing extern C
Use more appropriate name for C-file
Add tests for append_char
Based on Vladislav Egorov's work on the preprocessor, but split
out to a util functionality that should be universal. Setup, teardown,
memory handling and general layout is modeled around the hash_table
and the set, to make it familiar for everyone.
A notable change is that this implementation is always null terminated.
The rationale is that it will be less error-prone, as one might
access the buffer directly, thereby reading a non-terminated string.
Also, vsnprintf and friends print the null-terminator.
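A rough Python sketch of the idea, assuming a buffer that tracks its own
length (so appends never rescan existing contents) and always keeps a
terminating zero byte. The StringBuffer class and its method names are
hypothetical and do not mirror the C API added by this commit.

```python
class StringBuffer:
    """Growable byte buffer that tracks its length and stays NUL-terminated."""

    def __init__(self, initial_capacity=32):
        # Zero-filled storage; at least one byte for the terminator.
        self.buf = bytearray(max(1, initial_capacity))
        self.length = 0  # bytes used, excluding the terminator

    def append(self, data: bytes):
        needed = self.length + len(data) + 1  # +1 for the terminator
        if needed > len(self.buf):
            capacity = len(self.buf)
            while capacity < needed:
                capacity *= 2  # geometric growth keeps appends amortized O(1)
            self.buf.extend(bytes(capacity - len(self.buf)))
        # Write at the known end; no scan of the existing contents is needed.
        self.buf[self.length:self.length + len(data)] = data
        self.length += len(data)
        self.buf[self.length] = 0  # keep the terminator in place

    def clear(self):
        self.length = 0
        self.buf[0] = 0

    def value(self) -> bytes:
        return bytes(self.buf[:self.length])
```

Because the length is stored, appending to a large shader string costs time
proportional to the appended piece, not to the text already in the buffer.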
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: Address review feedback from Timothy and Grazvydas
- Fix MINGW preprocessor check
- Changed len from uint to int
- Make string argument const in append function
- Move to header and inline append function
- Add crimp_to_fit function for resizing buffer
V3: Move include of ralloc to string_buffer.h
V4: Use u_string.h for a cross-platform working vsnprintf
V5: Remember to cast to char * in crimp function
V6: Address review feedback from Nicolai
- Handle !str->buf in buffer_create
- Ensure va_end is always called in buffer_append_all
- Add overflow check in buffer_append_len
- Do not expose buffer_space_left, just remove it
- Clarify why a loop is used in vprintf, change to for-loop
- Add a va_copy to buffer_vprintf to fix failure to append arguments
when having to resize the buffer for vsnprintf.
V7: Address more review feedback from Nicolai
- Add missing va_end corresponding to va_copy
- Error check failure to allocate in crimp_to_fit
For now linking is just removing unused varyings between stages.
shader-db results BDW:
total instructions in shared programs: 13198288 -> 13191693 (-0.05%)
instructions in affected programs: 48325 -> 41730 (-13.65%)
helped: 473
HURT: 0
total cycles in shared programs: 541184926 -> 541159260 (-0.00%)
cycles in affected programs: 213238 -> 187572 (-12.04%)
helped: 435
HURT: 8
V2:
- lower indirects on demoted inputs as well as outputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow us to insert a nir linking step in brw_link_shader().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This will help us call gather info at a later point and allow us
to do some linking in nir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The initial helpers add support for removing unused varyings between
stages.
V2:
- Moved the io mask helper function into this file rather than
nir.h so it's not used elsewhere considering it doesn't handle
all corner cases.
- Use bitmask rather than hash table to handle tcs outputs (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used by the nir linking pass so that we don't remove
otherwise unused varyings.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Will be used in the nir link pass to decide if we can remove a varying
or not.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This marks the end of code sharing between r600 and radeonsi.
It's getting difficult to work on radeonsi without breaking r600.
A lot of functions had to be renamed to prevent linker conflicts.
There are also minor cleanups.
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
do_flush_locked isn't a great name - especially given that there's no
locking going on in our code relating to execbuf.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We have a nice utility function for this, which eliminates the need for
locking stuff. This isn't really performance critical, but it's less
code to use the atomic.
p_atomic_inc_return does pre-increment rather than post-increment, so we
change screen->program_id to be initialized to 0 instead of 1. At which
point, we can just delete the initialization because intel_screen is
rzalloc'd.
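The pre-increment point can be illustrated with a small Python stand-in. The
lock here only emulates atomicity; the real helper is a C atomic, and
AtomicCounter is a hypothetical name used for this sketch.

```python
import threading


class AtomicCounter:
    def __init__(self):
        self._value = 0  # zero-initialized, like rzalloc'd storage
        self._lock = threading.Lock()

    def inc_return(self):
        """Increment, then return the new value (pre-increment semantics)."""
        with self._lock:
            self._value += 1
            return self._value
```

Starting from 0, the first caller receives 1, which matches the IDs handed
out when the counter started at 1 and was post-incremented.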
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
There's no real advantage or disadvantage here, it's just for stylistic
consistency with the rest of the codebase.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Supported in JitGatherVertices(); FetchJit::JitLoadVertices() may require
similar changes; we will need to address this if it is determined that this
path is still in use.
Handle Force Sequential Access in FetchJit::Create.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Move structure, as the size is significantly reduced due to dynamic
allocation of the GS buffers.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
One piglit regression, which was a false pass:
spec@glsl-1.50@execution@geometry@dynamic_input_array_index
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This is needed since we don't update the number of viewports/scissors
when they are set dynamically (according to the spec). In the following
scenario:
* vkCmdSetViewport()
* vkCmdClearColorImage() (or any other meta operations)
The viewports/scissors weren't saved correctly because no pipeline
was bound before, and thus the number of viewports/scissors were 0.
This fixes a regression with:
dEQP-VK.draw.negative_viewport_height.front_ccw_cull_back
Fixes: 60878dd00c ("radv: do not update the number of viewports in vkCmdSetViewport()")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
String handling has changed on python3.
Before this patch, on python3:
#define MESA_GIT_SHA1 "git-b'b99dcbfeb3'"
After:
#define MESA_GIT_SHA1 "git-b99dcbfeb3"
(No change on python2, it always looked ok)
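A minimal sketch of the kind of change involved, assuming the SHA-1 arrives
as bytes from a subprocess call. The function name sha1_define is
hypothetical and the actual script may differ.

```python
def sha1_define(git_output: bytes) -> str:
    # Decode explicitly: on Python 3, formatting a bytes object with %s
    # inserts its repr, which is where the stray b'...' came from.
    sha1 = git_output.decode("ascii").strip()
    return '#define MESA_GIT_SHA1 "git-%s"' % sha1
```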
Cc: Jose Fonseca <jfonseca@vmware.com>
Fixes: b99dcbfeb3 "build: Convert git_sha1_gen script to Python."
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Scaling between interlaced buffers is inaccurate, especially when
scaling up, because the blit scales the top field and the bottom field
separately, so the weave of these buffers loses accuracy.
So use the shader deint for this case.
Acked-by: Christian König <christian.koenig@amd.com>
Previously it was impossible to transcode an interlaced video, because
for the encoder to work we have to force the buffer to progressive,
but the deint from an interlaced (I) to a progressive (P) buffer was
missing. Now, with the new YUV deint full function, it works with weave
and bob deint.
This will also benefit transcoding video with scaling parameters.
Acked-by: Christian König <christian.koenig@amd.com>
We also set the src rectangle explicitly in case of a size mismatch
between the interlaced buffer and the progressive buffer.
Acked-by: Christian König <christian.koenig@amd.com>
Spec adding corner cases ...
Fixes: 969537d935 "radv: Add support for more DCC compression with VK_KHR_image_format_list."
Reviewed-by: Dave Airlie <airlied@redhat.com>
I tested this 10 times with
./deqp-vk --deqp-case=dEQP-VK.texture.filtering.3d.formats.r4g4b4a4*
and one full run of CTS, seems the issue is gone.
Also reduces CTS runtime by 30% or so.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In Vulkan, for 'z' (depth) component, the scale and translate values
for the viewport transformation are:
pz = maxDepth - minDepth
oz = minDepth
zf = pz × zd + oz
where zd is the third component of the vertex's normalized device coordinates.
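As a quick numeric check of the formula above, here is a small Python
sketch. The function name viewport_depth is hypothetical; the arithmetic is
exactly the pz/oz/zf definition given in this message.

```python
def viewport_depth(zd, min_depth, max_depth):
    pz = max_depth - min_depth  # scale
    oz = min_depth              # translate
    return pz * zd + oz         # zf
```

With an inverted range (min_depth = 1.0, max_depth = 0.0) the scale becomes
negative, so zd = 0.25 maps to 0.75 rather than 0.25.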
Fixes: dEQP-VK.draw.inverted_depth_ranges.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Not quite asciibetical: ARB, then EXT, then vendor, just like the GL
extension enum just below. No functional change, but it bothered me.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add missing includes after 6ace0b8 (etnaviv: don't enable RT
full-overwrite when logicop is enabled), otherwise the etnaviv driver
won't build because of missing macros.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Andres Gomez <agomez@igalia.com>
util_pack_color may leave undefined values in the upper half of the packed
integer. As our hardware needs the upper 16 bits to mirror the lower 16 bits,
this breaks clears of those formats if the undefined values aren't masked off.
I've only observed the issue with R5G6B5_UNORM surfaces, other 16bpp
formats seem to work fine.
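The masking described above amounts to the following, shown here as a Python
sketch rather than the driver's C code. The function name
mirror_16bpp_clear_value is hypothetical.

```python
def mirror_16bpp_clear_value(packed: int) -> int:
    low = packed & 0xFFFF      # discard whatever is in the upper half
    return (low << 16) | low   # upper 16 bits mirror the lower 16 bits
```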
Fixes: d6aa2ba2b2 (etnaviv: replace translate_clear_color with util_pack_color)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
While using iterparse is potentially a little more efficient, the Vulkan
registry XML is not large and using regular element tree simplifies the
parsing logic substantially.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
New extensions can introduce additional enums. Most of the new enums
will have disjoint numbers from the initial enums. For example new
formats introduced by VK_IMG_format_pvrtc:
VK_FORMAT_ASTC_10x8_UNORM_BLOCK = 177,
VK_FORMAT_ASTC_10x8_SRGB_BLOCK = 178,
VK_FORMAT_ASTC_10x10_UNORM_BLOCK = 179,
VK_FORMAT_ASTC_10x10_SRGB_BLOCK = 180,
VK_FORMAT_ASTC_12x10_UNORM_BLOCK = 181,
VK_FORMAT_ASTC_12x10_SRGB_BLOCK = 182,
VK_FORMAT_ASTC_12x12_UNORM_BLOCK = 183,
VK_FORMAT_ASTC_12x12_SRGB_BLOCK = 184,
VK_FORMAT_PVRTC1_2BPP_UNORM_BLOCK_IMG = 1000054000,
VK_FORMAT_PVRTC1_4BPP_UNORM_BLOCK_IMG = 1000054001,
VK_FORMAT_PVRTC2_2BPP_UNORM_BLOCK_IMG = 1000054002,
VK_FORMAT_PVRTC2_4BPP_UNORM_BLOCK_IMG = 1000054003,
VK_FORMAT_PVRTC1_2BPP_SRGB_BLOCK_IMG = 1000054004,
VK_FORMAT_PVRTC1_4BPP_SRGB_BLOCK_IMG = 1000054005,
VK_FORMAT_PVRTC2_2BPP_SRGB_BLOCK_IMG = 1000054006,
VK_FORMAT_PVRTC2_4BPP_SRGB_BLOCK_IMG = 1000054007,
It's obvious we can't have a single table for handling those anymore.
Fortunately the enum values actually contain the number of the
extension that introduced the new enums. So we can build an
indirection table off the extension number and then index by
subtracting the extension's first enum from the format enum value.
This change makes the extension number available in the generated enum
code.
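The indirection can be sketched in Python, assuming the Vulkan registry
numbering in which an extension enum equals
1000000000 + (extension number - 1) * 1000 + offset. The names
split_enum_value and lookup, and the layout of the tables argument, are
hypothetical and are not the generated code.

```python
EXT_BASE = 1000000000  # first value reserved for extension enums
EXT_BLOCK = 1000       # values reserved per extension


def split_enum_value(value):
    """Return (extension_number, offset); extension_number 0 means core."""
    if value < EXT_BASE:
        return 0, value
    rel = value - EXT_BASE
    return rel // EXT_BLOCK + 1, rel % EXT_BLOCK


def lookup(tables, value):
    """tables maps an extension number to a per-extension list or dict."""
    ext, offset = split_enum_value(value)
    return tables[ext][offset]
```

For example, VK_FORMAT_PVRTC1_2BPP_UNORM_BLOCK_IMG (1000054000) splits into
extension 55, offset 0, while VK_FORMAT_ASTC_12x12_SRGB_BLOCK (184) stays in
the core table.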
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
They are now provided by -latomic, which should be linked as needed
since previous commit.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
On some platforms, gcc generates library calls when __atomic_* functions
are used, but does not link the required library (libatomic) automatically
(supposedly to allow the app to use some other atomics implementation?).
Detect this at configure time and add the library when needed. Tested
on armel (library was added) and on x86_64 (was not, as expected).
Some documentation on this is provided in GCC wiki:
https://gcc.gnu.org/wiki/Atomic/GCCMM
Fixes: 8915f0c0 "util: use GCC atomic intrinsics with explicit memory model"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102573
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Logicop is a form of blending with the framebuffer, so we must allow
framebuffer reads when logicop is enabled.
Fixes: piglit gl-1.0-logicop on GC3000, which has logicop support
Signed-off-by: Lucas Stach <dev@lynxeye.de>
drm-intel is in favor of keeping the unused pci-id's which
are still listed in the h/w specs. To keep it uniform
across multiple gfx stack components, I'm reverting below
Mesa patches:
b2dae9f8fdebc5ccf3cc.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This is not used anywhere in the codebase. It's a hashtable
implementation that is based around cso_hash, and is therefore
(and as mentioned in a comment in the source) quite similar to
u_hash_table.
CC: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If transform feedback is recording a varying, it needs a slot in the
VUE map, regardless of whether or not the shader writes it.
Together with the previous patch, this fixes:
- KHR-GL45.enhanced_layouts.xfb_capture_struct
The test captures a structure where the vertex shader writes the first
and third members - but the second still needs a slot.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
unify_interfaces() only updates the NIR program info, not the copy
in the gl_program itself. So, by using the old copy, we were missing
out on these updates.
The TCS/TES ones already did this correctly.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This can occur if the shader is capturing some of the values from the
VUE header for transform feedback, but the shader hasn't written all of
them.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
brw_finish_batch emits commands needed at the end of every batch buffer,
including any workarounds. In the past, we freed up some "reserved"
batch space before calling it, so we would never have to flush during
it. This was error prone and easy to screw up, so I deleted it a while
back in favor of growing the batch.
There were two problems:
1. We're in the middle of flushing, so brw->no_batch_wrap is guaranteed
not to be set. Using BEGIN_BATCH() to emit commands would cause a
recursive flush rather than growing the buffer as intended.
2. We already recorded the throttling batch before growing, which
replaces brw->batch.bo with a different (larger) buffer. So growing
would break throttling.
These are easily remedied by shuffling some code around and whacking
brw->no_batch_wrap in brw_finish_batch(). This also now includes the
final workarounds in the batch usage statistics. Found by inspection.
Fixes: 2c46a67b41 (i965: Delete BATCH_RESERVED handling.)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested with AMD's Anvil OutOfOrderRasterization demo on a RX 560.
Signed-off-by: Nicholas Miell <nmiell@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: Force T and R wrap modes to GL_CLAMP_TO_EDGE for 1D textures.
This fixes a regression in tex1d-2dborder. The test uses a 1D texture
but it provides S and T texture coordinates. Since the T wrap mode
would (correctly) be set to GL_CLAMP, the texture would gradually
blend (incorrectly) with the border color.
I also tried setting NV20_3D_TEX_FORMAT_DIMS_1D instead of
NV20_3D_TEX_FORMAT_DIMS_2D for 1D textures, but that did not help.
It is possible that the same problem exists for 2D textures with the
R-wrap mode, but I don't think there are any piglit tests for that.
No test changes on NV20 (10de:0201).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
There's no reason to use va_copy here.
CID: 1418113
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes: e7fc664b91 ("winsys/amdgpu: add addrlib - texture
addressing and alignment calculator")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The number of viewports/scissors can only be specified at pipeline
creation time, so make sure to copy them when binding a new one
because the dynamic state is cleared in BeginCommandBuffer().
Fixes: dcf46e995d ("radv: do not update the number of scissors in vkCmdSetScissor()")
Fixes: 60878dd00c ("radv: do not update the number of viewports in vkCmdSetViewport()")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Having this separate just makes the code harder to follow, and
requires an extra walk of the IR.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
The Broadwell method of handling uncompressed views of compressed
textures was to make the texture linear and have a tiled shadow copy.
This isn't needed on Sky Lake because the HALIGN and VALIGN parameters
are specified in surface elements and required to be a multiple of 4.
This means that we can just use the X/Y Offset fields and we can avoid
the shadow copy song and dance. This also makes ASTC work because ASTC
can't be linear and so the shadow copy method doesn't work there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In order to get support everywhere, this gets a bit complicated. On Sky
Lake and later, everything is fine because HALIGN/VALIGN are specified
in surface elements and are required to be at least 4 so any offsetting
we may need to do falls neatly within the heavy restrictions placed on
the X/Y Offset parameter of RENDER_SURFACE_STATE. On Broadwell and
earlier, HALIGN/VALIGN are specified in pixels and are hard-coded to
align to exactly the block size of the compressed texture. This means
that, when reinterpreted as a non-compressed texture, the tile offsets
may be anything and we can't rely on X/Y Offset.
In order to work around this issue, we fall back to linear where we can
trivially offset to whatever element we so choose. However, since
linear texturing performance is terrible, we create a tiled shadow copy
of the image to use for texturing. Whenever the user does a layout
transition from anything to SHADER_READ_ONLY_OPTIMAL, we use blorp to
copy the contents of the texture from the linear copy to the tiled
shadow copy. This assumes that the client will use the image far more
for texturing than as a storage image or render target.
Even though we don't need the shadow copy on Sky Lake, we implement it
this way first to make testing easier. Due to the hardware restriction
that ASTC must not be linear, ASTC does not work yet.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This struct represents a full surface state including the addresses of
the referenced main and auxiliary surfaces (if any). This makes
relocation setup substantially simpler and allows us to move 100% of the
surface state setup logic into anv_image where it belongs. Before, we
were manually fishing data out of surface states when emitting
relocations so we knew how to offset aux address. It's best to keep all
of the surface state emit logic together. This also gets us closer, at
least cosmetically, to a world of no relocations where addresses are
placed in surface states up-front.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This gives us a single centralized place where we take an image view and
use it to fill out a surface state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's not SPIR-V that's backwards from GLSL, it's Vulkan that's backwards
from GL. Let's make NIR consistent with the source language and do the
flipping inside the Vulkan driver instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: wait in map_buffer and map_image as well
v3: use event::wait instead of wait (skips fence wait for hard_event)
v4: use wait_signalled()
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
And define a method for other threads to wait until the action
function associated with an event has been executed to completion.
For hard events, this will mean waiting until the corresponding
command has been submitted to the pipe driver, without necessarily
flushing the pipe_context and waiting for the actual command to be
processed by the GPU (which is what hard_event::wait() already does).
This weaker kind of event wait will allow implementing blocking memory
transfers efficiently.
Acked-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
We are really not going to use a winsys which does not need to store
the va, so might as well store it in a standard field.
Not sure this helps perf much though, as most of the cost is in the
cache miss accessing the bo anyway, which we still need to do.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since most games use only a few, iterating through all of them is
a waste. Simplifies the code too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Nothing too exciting, just adding the possibility for a pNext pointer,
and batch binding. Our binding is pretty much trivial.
It also adds VK_IMAGE_CREATE_ALIAS_BIT_KHR, but since we store no
state in radv_image, I don't think we have to do anything there.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This uses all the existing code to calculate lod values for mip linear
filtering. Though we'll have to disable the simplifications (if we know some
parts of the lod calculation won't actually matter for filtering purposes due
to mip clamps etc.). For better or worse, we'll also disable lod calculation
hacks (mostly should make a difference for cube maps) always - the issue with
per-pixel lod being difficult is mostly because we then have different mipmaps
needed for the actual texel fetch, which isn't a problem with lodq.
We still use approximation for the log2 - for that reason I believe the float
part of the lod is only accurate to about 4-5 bits (and one bit less with 1d
textures actually) which is hopefully good enough (though d3d10 technically
requires 6 bits - could use quadratic interpolation instead of linear to get
8 bits or so).
Since lodq requires unclamped lod, we also have to move some sampler key
calculations to texture sampling code - even if we know we're going to access
mipmap 0 we still have to calculate lod and apply lod_bias for lodq.
Passes piglit ARB_texture_query_lod tests (after having fixed the test).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some DRI image properties weren't properly duplicated in the
new image. Some properties are still missing, but I'm not
certain if there was a good reason to leave them out in the first
place.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a bug with nearest ("point") mip selection when the fractional
part of max_lod is in (0.5,1). In this case, the spec mandates that
we still select the mip level ceil(max_lod) in the clamping case. However,
MIP_POINT_PRECLAMP will clamp before the mip selection, which is wrong.
Supposedly this setting was originally copied from the closed Vulkan
driver, but as far as I can tell, closed Vulkan was actually changed back
recently :)
Fixes dEQP-GLES3.functional.texture.mipmap.2d.max_lod.{nearest,linear}_nearest
Fixes: f7420ef5b4 ("radeonsi: enable some sampler fields to match the closed driver")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Like for cube map (array) gather, we need to round to nearest on <= VI.
Fixes tests in dEQP-GLES3.functional.shaders.texture_functions.texture.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Prevent an overflow caused by too many output variables. To limit the
scope of the issue, write to the assigned array only for the non-ES
fragment shader path, which is the only place where it's needed.
Since the function will bail with an error when output variables with
overlapping components are found, (max # of FS outputs) * 4 is an upper
limit to the space we need.
Found by address sanitizer.
Fixes dEQP-GLES3.functional.attribute_location.bind_aliasing.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This change makes etna_get_driver_query_info(..) more generic
and puts the knowledge of supported queries directly besides
the implementation.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The Vulkan spec (1.0.61) says:
"The number of scissors used by a pipeline is still specified
by the scissorCount member of VkPipelineViewportStateCreateInfo."
So, the number of scissors is defined at pipeline creation
time and shouldn't be updated when they are set dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The Vulkan spec (1.0.61) says:
"The number of viewports used by a pipeline is still specified
by the viewportCount member of VkPipelineViewportStateCreateInfo."
So, the number of viewports is defined at pipeline creation
time and shouldn't be updated when they are set dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If we don't have a depth piece, we don't get a correct
swizzle mode and we hit an assert in addrlib.
In case of no depth get the preferred swizzle mode for
stencil alone.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Unreal Engine 4 seems to really like this format for some reason. We
don't technically have the hardware format but we do have L8_SRGB. It's
easy enough to fake with that and a swizzle.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Vulkan needs to be able to clear any texture you can create. We want to
add support for VK_FORMAT_R8_SRGB and we need to use L8_UNORM_SRGB to do
that so we need to be able to clear it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of having three, almost identical but not quite,
_eglDebugReport* functions, simply fold them into one.
While doing so drop the unnecessary arguments 'command' and
'objectLabel'. Former is identical to funcName, while the latter is
already stored (yet unused) in _EGLThreadInfo::CurrentObjectLabel.
Cc: Kyle Brenneman <kbrenneman@nvidia.com>
Cc: Adam Jackson <ajax@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (IRC)
Seemingly, the original intent behind _eglError's 'msg' was to
provide a function name.
At some point, people started using it the way EGL_KHR_debug's
callback() message is meant to be used. Aka providing meaningful
information to the developer/user.
Swap the funcName/msg argument order in the _eglDebugReport() call.
The 'funcName' variable is implicitly set, props to the
_eglSetFuncName() call at the start of each public entrypoint.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Vulkan does not depend on the library or any of the objects
created in the process.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
At the moment wayland-clients, such as the Vulkan drivers, were
over-linking against libwayland-server.so.
That went unnoticed, since both client and server code uses the
wl*interface symbols, which are present in both libwayland-client.so and
libwayland-server.so.
I've looked at correcting that, although that's orthogonal to this fix.
Note: wayland-egl does _not_ depend on wayland-client, although it does
need wayland-egl.h. There's no distinct package that provides it (I have
a WIP on the topic) so current solution will do for now.
v2: Rebase with the "...inline wayland_drm_buffer_get" patch removed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Due to the GCC feature described in the previous commit, the expected
deprecation warnings may be missing.
Set the WL_HIDE_DEPRECATED macro which will omit the deprecated
functionality, resulting in more distinct build issues.
That is safe since the symbols guarded by the macro are static.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Suggested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Wayland v1.2 with commit 1488c96a5db ("Add accessor functions for
wl_resource and deprecate wl_client_add_resource") paves the way towards
making wl_resource opaque.
Namely, new helpers were introduced and the struct was annotated as
deprecated.
Since wayland headers are normally installed in /usr/include, which is
in -isystem, GCC did not generate warnings as documented in the manual.
"Warnings from system headers are normally suppressed..."
Signed-off-by: Micah Fedke <micah.fedke@collabora.co.uk>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
[Emil Velikov: add commit message]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Unused anywhere throughout the codebase. We could start using it,
although that contradicts an evil plan* of mine.
* Only wayland servers will make use of the static library, providing
actual distinction between server vs client.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Fixes:
CC isl/isl_format_layout.lo
In file included from
../../../../src/intel/isl/isl_storage_image.c:24:0:
../../../../src/intel/isl/isl_priv.h:170:29: fatal error:
isl_genX_priv.h: No such file or directory
compilation terminated.
Makefile:2936: recipe for target 'isl/isl_storage_image.lo' failed
make[5]: *** [isl/isl_storage_image.lo] Error 1
make[5]: *** Waiting for unfinished jobs....
In file included from ../../../../src/intel/isl/isl.c:36:0:
../../../../src/intel/isl/isl_priv.h:170:29: fatal error:
isl_genX_priv.h: No such file or directory
compilation terminated.
make[5]: *** [isl/isl.lo] Error 1
Makefile:2936: recipe for target 'isl/isl.lo' failed
make[4]: *** [all] Error 2
when running `make distcheck`.
v2: Fix commit title (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes:
CCLD libvulkan_wsi.la
ar: `u' modifier ignored since `D' is the default (see `U')
../../../../src/vulkan/util/vk_enum_to_str.c:26:45: fatal error:
vulkan/vk_android_native_buffer.h: No such file or directory
compilation terminated.
make[5]: *** [util/vk_enum_to_str.lo] Error 1
When running `make distcheck`.
v2: Fix commit title (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
There was no reason to treat array types and record types differently.
Unifying them saves a bunch of code and saves a few bytes in every
ir_constant.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
The next patch will unify ::array_elements and ::components, so the
name ::array_elements wouldn't be appropriate. A lot of things use
the names array_elements and components, so grepping for either is
pretty useless.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
In GLSL ES 3.10 section 4.9 [Memory Access Qualifiers], it has the
following description:
"A variable could be qualified as both readonly and writeonly,
disallowing both read and write, but still be passed to
imageSize() to have the size queried.".
This is for image variables, but not for buffer variables.
According to https://github.com/KhronosGroup/OpenGL-API/issues/7 Khronos
intent is to allow both readonly and writeonly in buffer variables, and
as such it will update the GLSL specification.
This commit addresses this issue, and fixes:
KHR-GL{43,44,45}.shader_storage_buffer_object.basic-readonly-writeonly
KHR-GLES31.core.shader_storage_buffer_object.basic-readonly-writeonly
v2: correctly set fields[i] memory flags (Samuel Pitoiset).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Use the plumbing introduced with previous patch to interact with the
Android framework.
Namely: currently we use an invalid fd of -1 for our calls to
ANativeWindow::{queue,cancel}Buffer.
At the same time applications (like flatland) may rely on it being
a valid one. Thus as they attempt to query the timestamp of the fence,
they get unexpected results/behaviour.
In the case of flatland - the benchmark hangs inside getSignalTime().
Make use of the out fence and pass the correct fd to Android.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101655
Signed-off-by: Zhongmin Wu <zhongmin.wu@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
[Emil Velikov: split from larger patch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Add plumbing to allow creation of per display surface out fence.
This can be used to implement explicit sync. One user of which is
Android - which will be addressed with next commit.
Signed-off-by: Zhongmin Wu <zhongmin.wu@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
[Emil Velikov: reorder so there's no intermittent regressions, split]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Originally dri3 egl surface was wrapped around _EGLSurface.
With next commit we'll add additional attributes, which will be checked
from generic code. Thus in order to access that we need to use
dri2_egl_surface.
The name of the latter is a misnomer - it should really be dri or
dri_common...
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
[Emil Velikov: commit message, squash the patches appropriately, add
relevant _eglInitSurface hunk to prevent build breakage]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
By leaving the compiled shader in the context's stage state, the next
compile of a new FS would look in the old compiled FS for figuring out
whether to set various dirty flags for the VS compile. Clear out the
pointer when deleting the program, and make sure that we always mark the
state as dirty if the previous program had been lost. Fixes valgrind
warnings on glsl-max-varyings.
Fixes: 2350569a78 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
I originally wrote the code to call the maps 'batch' and 'state',
until I remembered that 'batch' is the intel_batchbuffer struct pointer.
The NULL check was still using the wrong variable.
Caught by Coverity.
CID: 1418109
The blitter will bind just the depth buffer, which flushes the current job
if we had both a color and depth/stencil. If the clear was doing partial
depth/stencil (quad-based) and color (tile-based), we'd go on to try to
set up the rest of the tile clear in the now flushed job.
Instead, move the partial clear up before we start setting up the job for
the current FBO state, and re-fetch the job if we're continuing on to a
tile-based clear. Fixes valgrind failures in fbo-depthtex.
Fixes: 9421a6065c ("vc4: Fix fallback to quad clears of depth in GLX.")
I was trying to continue the hash table loop, not the inner loop. This
tended to work out, because we would have *just* freed the job struct.
Fixes some valgrind failures in fbo-depthtex.
Fixes: f597ac3966 ("vc4: Implement job shuffling")
Only one of the three checks for dim was updated, so we would try to set a
UBO buffer index source value on a nir_load_uniform, and wouldn't actually
declare non-UBO uniforms.
Fixes: 37dd8e8dee ("gallium: all drivers should accept two-dimensional constant buffer indexing")
Tested-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Android's Vulkan loader implements VK_KHR_surface and VK_KHR_swapchain,
and applications cannot access the driver's implementation. Moreover,
if the driver exposes those extension strings, then tests
dEQP-VK.api.info.instance.extensions and dEQP-VK.api.info.device fail
due to the duplicated strings.
v2: Replace !ANDROID with ANV_HAS_SURFACE. (for jekstrand)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Feed the XML to anv_extensions.py and anv_entrypoints_gen.py.
Do it on all platforms, not just Android. Tested on Android and Fedora.
We always parse the Android XML, regardless of target platform, to
help reduce the chance that people working on non-Android break the
Android build.
v2:
- Squash in Tapani's changes to Android.*.mk.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
The taught scripts are anv_extensions.py and anv_entrypoints_gen.py. To
give a script multiple XML files, call it like so:
anv_extensions.py --xml a.xml --xml b.xml --xml c.xml ...
The scripts parse the XML files in the given order.
This will allow us to feed the scripts XML files for extensions that are
missing from the official vk.xml, such as VK_ANDROID_native_buffer.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
To give the script multiple XML files, call it like so:
gen_enum_to_str.py --xml a.xml --xml b.xml --xml c.xml ...
The script parses the XML files in the given order.
This will allow us to feed the script XML files for extensions that are
missing from the official vk.xml, such as VK_ANDROID_native_buffer.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The VK_ANDROID_native_buffer extension is missing from the official
vk.xml. This patch defines the extension in a separate, minimal XML
file: vk_android_native_buffer.xml.
I chose to add the extension to a new XML file instead of adding it to
the official vk.xml in order to avoid conflicts each time we sync the
vk.xml from Khronos.
This should be only a temporary solution until Jesse Hall is persuaded
to add it to the official vk.xml.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch consolidates many potential `#ifdef ANDROID` messes
throughout src/vulkan and src/intel/vulkan into a simple, localized
hack. The hack is an `#ifdef ANDROID` in vk_android_native_buffer.h
that, on non-Android platforms, avoids including the Android platform
headers and typedefs any Android-specific types to void*.
This hack doesn't remove *all* the `#ifdef ANDROID`s in upcoming
patches, but it does remove a lot.
I first tried implementing VK_ANDROID_native_buffer without this hack,
but eventually gave up when the yak shaving became too much.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The setTexBuffer2 hook from GLX is used to implement glxBindTexImageEXT
which has tighter restrictions than just "it's shared". In particular,
it says that any rendering to the image while it is bound causes the
contents to become undefined. This means that we can do whatever aux
tracking we want between glxBindTexImageEXT and glxReleaseTexImageEXT so
long as we always transition from external in Bind and to external in
Release.
The fact that we were using make_shareable before was a problem because
it would resolve away 100% of the aux data and then throw away our
reference to the aux buffer. If the aux data was shared with some other
application (i.e. if we're using I915_FORMAT_MOD_Y_TILED_CCS) then we
would forget that the aux data even existed for the rest of eternity.
This is fine for the first frame but any subsequent calls to
glxBindTexImageEXT would bind the texture as if it has no aux
whatsoever and no resolves would happen and texturing would happen as if
there is no aux. This was causing rendering corruption in mutter when
running on top of X11 with modifiers.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The old code made a new miptree that referenced the same BO as the
renderbuffer and just trusted in the memory aliasing to work. There are
only two ways in which the new miptree is liable to differ from the one
in the renderbuffer and neither of them matter:
1) It may have a different target. The only targets that we can ever
see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
and the difference between the two doesn't matter as far as the
miptree is concerned; genX(update_sampler_state) only looks at the
gl_texture_object and not the miptree when determining whether or
not to use normalized coordinates.
2) It may have a very slightly different format. Again, this doesn't
matter because we've supported texture views for quite some time so
we always look at the gl_texture_object format instead of the
miptree format for hardware setup anyway.
On the other hand, because we were recreating the miptree, we were using
intel_miptree_create_for_bo which doesn't understand modifiers. We
really want this function to work without doing a resolve so long as you
have modifiers so we need to fix that.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When we get a miptree in through glxBindImageEXT, we don't know the
current aux state so we have to assume the worst-case. If the image
gets recreated, everything is fine because miptree_create_for_dri_image
sets it to the default. However, if our miptree is recycled, then we
may have stale aux_usage and we need to reset to the default otherwise
our aux_state tracking will get messed up.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This shouldn't really happen in practice, but I hit it a couple of times
when running a driver with a bad memory leak. We may as well hook up
the warning, because if it ever triggers, we'll know something is wrong.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We do not enable this by default for additive blending, since it slightly
breaks OpenGL invariance guarantees due to non-determinism.
Still, there may be some applications that can benefit from white-listing
via the radeonsi_commutative_blend_add drirc setting without any real
visible artifacts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This option enables a performance optimization where typical non-blending
draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE
chips).
This optimization can lead to incorrect results when an application
renders multiple objects with the same Z value at the same pixel, so we
will never enable it by default. But there may be applications that could
benefit from white-listing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This does not take commutative blending into account yet.
R600_DEBUG=nooutoforder disables it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
To be able to properly distinguish between GL_ANY_SAMPLES_PASSED
and GL_ANY_SAMPLES_PASSED_CONSERVATIVE.
This patch goes through all drivers, having them treat the two
query types identically, except:
1. radeon incorrectly enabled conservative mode on
PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only
on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE.
2. st/mesa uses the new query type.
Fixes dEQP-GLES31.functional.fbo.no_attachments.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.
Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is the same workaround that radv already applied in commit
3ece76f03d ("radv/ac: gather4 cube workaround integer").
Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It can't *really* happen since we don't use subroutines.
CID: 1417491
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Fixes a regression introduced with b96313c0e1, which removed
BRW_NEW_BLORP for a bunch of SURFACE_STATE setup code, including render
targets, on the basis that blorp invalidates binding tables but not
surface states, however, at least on Broadwell, this caused a regression
in a CTS test, which Ken and Jason tracked down to the fact that we
are not uploading new render target surface states after allocating
new CCS_D surfaces for fast clears (which allocation is deferred until
an actual clear occurs).
The reason this only fails in BDW is that on SKL+ we use CCS_E which
is allocated up front so it exists in the initial surface state, the
problem can be reproduced in these platforms too if we use
INTEL_DEBUG=norcb to force the CCS_D path.
This patch, together with the ones preceding it, fixes the regression
by ensuring that we track and flag as dirty all aux state changes.
Credit goes to Jason and Ken for figuring out the reason for the
regression.
Fixes:
KHR-GL45.transform_feedback.draw_xfb_test
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want to use this flag to signal changes to the aux surfaces,
so let's not make it about fast clearing only. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Adding gbm_device_get_format_modifier_plane_count made the
test gbm-symbols-check fail; this patch adds the corresponding
function name to the test.
Fixes: 8824141b8d
(gbm: Add a gbm_device_get_format_modifier_plane_count function)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Jason and I use this for debugging all the time. Recompiling the driver
to enable it is kind of annoying. It's a great thing to try along with
always_flush_batch=true and always_flush_cache=true to detect a class of
problems - namely, atoms listening to an insufficient set of dirty bits.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only on GFX9 we implement them as 2D images.
This fixes:
dEQP-VK.image.image_size.1d_array.readonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_7x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_writeonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_writeonly_7x1
dEQP-VK.image.image_size.1d_array.writeonly_12x34
dEQP-VK.image.image_size.1d_array.writeonly_1x1
dEQP-VK.image.image_size.1d_array.writeonly_32x32
dEQP-VK.image.image_size.1d_array.writeonly_7x1
Fixes: 1bcb953e16 "radv: handle GFX9 1D textures"
Reviewed-by: Dave Airlie <airlied@redhat.com>
It's nearly the same so there's no good reason why it can't be in a
common function. The one difference is that _mesa_store_teximage
calls AllocTextureImageBuffer for us, while _mesa_store_texsubimage
doesn't, but we don't need that anyway - intelTexImage already does it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It is set to false in both callers. It isn't needed for glTexImage
because intelTexImage calls AllocTextureImageBuffer before calling
texsubimage_tiled_memcpy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These two paths are basically the same. There's no good reason to have
them in different files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes a crash on Haswell when we try to upload a stencil texture
with blorp. It would also be a problem if someone tried to texture from
stencil after glBlitFramebuffers.
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
libunwind is an optional dependency used by the gallium aux module
(libgallium) and consequently the final binaries must be linked against
it. To test whether the library is properly specified in the link pass
add it to the travis-ci build environment and force its use.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In Ubuntu Trusty the default version of llvm is 3.4 and the build was
actually randomly picking 3.5 or 3.9. Adding libunwind would then result
in build success or failure depending on which version was picked.
Install the llvm-3.3-dev package and force its use: On one hand it is
the minimum required version we want to build-test against, and on
the other hand forcing the version stabilizes the build.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Include src/gallium/Automake.inc, correct the build flags accordingly.
Force -std=c++11 (extensively used by the test) as otherwise it gets
defined only when building against llvm >= 3.9.
Fixes: 7be6d8fe12 ("mesa/st: glsl_to_tgsi: add tests for the new
temporary lifetime tracker")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102665
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Otherwise it will be missing from the tarball, leading to a build failure.
Fixes: d4d777317b ("radv: move shaders related code to radv_shader.c")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
fixes following warning:
warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long')
cast is needed to avoid this change turning into another warning:
warning: format specifies type 'unsigned long long' but the argument has type 'uint64_t' (aka 'unsigned long')
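A minimal sketch of the cast approach described above. The helper name
example_format_u64 is made up for illustration and is not taken from the
patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a 64-bit value without format warnings. */
static int
example_format_u64(char *buf, size_t size, uint64_t value)
{
   assert(buf != NULL && size > 0);

   /* uint64_t is 'unsigned long' on some ABIs and 'unsigned long long' on
    * others, so a bare "%lu" or "%llu" warns on one of them. Casting the
    * argument to unsigned long long makes "%llu" match everywhere.
    */
   return snprintf(buf, size, "%llu", (unsigned long long) value);
}
```

The PRIu64 macro from <inttypes.h> is the other common way to get the
same result without a cast.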
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Also, it's useless to set the error code twice. Though, we
should probably skip the next commands when the command buffer
is considered invalid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The virgl protocol version of tgsi doesn't handle this yet,
transform it back to the old ways.
Thanks to Nicolai Hähnle <nicolai.haehnle@amd.com>
for also writing nearly the same patch.
Fixes: 41e342d5 tgsi/ureg: always emit constants (and their decls) as 2D
Tested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise we end up using a 32-bit comparison which didn't end well.
Timothy caught this while playing around with some opt passes.
Fixes: 278580729a (st/glsl_to_tgsi: add support for 64-bit integers)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It's nice to have this information. While we're at it, tweak the
formatting to try and vertically align numbers in the common case.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We now flush the batch when either the batchbuffer or statebuffer
reaches the original intended batch size, instead of when the sum of
the two reaches a certain size (which makes no sense now that they're
separate buffers).
With this change, we also need to update our "are we near the end?"
estimate to require separate batch and state buffer space. I obtained
these estimates by looking at the size of draw calls in the Unreal 4
Elemental Demo (using INTEL_DEBUG=flush and always_flush_batch=true).
This will significantly impact the size of our batches. I've adjusted
both down to try and be roughly similar to what we had been doing. On
various benchmarks, a 20kB batch and 16kB statebuffer seemed to be about
right, but we may need to adjust this further. I tried a 16kB batch,
but that regressed Synmark OglMultithread performance by a fair bit.
32kB for both would have significantly increased our batch sizes.
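As a rough sketch of the flush rule described above, assuming the 20kB
and 16kB sizes from the text. The struct, macros and function names are
hypothetical and do not match the driver code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes quoted in the text above; every name here is hypothetical. */
#define EXAMPLE_BATCH_SZ (20 * 1024)
#define EXAMPLE_STATE_SZ (16 * 1024)

struct example_usage {
   size_t batch_used; /* bytes of commands emitted so far */
   size_t state_used; /* bytes of indirect state emitted so far */
};

/* Flush once either buffer reaches its own limit, not when their sum does. */
static bool
example_should_flush(const struct example_usage *u)
{
   assert(u != NULL);
   return u->batch_used >= EXAMPLE_BATCH_SZ ||
          u->state_used >= EXAMPLE_STATE_SZ;
}

/* "Are we near the end?" now needs a separate estimate per buffer. */
static bool
example_near_end(const struct example_usage *u,
                 size_t batch_needed, size_t state_needed)
{
   assert(u != NULL);
   return u->batch_used + batch_needed > EXAMPLE_BATCH_SZ ||
          u->state_used + state_needed > EXAMPLE_STATE_SZ;
}
```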
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Now that we can grow the batchbuffer if we absolutely need the extra
space, we don't need to reserve space for the final do-or-die ending
commands.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to set brw->no_batch_wrap to actually avoid flushing in the
middle of our BLORP operation, and instead grow the batchbuffer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, we would just assert fail and die in this case. The only
safeguard is the "estimated max prim size" checks when starting a draw
(or compute dispatch or BLORP operation)...which are woefully broken.
Growing is fairly straightforward:
1. Allocate a new larger BO.
2. memcpy the existing contents over to the new buffer
3. Set the new BO to the same GTT offset as the old BO. When emitting
relocations, we write the presumed GTT offset of the target BO. If
we changed it, we'd have to update all the existing values (by
walking the relocation list and looking at offsets), which is more
expensive. With the old BO freed, ideally the kernel could simply
place the new BO at that offset anyway.
4. Update the validation list to contain the new BO.
5. Update the relocation list to have the GEM handle for the new BO
(which we can skip if using I915_EXEC_HANDLE_LUT).
v2: Update to handle malloc'd shadow buffers.
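The steps above can be sketched in plain C as follows. This is only a
model: example_bo stands in for a GEM buffer object, malloc stands in
for BO allocation, and all names are hypothetical rather than the
driver's own:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a GEM buffer object. */
struct example_bo {
   uint32_t handle;     /* made-up kernel handle */
   uint64_t gtt_offset; /* presumed GPU address written into relocations */
   size_t size;
   void *map;
};

struct example_reloc {
   uint32_t target_handle;
   uint64_t presumed_offset;
};

struct example_batch {
   struct example_bo *bo;
   size_t used;
   struct example_bo *validation[4];
   int validation_count;
   struct example_reloc relocs[4];
   int reloc_count;
};

static uint32_t example_next_handle = 1;

static struct example_bo *
example_bo_alloc(size_t size)
{
   struct example_bo *bo = calloc(1, sizeof(*bo));
   if (!bo)
      return NULL;
   bo->map = calloc(1, size);
   if (!bo->map) {
      free(bo);
      return NULL;
   }
   bo->handle = example_next_handle++;
   bo->size = size;
   return bo;
}

static void
example_bo_free(struct example_bo *bo)
{
   free(bo->map);
   free(bo);
}

static int
example_grow(struct example_batch *batch, size_t new_size)
{
   struct example_bo *old_bo = batch->bo;
   assert(new_size > old_bo->size);

   /* 1. Allocate a new larger BO. */
   struct example_bo *new_bo = example_bo_alloc(new_size);
   if (!new_bo)
      return -1;

   /* 2. Copy the existing contents over. */
   memcpy(new_bo->map, old_bo->map, batch->used);

   /* 3. Keep the presumed GTT offset so emitted relocations stay valid. */
   new_bo->gtt_offset = old_bo->gtt_offset;

   /* 4. Point the validation list at the new BO. */
   for (int i = 0; i < batch->validation_count; i++) {
      if (batch->validation[i] == old_bo)
         batch->validation[i] = new_bo;
   }

   /* 5. Retarget relocations that named the old handle. */
   for (int i = 0; i < batch->reloc_count; i++) {
      if (batch->relocs[i].target_handle == old_bo->handle)
         batch->relocs[i].target_handle = new_bo->handle;
   }

   batch->bo = new_bo;
   example_bo_free(old_bo);
   return 0;
}
```

Step 5 is the part that can be skipped with I915_EXEC_HANDLE_LUT, since
relocations then refer to validation list indices instead of handles.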
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, we emitted GPU commands and indirect state into the same
buffer, using a stack/heap like system where we filled in commands from
the start of the buffer, and state from the end of the buffer. We then
flushed before the two met in the middle.
Meeting in the middle is fatal, so you have to be certain that you
reserve the correct amount of space before emitting commands or state
for a draw. Currently, we will assert !no_batch_wrap and die if the
estimate is ever too small. This has been mercifully obscure, but has
happened on a number of occasions, and could in theory happen to any
application that issues a large draw at just the wrong time.
Estimating the amount of batch space required is painful - it's hard to
get right, and getting it right involves a lot of code that would burn
CPU time, and also be painful to maintain. Rolling back to a saved
state and retrying is also painful - failing to save/restore all the
required state will break things, and redoing state emission burns a
lot of CPU. memcpy'ing to a new batch and continuing is painful,
because commands we issue for a draw depend on earlier commands as well
(such as STATE_BASE_ADDRESS, or the GPU being in a particular state).
The best plan is to never run out of space, which is totally doable but
pretty wasteful - a pessimal draw requires a huge amount of space, and
rarely occurs. Instead, we'd like to grow the batch buffer if we need
more space and can't safely flush.
We can't grow with a meet in the middle approach - we'd have to move the
state to the end, which would mean updating every offset from dynamic
state base address. Using separate batch and state buffers, where both
fill starting at the beginning, makes it easy to grow either as needed.
This patch separates the two concepts. We create a separate state
buffer, with a second relocation list, and use that for brw_state_batch.
However, this patch tries to retain the original flushing behavior - it
adds the amount of batch and state space together, as if they were still
co-existing in a single buffer. The hope is to flush at the same time
as before. This is necessary to avoid provoking bugs caused by broken
batch wrap handling (which we'll fix shortly). It also avoids suddenly
increasing the size of the batch (due to state not taking up space),
which could have a significant performance impact. We'll tune it later.
v2:
- Mark the statebuffer with EXEC_OBJECT_CAPTURE when supported (caught
by Chris). Unfortunately, we lose the ability to capture state data
on older kernels.
- Continue to support the malloc'd shadow buffers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We'll need to read from both buffers when decoding state.
This also drops the "failed to map" fallback - it's completely useless
on LLC systems where we write directly to the mapped BO. It's not that
useful on non-LLC systems either.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
brw_batch_reloc emits a relocation from the batchbuffer to elsewhere.
brw_state_reloc emits a relocation from the statebuffer to elsewhere.
For now, they do the same thing, but when we actually split the two
buffers, we'll change brw_state_reloc to use the state buffer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm planning on splitting batch and state into separate buffers, at
which point we'll need two relocation lists. In preparation for that,
this patch refactors the relocation stuff into a structure we can
replicate...which looks a lot like anv_reloc_list.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The batch buffer and state buffer code is fairly tied together,
and having it in one .c file will make refactoring easier.
Also, drop some commentary above brw_state_batch. The "aperture
checking performance hacks" are long since gone, so that paragraph
makes little sense at this point.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Prior to the previous patch, we would pwrite the batchbuffer contents,
and wanted to skip the execbuffer if that failed. Now that we memcpy,
we don't set ret != 0 on failure anymore, so it will always be 0.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We'd like to eliminate the malloc'd shadow copy eventually, but there
are still unresolved performance problems. In the meantime, let's at
least get rid of pwrite.
On Apollolake, improves Synmark OglBatch6 performance by:
1.53581% +/- 0.269589% (n=108).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This assertion prevents you from doing intel_batchbuffer_require_space
with a size so huge it won't fit in the batchbuffer. This doesn't seem
like a common mistake, and I've never seen the assert to be useful.
Soon, I hope to have batches grow, at which point this won't make sense.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
For non-CCS images, we were reporting just one plane even though they
may have multiple in the case of YUV.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This allows the user to query the number of planes required by a given
format+modifier combination without having to create a bo or surface.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
After get_variable_being_redeclared() has been called, it is no longer
safe to access the original variable pointer, since its memory might have
been freed.
Since callers of this function should only be accessing the variable pointer
returned by the function, avoid potential bugs by re-assigning the
original variable pointer to the result of the function call,
making it impossible for the remaining code to access an invalid variable
pointer.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
get_variable_being_redeclared() can delete the original variable
in a specific scenario. The code sets it to NULL after this so other
code in that same function doesn't try to access trashed memory after
the fact, however, the copy of that variable in the caller code
won't see any of this making it very easy to overlook.
Make the function a bit safer by taking a pointer to the original
variable so we can also make NULL the caller's pointer to the variable
if this function deletes it.
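The pointer-to-pointer approach described above can be sketched roughly as follows. This is only an illustration: `struct variable` and `redeclare_variable()` are hypothetical stand-ins, not the real ir_variable type or the actual get_variable_being_redeclared() signature.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an IR variable; not the real ir_variable. */
struct variable {
   int location;
};

/*
 * Hypothetical redeclaration helper. It takes the address of the caller's
 * pointer so that, when it frees the original variable, it can also reset
 * the caller's copy instead of leaving it dangling.
 */
struct variable *
redeclare_variable(struct variable **var_ptr, struct variable *earlier)
{
   if (earlier == NULL)
      return *var_ptr;       /* nothing to merge with; keep the original */

   earlier->location = (*var_ptr)->location;
   free(*var_ptr);           /* the original variable is gone ...        */
   *var_ptr = NULL;          /* ... and so is the caller's pointer to it */
   return earlier;
}
```

After the call, the caller can only reach the variable through the returned pointer, which is the property the commit is after.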
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Useful to know which debug/perftest options were enabled when
a hang report is generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Might be useful for checking if all descriptors are set by
the application.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might be very useful in order to figure out where a shader
is stuck. This uses UMR to detect which instruction is executing
bad things.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Might report some useful information to help figure out where
the hang happened.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When a GPU hang is detected in radv_gpu_hang_occured() we know
which command buffer is faulty but the bound pipelines might
have been updated during the execution.
The pointers to the radv_pipeline objects are emitted just
after the second trace ID, that way it would be easy to dump
the active shaders at the moment of the hang.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When a batch is submitted, INTEL_DEBUG=bat prints a message indicating
which part of the code triggered the flush, and some statistics about
the batch/state buffer utilization.
It also decodes the batchbuffer in debug builds...which is so much
output that it drowns out the utilization messages, if that's all you
care about.
INTEL_DEBUG=submit now just does the utilization messages.
INTEL_DEBUG=bat continues to do both (as the message is a good indicator
that we're starting decode of a new batch).
v2: Rename from "flush" to "submit" (suggested by Chris) because we
might want "flush" for PIPE_CONTROL debugging someday.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
With the shaders in the ssao demo, the nir_opt_if wasn't
working properly without this, after this the if gets optimised
so that loop unrolling gets called.
(loop unrolling fails due to instruction count, but at least
it gets to do that.)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix the build for Android Nougat.
The dladdr(3) manpage says that <dlfcn.h> is required. On Linux, the
build succeeded without it because build_id.c includes <link.h> which
includes <dlfcn.h>. On Android, we must include <dlfcn.h> directly.
Fixes: 5c98d382 "util: Query build-id by symbol address, not library name"
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch renames build_id_find_nhdr() to
build_id_find_nhdr_for_addr(), and changes it to never examine the
library name.
Tested on Fedora by confirming that build_id_get_data() returns the same
build-id as the file(1) tool. For BSD, I confirmed that the API used
(dladdr() and struct Dl_info) is documented in FreeBSD's manpages.
This solves two problems:
- We can now query the build-id without knowing the installed library's
filename.
This matters because Android requires specific filenames for HAL
modules, such as "/vendor/lib/hw/vulkan.${board}.so". The HAL
filenames do not follow the Unix convention of "libfoo.so". In
other words, the same query code will now work on Linux and Android.
- Querying the build-id now works correctly when the process
contains multiple shared objects with the same basename.
(Admittedly, this is a highly unlikely scenario).
Cc: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
enclosing_scope already contains enclosing_scope_first_read.
What we really want to check here -- not for correctness, but
for speed -- is whether last_read_scope already contains
enclosing_scope.
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This assertion is triggered on Stoney in Piglit
./bin/framebuffer-blit-levels {draw,read} stencil -auto -fbo
and similar tests. It should be harmless -- just relax it until
we can get internal clarification.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The GLSL rules for interpolateAtSample are unfortunate:
"Returns the value of the input interpolant variable at
the location of sample number sample. If
multisample buffers are not available, the input
variable will be evaluated at the center of the pixel.
If sample sample does not exist, the position used to
interpolate the input variable is undefined."
This fix will fall back to monolithic shader compilation when
interpolateAtSample is used without multisampling.
One alternative would be to always upload 16 sample positions,
filling the buffer up with repetition when the actual number of
samples is less, and then ANDing the sample ID with 0xf. However,
that punishes all well-behaving users of interpolateAtSample,
when in reality, only conformance tests should be affected by
the issue.
Fixes
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.non_multisample_buffer.*
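For reference, the rejected alternative (always uploading 16 sample positions and masking the sample ID) might look like the sketch below. The type and function names are hypothetical and the positions are plain floats; this is not the driver's actual upload code.

```c
#define MAX_SAMPLES 16

/* Hypothetical sample position in pixel-relative coordinates. */
struct sample_pos {
   float x, y;
};

/*
 * Fill all 16 slots by repeating the positions of the real samples.
 * Assumes 1 <= num_samples <= MAX_SAMPLES.
 */
void
fill_sample_positions(struct sample_pos out[MAX_SAMPLES],
                      const struct sample_pos *actual,
                      unsigned num_samples)
{
   for (unsigned i = 0; i < MAX_SAMPLES; i++)
      out[i] = actual[i % num_samples];
}

/* Any sample ID lands inside the table because it is ANDed with 0xf. */
struct sample_pos
lookup_sample_position(const struct sample_pos table[MAX_SAMPLES],
                       unsigned sample_id)
{
   return table[sample_id & 0xf];
}
```

The extra AND is paid on every lookup, which is the cost for well-behaved users that the commit message wants to avoid.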
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
gl_SampleMaskIn is supposed to contain set bits only for the samples that
are covered by the current fragment shader invocation, but the VGPR
initialization hardware loads the set of all bits that are covered at the
current pixel.
Fixes various tests in
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the last operation happens to be a non-draw, such as a
transfer_map that triggers a decompress blit, there may be
interesting messages left in the driver log.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add InstanceStrideEnable field and rename InstanceDataStepRate to
InstanceAdvancementState in INPUT_ELEMENT_DESC structure.
Add stubs for handling InstanceStrideEnable in FetchJit::JitLoadVertices()
and FetchJit::JitGatherVertices() and assert if they are triggered.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Make more robust to handle strange configurations like a vmware
exported 4-way numa X 1-core configuration.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add new field in SWR_BACKEND_STATE::vertexClipCullOffset to specify the
start of the clip/cull section of the vertex header. Removed use of
hardcoded slot from binner.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
SwrStallBE stalls the backend threads until all work submitted before
the stall has finished. The frontend threads can continue to make
forward progress.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The function is only called from one place, which is hidden behind
the same `#ifdef DEBUG`.
Fixes: ca73c3358c "glsl: Mark functions static"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Per the spec:
"Resetting a command buffer is an operation that discards any
previously recorded commands and puts a command buffer in the
initial state."
As far as I'm concerned, that flag can be changed by calling
VkCmdPushConstants() (or any other functions which update it),
so it should be cleared as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This field covers the whole resource.
Fixes:
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.3d.format.*
dEQP-VK.texture.filtering.3d.combinations.*
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As GFX9 can't handle 1D depth textures, radeonsi and
apparently pro just update all 1D textures to 2D,
and work around it.
This ports the workarounds from radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Work out the width/height from the level manually, as on GFX9
we won't minify the iview width/height.
This fixes:
dEQP-VK.api.image_clearing.core.clear_color_image* on gfx9
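Working out the per-level size manually usually means the standard mip minification rule: shift the base dimension right by the level and clamp to 1. A minimal sketch, with a hypothetical helper name and assuming `level` is smaller than the bit width of `unsigned`:

```c
/*
 * Hypothetical helper: size of a mip level derived from the base size.
 * Each level halves the dimension (rounding down), never going below 1.
 * Assumes level < 32 so the shift is well defined.
 */
unsigned
minify(unsigned base_size, unsigned level)
{
   unsigned size = base_size >> level;
   return size > 0 ? size : 1;
}
```

For a 256x5 base image, level 1 would be 128x2 and level 3 would be 32x1.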
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We are looking up the execution type prior to checking how many sources
we have. This leads to looking for a type for src1 on MOV instructions
which is bogus. On BDW+, the src1 register type overlaps with the
64-bit immediate and causes us problems.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
We can't use it anyway in fast clears, and on GFX9 it seems to
actually hang the card if we specify it.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
The current DCC init routine doesn't account for initializing a
single layer or level. Multilayer seems hard for small textures on
pre-GFX9 as the metadata for the layers can be interleaved. For
GFX9 multilevel textures are a problem for similar reasons.
So just disable this for now, until we handle the texture modes
correctly.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Instead of setting based on set/unset, allow users to use boolean values.
In the docs and tests, use `DISABLE=true` instead of `DISABLE=1` as it's
clearer IMO.
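A boolean environment variable, as opposed to a set/unset one, could be parsed along these lines. The function names, the accepted spellings and the case-sensitive matching are assumptions made for this sketch, not necessarily what Mesa's own helper does.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: map common spellings to a boolean and fall back
 * to default_value when the string is missing or not recognised.
 */
bool
parse_boolean(const char *str, bool default_value)
{
   if (str == NULL)
      return default_value;

   if (strcmp(str, "1") == 0 || strcmp(str, "true") == 0 ||
       strcmp(str, "yes") == 0)
      return true;

   if (strcmp(str, "0") == 0 || strcmp(str, "false") == 0 ||
       strcmp(str, "no") == 0)
      return false;

   return default_value;
}

/* Hypothetical wrapper reading the value from the environment. */
bool
env_flag(const char *name, bool default_value)
{
   return parse_boolean(getenv(name), default_value);
}
```

With this shape, `DISABLE=false` and an unset variable behave the same when the default is false, which a plain set/unset check cannot express.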
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead of setting based on set/unset, allow users to use boolean values.
In the docs, use `NO_DRAWARRAYS=true` instead of `NO_DRAWARRAYS=1` as it's
clearer IMO.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead of setting based on set/unset, allow users to use boolean values.
In the docs, use `DISABLE=true` instead of `DISABLE=1` as it's clearer IMO.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead of setting based on set/unset, allow users to use boolean values.
In the docs, use `ALWAYS=true` instead of `ALWAYS=1` as it's clearer IMO.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead of setting based on set/unset, allow users to use boolean values.
In the help string, use `ALLOW=true` instead of `ALLOW=1` as it's clearer IMO.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead of setting based on set/unset, allow users to use boolean values.
In the docs, use `ALWAYS=true` instead of `ALWAYS=1` as it's clearer IMO.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This was a bugfix to the spec addressed in OpenGL 4.5 (revision
7 of the spec) and there is a CTS test to check this.
Fixes:
KHR-GL45.shader_atomic_counters.negative-unsized-array
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
anv_debug adds 'debug:' already, this is to clean up the following:
debug: debug: anv_CreateDebugReportCallbackEXT: ignored VkStructureType 1000011000
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently anv_perf_warn call in anv_compute_heap_size does not ever
report a perf warning. Move debug variable read as the first thing
in case there will be other perf_warn calls added.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Patch adds required functionality for extension to manage a list of
application provided callbacks and handle debug reporting from driver
and application side.
v2: remove useless helper anv_debug_report_call
add locking around callbacks list
use vk_alloc2, vk_free2
refactor CreateDebugReportCallbackEXT
fix bugs found with crucible testing
v3: provide ANV_FROM_HANDLE and use it
misc fixes for issues Jason found
use vk_find_struct_const for finding ctor_cb
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were skipping this fallback for depth, but not for stencil
which the hardware always requires to be W-tiled.
Also, make the checks for whether we need to apply retiling
strategies based on usage instead of tiling flags, which is
safer and more explicit.
This fixes a regression in a CTS test introduced with commit
4ea63fab77 that started re-tiling stencil surfaces
in certain scenarios.
v2: discard retiling based on usage fields instead of tiling
flags. This is safer and more explicit.
v3: Add a comment indicating that texturing of stencil in gen7
requires a Y-tiled copy (Topi).
Fixes:
KHR-GL45.direct_state_access.renderbuffers_storage
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
When a conditional branch has the same labels in the "if" part and in the
"else" part, then we have the same cfg block, and it must be handled
once.
v2: handle it the same way as OpBranch (Jason).
Fixes:
dEQP-VK.spirv_assembly.instruction.compute.conditional_branch.same_labels*
dEQP-VK.spirv_assembly.instruction.graphics.conditional_branch.same_labels*
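The "handle it once" rule can be illustrated with a small sketch that collects the successor blocks of a conditional branch. `struct block` and `collect_branch_targets()` are hypothetical names for this example only and do not reflect the actual SPIR-V to NIR code.

```c
/* Hypothetical CFG block; only its identity matters here. */
struct block {
   int id;
};

/*
 * Collect the distinct successors of a conditional branch.
 * When both labels refer to the same block, it is recorded once,
 * just as an unconditional branch would record it.
 * Returns the number of entries written to out (1 or 2).
 */
unsigned
collect_branch_targets(struct block *then_block, struct block *else_block,
                       struct block *out[2])
{
   unsigned count = 0;

   out[count++] = then_block;
   if (else_block != then_block)
      out[count++] = else_block;

   return count;
}
```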
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise, when doing an out-of-tree build you can expect the following:
make[6]: Entering directory \
'${MESA_SRC}/build/src/mesa/state_tracker/tests'
CXX test_glsl_to_tgsi_lifetime.o
In file included from \
${MESA_SRC}/src/mesa/src/mesa/state_tracker/st_glsl_to_tgsi_private.h:31:0,
from \
${MESA_SRC}/src/mesa/src/mesa/state_tracker/st_glsl_to_tgsi_temprename.h:27,
from \
${MESA_SRC}/src/mesa/src/mesa/state_tracker/tests/test_glsl_to_tgsi_lifetime.cpp:24:
${MESA_SRC}/src/compiler/glsl/ir.h:1502:37: \
fatal error: ir_expression_operation.h: No such file or directory
#include "ir_expression_operation.h"
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
This moves a bunch of non-draw dependent calcs into the pipeline code,
to reduce CPU overheads in the draw path.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This realigns this code with the radeonsi version and fixes
the indirect case to work properly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This removes the barrier and LDS stores and loads for tess factors
when it's possible. The removal of the barrier seems more important
to me though.
In one shader, it removes 17 * 4 bytes from the shader binary.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The pass tries to deduce whether tess factors are always written by
all shader invocations.
The implication for radeonsi is that it doesn't have to use a barrier
near the end of TCS, and doesn't have to use LDS for passing the tess
factors to the epilog.
v2: Handle barriers and do the analysis pass for each code segment
surrounded by barriers separately, and AND results from all
such segments writing tess factors. The change is trivial in the main
switch statement.
Also, the result is renamed to "tessfactors_are_def_in_all_invocs"
to make the name accurate.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The Android version in AOSP master has changed now to P, so we need to add
LLVM flags for it. Duplicating the lines because I expect the version will
get bumped at some point and diverge from O.
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Since commit 552aaa11 the compiler complains:
external/mesa/src/amd/common/ac_debug.c:124:51: error: use of undeclared identifier 'gfx9d_reg_table'; did you mean 'sid_reg_table'?
reg = find_register(gfx9d_reg_table, ARRAY_SIZE(gfx9d_reg_table), offset);
^~~~~~~~~~~~~~~
sid_reg_table
It's because the commit ef97cc0c ("radeonsi/gfx9: add IB parser support")
added gfx9d.h as a prerequisite of sid_tables.h, but the corresponding
Android.mk was not updated. However, this went unnoticed since
gfx9d_reg_table was not really used until commit 552aaa11 landed.
Fixes: 552aaa11 (ac/debug: take ASIC generation into account when printing registers)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Don't get distracted by record dereferences between array references.
Fixes dEQP-GLES31.functional.tessellation.user_defined_io.per_vertex_block.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently we support 32-bit indexes/offsets all over the driver, so we
convert them to that bit size.
Fixes dEQP-VK.spirv_assembly.instruction.*.indexing.*
v2: Use u2u32 instead (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now, depth-only clears and custom passes don't read memory in VS.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Brian Paul <brianp@vmware.com>
src/mesa/drivers/dri/i965/intel_tex.h:52:40: warning: ‘enum intel_miptree_create_flags’ declared inside parameter list will not be visible outside of this definition or declaration
enum intel_miptree_create_flags flags);
^~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: cadcd89278 "i965/tex: Change the flags type on
create_for_teximage"
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The code can check for vm faults having happened. If we only do it
on a hang we don't know when the faults happened. This changes the
behavior to when the first VM fault is found, even without a hang.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage
whenever we have to evict shader variants.
Also add some output when shaders are deleted (but not with the perf setting
to keep this one less noisy).
While here, also don't delete that many shaders when we have to evict. For fs,
there's potentially some cost if we have to evict due to the required flush,
however certainly shader recompiles have a high cost too so I don't think
evicting one quarter of the cache size makes sense (and, if we're evicting
based on IR count, we probably typically evict only very few or just one
shader too). For vs, I'm not sure it even makes sense to evict more than
one shader at a time, but keep the logic the same for now.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was implemented since forever, but not enabled.
It passes all piglit tests except one, arb_pipeline_statistics_query-frag.
The reason is that the test (for drawing a 10x10 rect) expects between
100 and 150 pixel shader invocations. But since llvmpipe counts this with
4x4 granularity (and due to the rect being 2 tris) we end up with 224
invocations. I believe however what llvmpipe is doing violates neither the
spirit nor the letter of the spec (our fragment shader granularity really
is 4x4 pixels, albeit we will bail out early on 2x2 or 4x2 (the latter
if AVX is available) granularity), the spec allows to count additional
invocations due to implementation reasons.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gather is defined in terms of bilinear filtering, just without the filtering
part. However, there's actually some subtle differences required in our
implementation, because we use some tricks to simplify coord wrapping for the
two coords per direction.
For bilinear filtering, we don't care if we end up with an incorrect
texel, as long as the filter weight is 0.0 for it. Likewise, the order of
the texels doesn't actually matter (as long as they still have the correct
filter weight).
But for gather, these tricks lead to incorrect results.
Fix this for CLAMP_TO_EDGE, and add some comments to the other wrap functions
which look broken (the 3 mirror_clamp plus mirror_repeat) (too complex to fix
right now, and no one really seems to care...).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch aborts shader translation upon indirect indexing of temporary
registers on non-vgpu10 devices. This prevents an unsupported feature
from being sent to the device.
Tested with MTT-piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow dumping the active shaders when a hang is
detected. Only the ASM will be dumped for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reduce size of radv_pipeline.c and improve code isolation. More
code can probably be moved but it's a start.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A return code error is stored in the command buffer and should
be returned to the user via EndCommandBuffer().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just to make sure we are using the set 0, because it's the
only one which is saved/restored when doing meta operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes a regression introduced with commit
"mesa/st: Reduce the number of frontbuffer flush calls"
where we, after flushing the front buffer marked it as not-rendered-to,
the idea being that it should be marked as "rendered-to" again as soon as
any rendering was touching the front.
Now the latter part never happened, because it was part of a state
validation and we never marked that part of the state as dirty.
So mark the framebuffer state dirty after a frontbuffer flush.
(fdo bugzilla 102496)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102496
Fixes: eceb671002 (mesa/st: Reduce the number of frontbuffer flush calls)
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
We don't need to special case the batch - when we add the batch to the
validation list, we can simply increase the refcount to 2, and when we
make a new batch, we'll drop it back down to 1 (when unreferencing all
buffers in the validation list). The final reference is still held by
brw->batch.bo, as it was before.
This removes the special case from a bunch of loops.
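The reference counting described above can be sketched as follows. All names (`struct buffer`, `struct batch`, the helper functions and the fixed-size list) are hypothetical simplifications; a real driver would also free a buffer when its count reaches zero.

```c
#include <stdbool.h>

#define MAX_VALIDATION_ENTRIES 8

/* Hypothetical buffer object with a plain reference count. */
struct buffer {
   int refcount;
};

/* Hypothetical batch: batch->bo holds the "final" reference. */
struct batch {
   struct buffer *bo;
   struct buffer *validation_list[MAX_VALIDATION_ENTRIES];
   unsigned validation_count;
};

/* Every buffer on the list, the batch included, gets one extra reference. */
bool
batch_add_to_validation_list(struct batch *batch, struct buffer *bo)
{
   if (batch->validation_count >= MAX_VALIDATION_ENTRIES)
      return false;

   bo->refcount++;
   batch->validation_list[batch->validation_count++] = bo;
   return true;
}

/* Starting a new batch drops one reference per entry, with no special case. */
void
batch_drop_validation_list(struct batch *batch)
{
   for (unsigned i = 0; i < batch->validation_count; i++)
      batch->validation_list[i]->refcount--;
   batch->validation_count = 0;
}
```

The batch buffer goes from 1 to 2 when added and back to 1 when the list is dropped, so the loop treats it like any other entry.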
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This copies what amdgpu-pro does, and allocates the memory
for an event with an uncached mtype.
This fixes hangs with:
dEQP-VK.api.command_buffers.record_simul_use_primary
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a precursor to the gfx9 fix to use uncached for the event
memory. Move to the interface which allows setting the flags,
but wrap it to avoid having to copy it around the place.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit 10dec2de2d.
The environment variable is no longer needed with the previous change
Reviewed-by: Christian König <christian.koenig@amd.com>
v2: use deinterlace common function
v3: make sure deinterlace only
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This makes buffer reallocation based on the buffer layout
clearer for both decoder and encoder.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
A similar function exists in OMX and is only used by OMX. Move it
to vl/compositor so that other state trackers can use it later.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes the build in classic only mode, i.e. the new state tracker tests are
only built when Gallium is enabled.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The spec has special rules for querying buffer offsets and sizes
when BindBufferBase is used, described in the OpenGL 4.6 spec,
section 6.8 Buffer Object State:
"To query the starting offset or size of the range of a buffer
object binding in an indexed array, call GetInteger64i_v with
target set to respectively the starting offset or binding size
name from table 6.5 for that array. Index must be in the range
zero to the number of bind points supported minus one. If the
starting offset or size was not specified when the buffer object
was bound (e.g. if it was bound with BindBufferBase), or if no
buffer object is bound to the target array at index, zero is
returned."
Transform feedback buffer queries should follow the same rules, since
it is the same case for them. There is a CTS test for this.
Fixes:
KHR-GL45.direct_state_access.xfb_buffers
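The query rule quoted above can be modelled with a small sketch: the binding remembers whether an explicit range was given, and the queries return zero otherwise. The struct layout and function names are hypothetical and only illustrate the rule; they are not Mesa's state-tracking code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical indexed binding point. buffer == 0 means nothing is bound. */
struct buffer_binding {
   uint32_t buffer;
   int64_t offset;
   int64_t size;
   bool has_range;   /* true only when bound with an explicit range */
};

/* BindBufferBase-like: internally the whole buffer is used. */
void
bind_buffer_base(struct buffer_binding *b, uint32_t buffer,
                 int64_t buffer_size)
{
   b->buffer = buffer;
   b->offset = 0;
   b->size = buffer_size;
   b->has_range = false;
}

/* BindBufferRange-like: the application specified offset and size. */
void
bind_buffer_range(struct buffer_binding *b, uint32_t buffer,
                  int64_t offset, int64_t size)
{
   b->buffer = buffer;
   b->offset = offset;
   b->size = size;
   b->has_range = true;
}

/* Queries report zero unless an explicit range was specified. */
int64_t
query_binding_offset(const struct buffer_binding *b)
{
   return (b->buffer != 0 && b->has_range) ? b->offset : 0;
}

int64_t
query_binding_size(const struct buffer_binding *b)
{
   return (b->buffer != 0 && b->has_range) ? b->size : 0;
}
```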
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Now it's able to generate ds_write2_b64 instead of ds_write2_b32.
-20 bytes in one shader binary. (having only 1 output)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It looks like commit 391673af7a that should
have fixed the perf regression didn't really change much if anything.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If we're seeing a drawable size change, in particular after processing a
configure notify event, make sure we invalidate so that the state tracker
picks up the new geometry.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This tries to mimic dri2 behaviour where events are typically processed
while waiting for X replies. Since, during steady-state dri3 rendering, we
seldom wait for xcb replies, and haven't enabled any automatic event
processing, instead check for events after a fence wait.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I changed the behaviour earlier today, but forgot to update the
corresponding docs.
Fixes: 77713a0acb "mesa: allow user to set MESA_NO_ERROR=0"
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Useless to track which one has been updated because we
re-upload all the vertex buffers in one shot.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
red channel.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We're not particularly concerned with memory usage, if the tradeoff is
shader recompiles. And it's common for apps to have a lot of shaders
nowadays (and, since our shaders include a LOT of context state of course
we may create quite a bit more shaders even).
So quadruple the amount of shaders draw will cache (from 128 to 512).
For llvmpipe (fs shaders) quadruple the number of instructions, keep the
number of variants the same for now (only with very simple, non-texturing
shaders the variant limit could really be reached), and simplify the
definition, it's probably easier to just have one different definition
per branch...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This being declared bool means it won't get merged with the previous
bitfields, this seems like an oversight rather than deliberate.
Noticed when running pahole.
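The effect is easy to reproduce with two hypothetical structs. Bit-field layout is implementation-defined, so the exact sizes depend on the compiler and ABI; the sketch only assumes that merging the flag into the bit-field never makes the struct larger.

```c
#include <stdbool.h>

/* The last flag is a full bool, so it cannot share storage with a and b. */
struct flags_split {
   bool a : 1;
   bool b : 1;
   bool c;
};

/* Declaring it as a 1-bit field lets the compiler pack it with the others. */
struct flags_merged {
   bool a : 1;
   bool b : 1;
   bool c : 1;
};
```

Running pahole on an object file containing both structs, or comparing `sizeof` for the two, shows whether the full `bool` costs an extra byte on a given target.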
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: 7319ff87 ("radeon/uvd: add YUYV format support for target buffer")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The macro itself is a well defined string, which cannot cause issues
with printf or other printf-like functions.
All other places through Mesa already use it directly, so let's update
the final two instances.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Use consistent way to manage "non-default" llvm installations, clearly
documenting it.
AKA, use LLVM_CONFIG throughout and unset for the Windows/mingw builds.
v2: unset the save_ variable (Andres)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
One can control the number of jobs via MAKEFLAGS. As such there's
little reason to set the number of jobs for each make invocation.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Back in 2012 (commit 1e7776ca2b - egl: Remove bogus invalidate code.)
the loader use of invalidate() was purged as "bogus". One of the factors
defining that statement was the lack of the loader-side invalidate
extension - __DRI_USE_INVALIDATE.
Since then the commit was reverted (commit eed0a80137 - egl: Restore
"bogus" DRI2 invalidate event code.), always performing the driver
invalidate call, although the loader was never updated to expose the
extension.
Do so allowing the driver to do fine grained tuning.
Cc: Eric Anholt <eric@anholt.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As Marek pointed out in earlier commit - exposing RGBA on other
platforms introduces ~500 Visuals, which are not tested.
Note that this does not quite happen, yet. Reason being that the GLX
code does not check the masks - see scalarEqual().
Thus as we fix that, we'll run into the issue described.
v2: Rebase, while keeping loaderPrivate
v3: Beef-up commit message, getCapability() returns unsigned (Tapani)
Fixes: 1bf703e4ea ("dri_interface,egl,gallium: only expose RGBA visuals
on Android")
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Chad Versace <chadversary@chromium.org>
Cc: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Needed to compensate for change to fetch jit requiring
alignment.
Fixes regressions in piglit: vertex-buffer-offsets and about
another hundred of the vs-input*byte* tests.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
If <windows.h> is included then max is a macro that clashes
with std::numeric_limits::max, hence undefine it.
For some reason the struct access_record is not recognized
outside the anonymous namespace, make it a class.
The patch was successfully tested on AppVeyor.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This patch replaces the old register lifetime estimation and
rename mapping evaluation with the new one.
The performance of the current and the new implementation
was compared by running shader-db in one thread.
-----------------------------------------------------------
                       old             new (std::sort)
---------------- time ./run -j1 shaders --------------------
real                   5.80s           5.75s
user                   5.75s           5.70s
sys                    0.05s           0.05s
---- valgrind --tool=callgrind --dump-instr=yes ------------
merge                  0.08%           0.18%
estimate lifetime      0.02%           0.11%
evaluate mapping       (incl=0.3%)     0.04%
apply mapping          0.03%           0.02%
--- perf (approximate because of statistical sampling) ----
merge (total)          0.09%           0.16%
estimate lifetime      0.03%           0.10%
evaluate mapping       (incl=0.02%)    0.04%
apply mapping          0.04%           0.04%
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The patch adds tests for the register rename mapping evaluation and
combined life time estimation and renaming.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The remapping evaluator first sorts the temporary registers ascending
based on their first life time instruction, and then uses a binary search
to find merge candidates.
For the initial sorting it uses std::sort because qsort is quite slow in
comparison. By removing the define USE_STL_SORT in
src/mesa/state_tracker/st_glsl_to_tgsi_temprename.cpp
one can enable the alternative code path that uses qsort.
Registers that are not written to are not considered for renaming since in
glsl_to_tgsi_visitor::renumber_registers they are eliminated anyway.
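The sort-then-binary-search idea can be sketched as follows. This is a simplified illustration, not the code in st_glsl_to_tgsi_temprename.cpp: the `lifetime` struct, the function name, and the merge condition (a candidate may take over a target once its life time begins at or after the target's end) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: register number plus the first and last
// instruction of its life time.
struct lifetime {
    int reg;
    int begin;
    int end;
};

// Returns rename[r] = register that r is merged into (r itself if unchanged).
// Every lifetime::reg is assumed to be smaller than num_regs.
std::vector<int> evaluate_remapping(std::vector<lifetime> regs, int num_regs)
{
    std::vector<int> rename(num_regs);
    for (int i = 0; i < num_regs; ++i)
        rename[i] = i;

    // Step 1: sort ascending by the first instruction of the life time.
    std::sort(regs.begin(), regs.end(),
              [](const lifetime &a, const lifetime &b) { return a.begin < b.begin; });

    std::vector<bool> merged(regs.size(), false);
    for (std::size_t t = 0; t < regs.size(); ++t) {
        if (merged[t])
            continue;
        std::size_t search_from = t + 1;
        for (;;) {
            // Step 2: binary search for the first register whose life time
            // starts at or after the end of the current target.
            auto it = std::lower_bound(
                regs.begin() + search_from, regs.end(), regs[t].end,
                [](const lifetime &l, int bound) { return l.begin < bound; });
            // Skip registers that were already merged into an earlier target.
            while (it != regs.end() && merged[it - regs.begin()])
                ++it;
            if (it == regs.end())
                break;
            const std::size_t s = static_cast<std::size_t>(it - regs.begin());
            rename[it->reg] = regs[t].reg;
            merged[s] = true;
            regs[t].end = it->end;   // the target now lives as long as the merged register
            search_from = s + 1;
        }
    }
    return rename;
}
```

Because the records are sorted by `begin`, each search touches only a logarithmic number of entries, which is why the quality of the initial sort dominates the cost.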
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This patch adds a class for tracking the life times of temporary registers
in the glsl to tgsi translation. The algorithm runs in three steps:
First, in order to minimize the number of needed memory allocations the
program is scanned to evaluate the number of scopes.
Then, the program is scanned second time to record the important register
access time points: first and last reads and writes and their link to the
execution scope (loop, if/else branch, switch case).
In the third step for each register the actual minimal life time is
evaluated.
In addition, when compiled in debug mode (i.e. NDEBUG is not defined)
the shaders and estimated temporary life times can be logged to stderr
by setting the environment variable GLSL_TO_TGSI_RENAME_DEBUG.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To prepare the implementation of a temp register lifetime tracker
some of the classes are moved into separate header/implementation
files to make them accessible from other files.
Specifically these are:
class st_src_reg;
class st_dst_reg;
class glsl_to_tgsi_instruction;
struct rename_reg_pair;
int swizzle_for_type(const glsl_type *type, int component);
as inline:
bool is_resource_instruction(unsigned opcode);
unsigned num_inst_dst_regs(const glsl_to_tgsi_instruction *op);
unsigned num_inst_src_regs(const glsl_to_tgsi_instruction *op);
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Instead of having to search the whole array, just use the whole
thing and store a valid bit in there with the rename.
Removes this from the profile on some of the fp64 tests
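A minimal sketch of the idea, assuming a table with one slot per register: the lookup becomes a direct index plus a check of the valid bit instead of a search. The `rename_entry` struct and `apply_renames()` are hypothetical names for illustration, not the driver's own types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical table entry: one slot per register, indexed by the old
// register number.
struct rename_entry {
    int new_reg;
    bool valid;   // true only if this register is actually renamed
};

// Rewrites every register in 'regs' that has a valid rename entry.
void apply_renames(std::vector<int> &regs, const std::vector<rename_entry> &table)
{
    for (int &reg : regs) {
        // Direct index instead of scanning a list of (old, new) pairs.
        if (reg >= 0 && static_cast<std::size_t>(reg) < table.size() &&
            table[reg].valid)
            reg = table[reg].new_reg;
    }
}
```

The trade-off is a table as large as the register count, in exchange for constant-time lookups.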
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the HS wave is empty, the hardware writes the LS VGPRs starting at
v0 instead of v2. Work around this by shifting them back into place when
necessary. For simplicity, this is always done in the LS prolog.
According to the hardware team, this will be fixed in future chips,
so take that into account already.
Note that this is not a bug fix, as the bug was already worked
around by commit 166823bfd2 ("radeonsi/gfx9: add a temporary workaround
for a tessellation driver bug"). This change merely replaces the
workaround by one that should be better.
v2: add workaround code to shader only when necessary
v3: clarify the prefer_mono comment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There were some overlapping changes in gfx9 especially in the CB/DB
blocks which made register dumps rather misleading.
The split is along the lines of the header files, so we'll print VI-only
fields on SI and CI, for example, but we won't print GFX9 fields on
SI/CI/VI, and we won't print SI/CI/VI fields on GFX9.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Automatically re-use table entries like StringTable and IntTable do.
This allows us to get rid of the "fields_owner" logic, and simplifies
the next change.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Found by inspection.
I'm not aware of any actual failures caused by this, but a precise
sequence of ralloc_adopt and ralloc_free should be able to cause
problems.
v2: make the code slightly clearer (Eric)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
GetTex*Image should return INVALID_ENUM if target is not valid, however,
GetTextureImage does not receive a target, and instead should return
INVALID_OPERATION if the effective target is not valid. From the
OpenGL 4.6 core profile spec, section 8.11 Texture Queries:
"An INVALID_OPERATION error is generated by GetTextureImage if the effective
target is not one of TEXTURE_1D, TEXTURE_2D, TEXTURE_3D, TEXTURE_1D_ARRAY,
TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, TEXTURE_RECTANGLE, or
TEXTURE_CUBE_MAP (for GetTextureImage only)."
Fixes:
KHR-GL45.direct_state_access.textures_image_query_errors
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This reverts commit 611076a41a.
With the two previous commits, vega should no longer be unstable.
It doesn't pass CTS, but can do a complete run, and games shouldn't
hang anymore, so bring it back online.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is required on GFX9, fixes a bug in Talos where all the
mipmaps overlay each other.
Just pushing this as well as it fixes Talos.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This causes hangs in some of the CTS tests with a 2d
1536x2 texture.
This fixes hangs with:
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.1d_array.format.r4g4b4a4_unorm_pack16.count_1.size.512x1_array_of_3
If we re-enable it, make sure these don't regress.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The VI sizing only applies to VI.
This fixes:
dEQP-VK.image.image_size.buffer.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The buffer bind flags can be promoted in svga_buffer_handle(), so
move the assertion after it. This has already been done for
vertex buffer in commit 6b4bf7e8be, but it misses the one for
index buffer.
Fixes assertion running WarThunder.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Minor performance improvement in avoiding binding the same shader resource
or the same vertex buffer for the same slot.
Tested with MTT glretrace.
v2: Per Brian's suggestion, add a helper function to do vertex buffer
comparison.
v3: Change the helper function to vertex_buffers_equal().
Reviewed-by: Brian Paul <brianp@vmware.com>
The copySubBuffer functionality always attempted a server side blit from
back to fake front if a fake front was present, and we weren't displaying
on a remote GPU.
Now that we always have local blit capability on modern drivers, first
attempt a local blit, and only if that fails, try the server blit.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Axel Davy <axel.davy@normalesup.org>
This increases performance, but it was tuned for Raven, not Vega.
We don't know yet how Vega will perform, hopefully not worse.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
3 flags for primitive binning, 2 flags for out-of-order rasterization
(but that will be done some other time)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The data is read when the render_cond_atom is emitted, so we must
delay emitting the atom until after the flush.
Fixes: 0fe0320dc0 ("radeonsi: use optimal packet order when doing a pipeline sync")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The result written by the shader workaround needs to be written back, or
the CP may read stale data.
Fixes: 78476cfe07 ("radeonsi: enable ARB_transform_feedback_overflow_query")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
src_register has no meaningful standalone use, it only makes sense when
called from translate_src.
v2: fix input array handling
Acked-by: Roland Scheidegger <sroland@vmware.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Most older drivers seem to just ignore the Dimension setting, so virtually
no changes should be needed.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
`anv_formats[ARRAY_SIZE(anv_formats)]` is already one too far.
Spotted by Coverity.
CovID: 1417259
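The off-by-one can be illustrated with a generic table lookup. This is a sketch with a made-up `format_table`, not the anv code: the point is only that the index equal to `ARRAY_SIZE()` must be rejected as well.

```cpp
#include <cstddef>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

// Hypothetical table standing in for a format table.
static const int format_table[] = {10, 20, 30};

// Returns a pointer to the entry, or nullptr for an unknown index.
const int *lookup_format(std::size_t index)
{
    // Valid indices are 0 .. ARRAY_SIZE - 1, so the check must be '>='.
    // With '>' the index ARRAY_SIZE would slip through and read one
    // element past the end of the array.
    if (index >= ARRAY_SIZE(format_table))
        return nullptr;
    return &format_table[index];
}
```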
Fixes: 242211933a "anv/formats: Nicely handle unknown VkFormat enums"
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This reduces the size from 96 to 80 bytes by putting all the
32-bit sizes at the start.
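The effect of field ordering on padding can be sketched with two hypothetical structs that hold the same members. The exact sizes depend on the ABI; on a typical 64-bit target where `uint64_t` is 8-byte aligned, the interleaved layout takes 32 bytes and the grouped one 24.

```cpp
#include <cstdint>

// 32-bit and 64-bit members interleaved: each uint32_t is usually
// followed by 4 bytes of padding so the next uint64_t stays aligned.
struct interleaved {
    uint32_t a;
    uint64_t b;
    uint32_t c;
    uint64_t d;
};

// Same members with the 32-bit ones grouped at the start: the two
// uint32_t fields share one 8-byte slot and no padding is needed.
struct grouped {
    uint32_t a;
    uint32_t c;
    uint64_t b;
    uint64_t d;
};
```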
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise radv_cmd_state_setup_attachments() will complain it has no clearvalues,
when called via radv_process_depth_image_inplace().
v2: use LOAD/STORE instead of DONT_CARE, to preserve stencil values.
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise, the simultaneous usage bit doesn't get set from the begin
info, which we need for batch chaining.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
It doesn't seem like the old code could possibly work.
1. brw_gs_state_dirty made us bail unless one of these flags were set:
_NEW_TEXTURE, BRW_NEW_GEOMETRY_PROGRAM, BRW_NEW_TRANSFORM_FEEDBACK
2. If there was no geometry program, we called brw_upload_ff_gs_prog()
3. That checked brw_ff_gs_state_dirty and bailed unless these were set:
_NEW_LIGHT, BRW_NEW_PRIMITIVE, BRW_NEW_TRANSFORM_FEEDBACK,
BRW_NEW_VS_PROG_DATA.
4. brw_ff_gs_prog_key pv_first and attr fields were set based on data
depending on _NEW_LIGHT and BRW_NEW_VS_PROG_DATA.
This means that if we needed a FF GS program, and changed the VS
outputs or provoking vertex mode, we'd fail to notice that we needed
to emit a new program.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It is kind of pointless for compute, and avoids issues with apps kicking
off more than 32 compute shaders at once.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
CentOS 6 and RHEL 6 have autoconf 2.63.
Fixes: e4b2b69e82 ("configure: Add and use AX_CHECK_COMPILE_FLAG")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We can drop the meaningless "64" suffix - libdrm_intel originally had
an "offset" field that was an "unsigned long" which was the wrong size,
and we couldn't remove/alter that field without breaking ABI, so we had
to add a uint64_t "offset64" field.
"gtt_offset" is also more descriptive than "offset".
(Patch originally written by Ken, but Chris suggested a better name and
supplied the giant comment making up the bulk of the patch, so I changed
the authorship to him.)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It's used in exactly one place these days, and not much simpler than
just calling intel_batchbuffer_data directly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
intel_batchbuffer_reset calls add_exec_bo on the batch right away,
which adds in the batch BO size.
Fixes: 29ba502a4e ("i965: Use I915_EXEC_BATCH_FIRST when available.")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Accompanying patch "st/mesa: only try to create 1x msaa surfaces for
'fake' msaa" requires driver to report max_samples=1 to enable "fake"
msaa. Previously, 0 and 1 were treated equivalently in st_init_extensions()
and either could enable "fake" msaa.
This patch raises the swr default msaa_max_count from 0 to 1, so that
swr_is_format_supported will report max_samples=1.
Real msaa can still be enabled by exporting SWR_MSAA_MAX_COUNT with a
pow2 value between 2 and 16.
This patch is necessary to prevent an OpenSWR regression resulting from
the st/mesa patch.
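A sketch of how such an override could be validated, assuming invalid values simply fall back to the default; the helper name and the fallback behaviour are assumptions, not the swr implementation.

```cpp
#include <cstdlib>

// Hypothetical helper: 'value' is the content of SWR_MSAA_MAX_COUNT,
// or nullptr if the variable is not set.
inline int msaa_max_count_from_env(const char *value)
{
    const int fallback = 1;   // default of 1 selects "fake" msaa
    if (!value)
        return fallback;

    const long n = std::strtol(value, nullptr, 10);
    const bool pow2 = n > 0 && (n & (n - 1)) == 0;
    // Real msaa needs a power of two between 2 and 16.
    // Assumption: anything else is ignored and the default is used.
    if (!pow2 || n < 2 || n > 16)
        return fallback;
    return static_cast<int>(n);
}
```

A caller would pass `std::getenv("SWR_MSAA_MAX_COUNT")` as the argument.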
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102038
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
This introduces a new separate option because the output can
be quite verbose. If spirv-dis is not found in the path, this
debug option is useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
At the moment, debugging radv is not really easy because the
driver doesn't report enough information when it hangs. This
new file will be the main location for all debug tools.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For software drivers where we want "fake" msaa support for GL 3.x, we
treat 1 sample as being msaa.
For drivers with real msaa support, start format probing at 2x msaa.
For drivers with fake msaa support, start format probing at 1x msaa.
This also tweaks the MaxSamples code in st_init_extensions() so that
we use MaxSamples=1 for fake msaa. This allows the format probe loops
to run at least one iteration.
This fixes a llvmpipe/VTK regression from commit 6839d33699.
And for drivers with fake msaa support, calls such as
glTexImage2DMultisample(samples=1) will now succeed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102038
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102125
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When using builtin functions we have to move the inputs to registers $r0 and $r1. If
one of the input values is an immediate, we fail to propagate the immediate:
...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
With this patch the immediate is propagated, potentially causing the first MOV
to be superfluous, which we'd remove in that case:
...
mov u32 $r0 %r473 (0)
mov u32 $r1 0x00000003 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...
Shaderdb stats:
total instructions in shared programs : 4893460 -> 4893324 (-0.00%)
total gprs used in shared programs : 582972 -> 582881 (-0.02%)
total local used in shared programs : 17960 -> 17960 (0.00%)
local gpr inst bytes
helped 0 91 112 112
hurt 0 0 0 0
v2:
implement some changes proposed by imirkin, the manual deletion of the dead
mov is necessary after ea22ac23e0 ("nvc0/ir: unlink values pre- and post-call
to division function") as the potentially dead mov is unlinked properly,
causing later passes to not notice the mov op at all and thus not cleaning it
up. That makes up a big chunk of the regression the above commit caused.
Keep the deletion of the op where it is, deleting it later unnecessarily blows
up size of the change.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
cs_invocations are currently unsupported, but leaving the field uninitialized
is even worse.
fixes on nvc0:
* KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
* KHR-GL45.pipeline_statistics_query_tests_ARB.functional_non_rendering_commands_do_not_affect_queries
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Fix loading of a 3x16 vector as a single 48-bit load
on big-endian systems (PPC64, S390).
Roland Scheidegger's commit e827d91756
plus Ray Strode's patch reduce pre-Roland Piglit failures from ~4000 to ~2000. This patch fixes
three of the four regressions observed by Ray:
- draw-vertices
- draw-vertices-half-float
- draw-vertices-half-float_gles2
One regression remains:
- draw-vertices-2101010
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
lp_build_fetch_rgba_soa fetches a texel from a texture.
Part of that process involves first gathering the element
together from memory into a packed format, and then breaking
out the individual color channels into separate, parallel
arrays.
The code fails to account for endianness when reading the packed
values.
This commit attempts to correct the problem by reversing the order
the packed values are read on big endian systems.
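The byte-order dependence can be sketched in plain C++, outside of the LLVM IR the real code generates. `gather_packed()` is a hypothetical helper that combines four consecutive bytes into the value a native 32-bit load would produce, with the element order reversed for big-endian targets.

```cpp
#include <cstdint>

// Combines four consecutive bytes from memory into one packed 32-bit value.
// On little endian the first byte lands in the low bits; on big endian the
// order is reversed, so the first byte lands in the high bits.
inline uint32_t gather_packed(const uint8_t bytes[4], bool big_endian)
{
    uint32_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        const int shift = big_endian ? 8 * (3 - i) : 8 * i;
        packed |= static_cast<uint32_t>(bytes[i]) << shift;
    }
    return packed;
}
```

If the gather always used the little-endian order, channel extraction on a big-endian machine would pick up the wrong bytes, which matches the kind of failure described above.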
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100613
Cc: "17.2" "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ray Strode <rstrode@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Instead of saving primitive offset in the minmax cache key,
save the actual buffer offset which is used in the cache lookup.
Fixes rendering artifact seen with GoogleEarth when run with
VMware driver.
v2: Per Brian's comment, initialize offset to avoid compiler warning.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
error: incompatible pointer to integer conversion initializing 'VkFence'
(aka 'unsigned long long') with an expression of type 'void *' [-Werror,-Wint-conversion]
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When the kernel supports it set the local flag and
stop adding those BOs to the BO list.
Can probably be optimized much more.
v2: rename new flag to AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For lower overhead in the CS ioctl.
Winsys allocators are not used with interprocess-sharable resources.
v2: It shouldn't crash anymore, but the kernel will reject the new flag.
v3 (christian): Rename the flag, avoid sending those buffers in the BO list.
v4 (christian): Remove setting the kernel flag for now
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Improves performance of GFXBench4 tests at 1024x768 on a Kabylake GT2:
- Manhattan 3.1 by 1.32134% +/- 0.322734% (n=8).
- Car Chase by 1.25607% +/- 0.291262% (n=5).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we blit data into a buffer object, we may need to invalidate any
caches that might contain stale data, so the new data becomes visible.
For example, if the buffer object is bound as a vertex buffer, we need
to invalidate the vertex fetch cache.
While this flushing was missing, it usually happened implicitly for
non-obvious reasons: we're usually on the render ring, and calling
intel_emit_linear_blit() would require switching to the BLT ring,
causing an implicit flush. This likely provoked the kernel to do
PIPE_CONTROLs on our behalf. Although, Gen4-5 wouldn't have this
behavior. At any rate, we should do it ourselves.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Although we're phasing out brw_emit_mi_flush(), we still use it in some
places in order to "flush everything". In a number of those places, we
write data to a buffer that we may then bind as an image surface, SSBO,
or atomic buffer. Those usages require us to flush the data cache.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This exposes the new blorp_buffer_copy() functionality to i965.
It should be a drop-in replacement for intel_emit_linear_blit()
(other than the arguments being backwards, for consistency with BLORP).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I want to be able to copy between buffer objects using BLORP in the i965
driver. Anvil already had code to do this, in a reasonably efficient
manner - first using large bpp copies, then smaller bpp copies.
This patch moves that logic into BLORP as blorp_buffer_copy(), so we
can use it in both drivers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently if table_size is 0, it's falling through to:
unreachable("hash table should never be full");
But table_size can be 0 when RADV_DEBUG=nocache is set, or when the
table allocation fails (which is not considered an error).
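A minimal sketch of the guard, using a hypothetical open-addressing table rather than the radv cache itself: an empty table is treated as a normal "not found" case before any probing happens.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table slot.
struct entry {
    uint32_t key;
    int value;
    bool used;
};

// Linear-probing lookup. Returns the matching entry or nullptr.
const entry *table_search(const std::vector<entry> &table, uint32_t key)
{
    const std::size_t size = table.size();
    // A size of 0 is legal (cache disabled or allocation failed), so it
    // must return early instead of falling through the probing loop.
    if (size == 0)
        return nullptr;

    for (std::size_t i = 0; i < size; ++i) {
        const entry &e = table[(key + i) % size];
        if (!e.used)
            return nullptr;   // empty slot: the key is not in the table
        if (e.key == key)
            return &e;
    }
    return nullptr;           // every slot probed without a match
}
```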
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We need to take some care here as brw->is_broxton has been used to
check whether the device is a low power gen9 (aka Atom gen9 platform).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit 13c23b19d0.
Mesa CI was brought down by this commit, with:
mesa/drivers/dri/i965/brw_sync.c:491: brw_dri_create_fence_fd:
Assertion `brw->screen->has_exec_fence' failed.
For Gen8, add 2xMSAA. For Gen9, add 2xMSAA and 16xMSAA.
Special thanks to Eero Tamminen for reporting rasterizer
numbers being twice what they should be for 2xMSAA under
a benchmark.
V2: Make pointer name less ugly + add 2xMSAA for Gen8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit f6d38785e8.
Kevin's original patch accidentally didn't add 2x for Gen8; he sent
a v2 with a bunch of style fixes shortly after I pushed the original
patch, not knowing it was coming. Let's just revert this one, apply
v2, and move on.
Add plumbing to allow creation of per display surface out fence.
Currently enabled only on android, since the system expects a valid
fd in ANativeWindow::{queue,cancel}Buffer. We pass a fd of -1 with
which native applications such as flatland fail. The patch enables
explicit sync on android and fixes one of the functional issues for
apps or buffer consumers which depend upon fence and its timestamp.
v2: a) Also implement the fence in cancelBuffer.
b) The last sync fence is stored in drawable object
rather than brw context.
c) format clear.
v3: a) Save the last fence fd in DRI Context object.
b) Return the last fence if the batch buffer is empty and
nothing to be flushed when _intel_batchbuffer_flush_fence
c) Add the new interface in vtbl to set the retrieve fence
v3.1 a) close fd in the new vtbl interface on non-Android platforms
v4: a) The last fence is saved in brw context.
b) The retrieve fd is for all the platform but not just Android
c) Add a uniform dri2 interface to initialize the surface.
v4.1: a) make some changes of variable name.
b) the patch is broken into two patches.
v4.2: a) Add a deinit interface for surface to clear the out fence
v5: a) Add enable_out_fence to init, platform sets it true or
false
b) Change get fd to update fd and check for fence
c) Commit description updated
v6: a) Heading and commit description updated
b) enable_out_fence is set only if fence is supported
c) Review comments on function names
d) Test with standalone patch, resolves the bug
v6.1: Check for old display fence reverted
v6.2: enable_out_fence initialized to false by default,
dri2_surf_update_fence_fd updated, deinit changed to fini
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101655
Signed-off-by: Zhongmin Wu <zhongmin.wu@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
This fixes a rendering issue with Hitman when bindless textures
are enabled.
Fixes: 2263610827 ("radeonsi: flush DB caches only when transitioning from DB to texturing")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This structure contains two fields, binding and index, that store the
binding in the descriptor set and the index inside the binding.
These fields are defined as uint8_t, but the corresponding types in the Vulkan
specification are uint32_t, so large values are truncated.
This fixes dEQP-VK.binding_model.shader_access.*.multiple_arbitrary_descriptors.*
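What goes wrong with the narrow fields can be shown with two hypothetical structs. Storing a `uint32_t` in a `uint8_t` keeps only the low 8 bits, so a binding of 300 silently becomes 44 (300 mod 256).

```cpp
#include <cstdint>

// Hypothetical stand-ins for the narrow and the widened structure.
struct narrow_binding {
    uint8_t binding;
    uint8_t index;
};

struct wide_binding {
    uint32_t binding;
    uint32_t index;
};

inline narrow_binding make_narrow(uint32_t binding, uint32_t index)
{
    // The explicit casts make the loss of the upper 24 bits visible.
    return {static_cast<uint8_t>(binding), static_cast<uint8_t>(index)};
}

inline wide_binding make_wide(uint32_t binding, uint32_t index)
{
    return {binding, index};
}
```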
v2: use UINT32_MAX for index when having no render targets (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If llvmpipe_set_scissor_states() is never called, we still need to be sure
that derived scissor/clip state is updated. As of commit 743ad599a9
that function might not be called.
Fixes regressed Piglit gl-1.0-scissor-offscreen -fbo -auto test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101709
Fixes: 743ad599a9 ("st/mesa: don't set 16 scissors and 16 viewports
if they're unused")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Our initial size of 4kB is way too small to do anything useful, so we
end up growing it at least a few times. We may as well start it larger.
Some data points:
- Dinoshade (from Mesa Demos): hit 8kB.
- Chromium 60: hit 16kB after browsing a few things in Google Docs.
- GFXBench4 TRex/Manhattan 3.1: hit 128kB
- Unigine Valley 1.0: hit 512kB
It might make sense to start it even larger.
Acked-by: Matt Turner <mattst88@gmail.com>
Special thanks to Eero Tamminen for reporting rasterizer
numbers being twice what they should be for 2xMSAA under
a benchmark.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise clang warns:
glsl/glsl_lexer.cpp:3507:16: warning: function 'yyinput' is not needed
and will not be emitted [-Wunneeded-internal-declaration]
static int yyinput (yyscan_t yyscanner)
^
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes warnings like
warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
.SourceElementFormat = ISL_FORMAT_R32_UINT,
^~~~~~~~~~~~~~~~~~~
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The functions we're marking as UNUSED in isl_surface_state.c are used
only when compiling for particular generations.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes warnings like
warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
.SourceElementFormat = ISL_FORMAT_R32_UINT,
^~~~~~~~~~~~~~~~~~~
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Unless you have data, the compiler knows better than you whether a
function should be inlined.
Unlike all other cases in this series, the removal of the inline keyword
from isl_format_has_channel_type actually changes the resulting binary
with gcc-6.3.0:
text data bss dec hex filename
7831116 346384 420648 8598148 833284 i965_dri.so before
7830716 346384 420648 8597748 8330f4 i965_dri.so after
I think this is likely an improvement. No difference in the resulting
binary with clang-4.0.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The functions we're marking as UNUSED in genX_pipeline.c are used only
when compiling for particular generations.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes warnings like
warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
.SourceElementFormat = ISL_FORMAT_R32_UINT,
^~~~~~~~~~~~~~~~~~~
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Unless you have data, the compiler knows better than you whether a
function should be inlined.
No difference in the resulting binary with gcc-6.3.0 or clang-4.0.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Avoids Clang's warning about the current code:
warning: suggest braces around initialization of subobject
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
brw_surface_formats.c and genX_blorp_exec.c do this a lot, causing lots
of warnings from clang.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The functions we're marking as UNUSED in genX_state_upload.c are used
only when compiling for particular generations.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes warnings like
warning: implicit conversion from enumeration type 'enum isl_format' to
different enumeration type 'enum GEN10_SURFACE_FORMAT'
[-Wenum-conversion]
.SourceElementFormat = ISL_FORMAT_R32_UINT,
^~~~~~~~~~~~~~~~~~~
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Clang doesn't realize that 0 and 1 are the only possibilities, and thinks
lots of variables might be uninitialized.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
brw_texture_view_sane() is only used by an assert()...
No difference in the resulting binary with gcc-6.3.0 or clang-4.0.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Clang warns:
warning: absolute value function 'fabsf' given an argument of type
'const float64_t' (aka 'const double') but has parameter of type 'float'
which may cause truncation of value [-Wabsolute-value]
float64_t dst = bit_size == 64 ? fabs(src0) : fabsf(src0);
The type of the ternary expression will be the common type of fabs() and
fabsf(): double. So fabsf(src0) will be implicitly converted to double.
We may as well just convert src0 to double before a call to fabs() and
remove the needless complexity, à la
float64_t dst = fabs(src0);
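The type argument can be checked at compile time. The sketch below uses plain `double` in place of `float64_t`, and the two helper functions are hypothetical; the explicit cast in `abs_via_float()` makes the truncation that the warning is about visible.

```cpp
#include <math.h>
#include <type_traits>

// The conditional operator converts both arms to a common type, so the
// float result of fabsf() is widened to double either way.
static_assert(std::is_same<decltype(true ? fabs(1.0) : fabsf(1.0f)), double>::value,
              "ternary of fabs() and fabsf() has type double");

// Going through fabsf() rounds the argument to float first, which loses
// precision for values a float cannot represent exactly.
inline double abs_via_float(double src0)
{
    return fabsf(static_cast<float>(src0));
}

// The simplified form: no truncation and no branch.
inline double abs_f64(double src0)
{
    return fabs(src0);
}
```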
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Clang has "-Wno-initializer-overrides", while gcc has
"-Wno-override-init". Quiets a lot of warnings with clang.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This makes it a lot clearer what's happening (at least I think so), and
will make future additions much simpler.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Otherwise eglCreateWaylandBufferFromImageWL will fail, since we
have no "supported" format.
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Make the code a bit easier to follow. There should be no functional
change since none of the bits set are accessible until the
eglCreateWindowSurface call is complete.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
The dimensions are already set [to 0 or the value provided by the
attributes list] by the _eglInitSurface() call further up.
The values are updated, as the DRI driver calls the DRI2/IMAGE_LOADER'
get_buffers, shortly before making use of the values.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
For most/all cases today, we have wl_drm available alongside wl_dmabuf.
Yet in the long run, we want to make sure the latter can operate without
any traces of the former.
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
The wl_drm wrapper is created before the wl display/surface ones.
Thus make sure we destroy it after them. In reality it should not make
any difference either way.
Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
If the specific initialization was successful, dri2_egl_display() will
return a non-NULL pointer. Thus we can drop the check and flatten the
codeflow.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
In order to implement VK_KHR_external_fence, we need to back our fences
with something that's shareable. Since the kernel wait interface for
sync objects already supports waiting for multiple fences in one go, it
makes anv_WaitForFences much simpler if we only have one type of fence.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is just a refactor, similar to what we did for semaphores, in
preparation for handling VK_KHR_external_fence.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit changes fences to work a bit more like BO semaphores.
Instead of the fence being a batch, it's simply a BO that gets added
to the validation list for the last execbuf call in the QueueSubmit
operation. It's a bit annoying finding the last submit in the execbuf
but this allows us to avoid the dummy execbuf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We didn't allow them before because it didn't look like the spec allowed
it. It certainly doesn't make much sense. However, there are CTS tests
that apparently hit this. What the spec actually says is:
"Importing a payload using handle types with copy transference
creates a duplicate copy of the payload at the time of import, but
makes no further reference to it. Fence signaling, waiting, and
resetting operations performed on the target of copy imports must
not affect any other fence or payload."
A SYNC_FD has copy transference but the import may be temporary or
permanent. If you do a permanent import of something with copy
transference, I guess it's supposed to work and end up resetting the
permanent state. In any case, there seems to be no real harm in
allowing it, so why not.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
location is never set to INTERP_SAMPLE, and Nicolai comments:
"... that part is misleading. location refers to the base location, not
the final location of the sample, and it can never be INTERP_SAMPLE."
Suggested-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
These are likely false positives, but are also annoying because they
show up on every "make install", which causes ac_nir_to_llvm to be
rebuilt here. Initializing those variables to NULL should be harmless
even when unnecessary.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
amd/common/ and amd/vulkan/ are using tabs for indent, which doesn't
match the settings in root .editorconfig, so let's override.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The discard range codepath takes precedence, so if we get both
unsynchronized and discard_range, choose unsynchronized.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The previous expression was the same as the one for TGSI_SEMANTIC_SUBGROUP_GT_MASK.
This fixes the direction of the inequality for TGSI_SEMANTIC_SUBGROUP_LT_MASK.
before:
bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION
after:
bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION
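To make the two inequalities concrete, here is a minimal sketch that assumes a 64-wide subgroup stored in a uint64_t. The function names are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Bits whose index is strictly below the invocation index. */
static uint64_t subgroup_lt_mask(unsigned invocation)
{
   assert(invocation < 64);
   return (UINT64_C(1) << invocation) - 1;
}

/* Bits whose index is strictly above the invocation index. */
static uint64_t subgroup_gt_mask(unsigned invocation)
{
   assert(invocation < 64);
   /* (2 << i) - 1 covers bits 0..i. For i == 63 the unsigned shift wraps
    * to 0 and the subtraction yields all ones, so the complement is 0. */
   return ~((UINT64_C(2) << invocation) - 1);
}
```

The two masks never overlap, and together with the invocation's own bit they cover the whole subgroup.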
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow VK_ERROR_OUT_OF_HOST_MEMORY to be propagated to
vkEndCommandBuffer() when necessary.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In ef42423e7b I enabled the check for release builds; however, we
still want to assert in debug builds in case of collisions or
just general bugs with the key building/compare code. Otherwise
it will just fail silently effectively disabling the cache.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This reverts commit fc99cb3c9e.
"The performance went down from 64.7 to 51.4 fps in Valley and from 30.8 to
25.1 fps in Heaven on Radeon HD 7970. Other games seem to have also a 10-25%
performance decrease."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102429
It looks like we can't use the raster config values from the kernel.
One could easily introduce version 3 of the DRI2fenceExtension,
extending the struct, while not implementing the above function.
Thus we'll end up with NULL pointer, and dereferencing it won't fare
too well.
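A hedged sketch of the defensive pattern described above. The struct is a hypothetical stand-in rather than the real __DRI2fenceExtension layout, and the minimum version of 2 is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical versioned extension struct. */
struct fence_ext {
   int version;
   int (*get_fence_fd)(void *fence); /* optional: may be NULL at any version */
};

/* Stub implementation, used only to exercise the happy path. */
static int stub_get_fence_fd(void *fence)
{
   (void)fence;
   return 7;
}

/* Returns the fd, or -1 when the entry point is unavailable. Checking the
 * version alone is not enough, so the pointer is tested as well. */
static int get_fence_fd_checked(const struct fence_ext *ext, void *fence)
{
   assert(ext != NULL);
   if (ext->version < 2 || ext->get_fence_fd == NULL)
      return -1;
   return ext->get_fence_fd(fence);
}
```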
Fixes: 0201f01dc4 ("egl: add EGL_ANDROID_native_fence_sync")
Cc: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The scripts are invoked with the correct version of python and are
missing the execute bit.
Follow the rest of Mesa and drop the shebang line.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Follow the example used through mesa and use "..." + "__VA_ARGS__".
The former tends to be more common and portable.
v2: use ##__VA_ARGS__ (Eric)
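For readers unfamiliar with the idiom, a minimal sketch follows. DEMO_FORMAT and the two helpers are hypothetical names, not the macro touched by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* "##__VA_ARGS__" is a GNU extension (accepted by gcc and clang) that
 * drops the preceding comma when no variadic arguments are passed. */
#define DEMO_FORMAT(buf, size, fmt, ...) \
   snprintf(buf, size, "dbg: " fmt, ##__VA_ARGS__)

/* Call with one variadic argument. */
static int format_with_args(char *buf, size_t size)
{
   assert(size > 0);
   return DEMO_FORMAT(buf, size, "x=%d", 5);
}

/* Call with no variadic arguments; this relies on the comma being dropped. */
static int format_without_args(char *buf, size_t size)
{
   assert(size > 0);
   return DEMO_FORMAT(buf, size, "hello");
}
```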
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
In two places we called pipe_resource_reference() to remove a reference
to a vertex buffer resource. But we neglected to check if the buffer was
a user buffer and not a pipe_resource. This caused us to pass an invalid
pipe_resource pointer to pipe_resource_reference().
Instead of calling pipe_resource_reference(&vbuf->resource, NULL), use
pipe_vertex_buffer_unreference(&vbuf) which checks the is_user_buffer
field and does the right thing.
Also, explicitly set the is_user_buffer field to false after setting the
vbuf->resource pointer to out_buffer.
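The idea can be sketched with simplified stand-in types. These are not the real gallium definitions, and the demo_ names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_resource { int refcount; };

struct demo_vertex_buffer {
   bool is_user_buffer;
   union {
      struct demo_resource *resource; /* valid when !is_user_buffer */
      const void *user;               /* valid when is_user_buffer */
   } buffer;
};

/* Drop whatever the vertex buffer points at. A user pointer is simply
 * cleared; only a real resource has its reference count touched. */
static void demo_vertex_buffer_unreference(struct demo_vertex_buffer *vb)
{
   assert(vb != NULL);
   if (vb->is_user_buffer) {
      vb->buffer.user = NULL;
   } else if (vb->buffer.resource != NULL) {
      vb->buffer.resource->refcount--;
      vb->buffer.resource = NULL;
   }
}
```

Treating the user pointer as a resource would decrement a "refcount" inside arbitrary application memory, which is the bug being fixed.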
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102377
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
If we merge a mapping with the mapping before it, we need to change
not only the offset, but also the bo offset.
Fixes: 715df30a4e "radv/amdgpu: Add winsys implementation of virtual buffers."
Reviewed-by: Dave Airlie <airlied@redhat.com>
We don't use the render path, so this is totally unneeded.
Fixes: 19be95f71e "radv: add subpass resolve compute path"
Reviewed-by: Dave Airlie <airlied@redhat.com>
The snprintf stuff here already constructs the right name for the device
node, and if it doesn't, you configured Mesa wrong, don't do that.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
We were using brw->gen, brw->is_haswell, and devinfo->gen in a few
places, when we could just use GEN_GEN and GEN_IS_HASWELL, which are
evaluated at compile time.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The PRM SKL-Vol 2b-05.16 says:
"Within a VERTEX_ELEMENT_STATE structure, if a Component Control
field is set to something other than VFCOMP_STORE_SRC, no
higher-numbered Component Control fields may be set to
VFCOMP_STORE_SRC. In other words, only trailing components can be set
to something other than VFCOMP_STORE_SRC."
Since we set the component 1 to VFCOMP_STORE_0 on gen8+, and
VFCOMP_STORE_IID on gen5+, and we are not using components 2 and 3,
let's also set them to VFCOMP_STORE_0.
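The quoted restriction can be expressed as a small check. This is an illustrative sketch only: the enum values are arbitrary and the validator is hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Names follow the PRM text quoted above; numeric values are arbitrary. */
enum demo_vfcomp {
   DEMO_VFCOMP_STORE_SRC,
   DEMO_VFCOMP_STORE_0,
   DEMO_VFCOMP_STORE_IID
};

/* Once any component is something other than STORE_SRC, no
 * higher-numbered component may be STORE_SRC. */
static bool component_controls_valid(const enum demo_vfcomp comp[4])
{
   assert(comp != NULL);
   bool seen_non_src = false;
   for (int i = 0; i < 4; i++) {
      if (comp[i] != DEMO_VFCOMP_STORE_SRC)
         seen_non_src = true;
      else if (seen_non_src)
         return false;
   }
   return true;
}
```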
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Semantically identical to the EXT version (whose string is still valid
for GLES), so rename the bit but expose both extension strings.
(Suggested by Ilia Mirkin and Ian Romanick.)
v3: Fix the entrypoint alias in GL4x.xml (Ilia)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The only difference from the EXT version is bumping the minmax to 16, so
just hit all the drivers at once.
v2: Fix driver names, add to 17.3 release notes (Ilia Mirkin)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
An allocation check is already done when the buffer is created at
context creation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
All callers already check that, and the common behaviour is to
check in the _mesa_new_XXX() helpers anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In get_back_bo, we use wl_display_dispatch_queue() to block and wait for
a buffer release event. However, not all Wayland compositors flush the
client socket on posting a buffer-release event, so by only blocking
client-side, we may block indefinitely, or at least need to wait for an
input event / frame completion to arrive for the compositor to flush.
We now use dispatch_queue as a first pass, but if our entire buffer pool
is exhausted, use a roundtrip (an immediately-triggered wl_callback) to
ensure that the compositor flushes out our release event immediately.
[daniels: Modified comment and commit message.]
Signed-off-by: Kai Chen <kai.chen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
CC: <mesa-stable@lists.freedesktop.org>
Found by address sanitizer:
==22621==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61400000cbd8 at pc 0x7f561610a4ff bp 0x7ffca85f9d50 sp 0x7ffca85f94f8
READ of size 344 at 0x61400000cbd8 thread T0
#0 0x7f561610a4fe (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x5f4fe)
#1 0x7f560bb305a5 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2 0x7f560bb305a5 in blob_write_bytes ../../../mesa-src/src/compiler/glsl/blob.c:136
#3 0x7f560be7d7ff in encode_type_to_blob ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:153
#4 0x7f560be81222 in write_program_resource_data ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:950
#5 0x7f560be81222 in write_program_resource_list ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1118
#6 0x7f560be81222 in shader_cache_write_program_metadata(gl_context*, gl_shader_program*) ../../../mesa-src/src/compiler/glsl/shader_cache.cpp:1407
#7 0x7f560b825fdb in link_program ../../../mesa-src/src/mesa/main/shaderapi.c:1163
Fixes: 073a84ff60 ("glsl: stop adding pointers from glsl_struct_field to the cache")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If we're rendering to a format without alpha, convert DST_ALPHA blend to
a ONE so that factors are properly computed. This same workaround is
done on a3xx+ as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This will be used to store more information about the cache item
in its header. This information is intended for 3rd party and
cache analysis use but can also be used for detecting the unlikely
scenario of cache collisions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Steam is already analysing cache items. Unfortunately, we did not
introduce a versioning mechanism for identifying structural changes
to cache entries earlier so the only way to do so is to rename the
cache directory.
Since we are renaming it we take the opportunity to give the directory
a more meaningful name.
Adding a version field to the header of cache entries will help us to
avoid having to rename the directory in future. Please note this is
versioning for the internal structure of the entries as defined in
disk_cache.{c,h} as opposed to the structure of the data provided to
the disk cache by the GLSL compiler and the various driver backends.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Recording secondaries with no framebuffer attachment may
make this happen, though this might not be the complete solution.
(esp if someone does meta stuff in there, would we have to
save things, not sure).
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When I added gfx9 I did it wrong, this fixes it.
Fixes: 5247b311e9 "radv/gfx9: fix set predication packet."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Right now, OpenGL uses the GLSL lowering for shared variables and anv
uses NIR to lower them. For a long time, we've done this weird thing
where we do the NIR lowering unconditionally and then add the SLM sizes
from the two together. This works because one of them will always be 0
but it's a bit sketchy. Let's just move the NIR-based lowering into
anv_pipeline and get rid of the sketch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Similar to e09d04cd56 "radeonsi: use util_strchrnul() to fix android build error"
Android Bionic does not support strchrnul() string function,
gallium auxiliary util/u_string.h provides util_strchrnul()
This change avoids the following warning and error:
external/mesa/src/amd/common/ac_debug.c:501:15: warning: implicit declaration of function 'strchrnul' is invalid in C99
char *end = strchrnul(out, '\n');
^
external/mesa/src/amd/common/ac_debug.c:501:9: error: incompatible integer to pointer conversion initializing 'char *' with an expression of type 'int'
char *end = strchrnul(out, '\n');
^ ~~~~~~~~~~~~~~~~~~~~
1 warning and 1 error generated.
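For reference, a portable fallback with the same contract as GNU strchrnul() can be sketched as below. This is an illustrative version under a made-up name, not necessarily how util_strchrnul() is written.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the first occurrence of c in s, or to the
 * terminating NUL byte if c does not occur. */
static char *demo_strchrnul(const char *s, char c)
{
   assert(s != NULL);
   while (*s != '\0' && *s != c)
      s++;
   return (char *)s;
}
```

Unlike strchr(), the result is never NULL, so a caller can subtract the start pointer to get a line length without a special case.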
Fixes: c2c3912410 "ac/debug: annotate IB dumps with the raw values"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
With the release of O, the MESA_ANDROID_MAJOR_VERSION has changed to 8.
Change the LLVM check to match. There's no point in continuing to support 'O'
as no one is going to use an old AOSP master.
Presumably, we'll be back here again to fix things again for P (or 9).
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Signed-off-by: Rob Herring <robh@kernel.org>
Taken from c21e602b9fda1d3bbaecb08194592f67e6a0649b from
OpenGL-Registry. (This time without breaking glext.h.)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
It uses a user SGPR to pass the view index to the shaders, except
for the fragment shader where we use layer=view (which comes in
handy when we want to do the NV ext that allows us to execute pre-FS
stages once instead of per view).
Reviewed-by: Dave Airlie <airlied@redhat.com>
To use when we have e.g. input attachments, but there is no layer
export in the previous shader and hence no layered rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The int32->float semantic conversion got dropped in a testcase,
because the src was already float. On closer inspection I decided
to add a few more casts for integer op operands to be safe too.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Taken from c21e602b9fda1d3bbaecb08194592f67e6a0649b from
OpenGL-Registry.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Gets rid of a few warnings of the form:
src/mesa/drivers/dri/i965/intel_screen.c:918:49: warning: passing argument 2 of ‘modifier_is_supported’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
!modifier_is_supported(&screen->devinfo, f, 0, modifier))
^
src/mesa/drivers/dri/i965/intel_screen.c:301:1: note: expected ‘struct intel_image_format *’ but argument is of type ‘const struct intel_image_format *’
Fixes: 1efd73df39 "i965: Advertise the CCS modifier"
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Depending on which extension or GL spec you read the behavior of
glVertexAttrib(index=0) either sets the current value for generic
attribute 0, or it emits a vertex just like glVertex(). I believe
it should do either, depending on context (see below).
The piglit gl-2.0-vertex-const-attr test declares two vertex attributes:
attribute vec2 vertex;
attribute vec4 attr;
and the GLSL linker assigns "vertex" to location 0 and "attr" to location 1.
The test passes.
But if the declarations were reversed such that "attr" was location 0 and
"vertex" was location 1, the test would fail to draw properly.
The problem is the call to glVertexAttrib(index=0) to set attr's value
was interpreted as glVertex() and did not set generic attribute[0]'s value.
Interestingly, calling glVertex() outside glBegin/End (which is effectively
what the piglit test does) does not generate a GL error.
I believe the behavior of glVertexAttrib(index=0) should depend on
whether it's called inside or outside of glBegin/glEnd(). If inside
glBegin/End(), it should act like glVertex(). Else, it should behave
like glVertexAttrib(index > 0). This seems to be what NVIDIA does.
This patch makes two changes:
1. Check if we're inside glBegin/End for glVertexAttrib()
2. Fix the vertex array binding for recalculate_input_bindings(). As it was,
we were using &vbo->currval[VBO_ATTRIB_POS], but that's interpreted
as a zero-stride attribute and doesn't make sense for array drawing.
No Piglit regressions. Fixes updated gl-2.0-vertex-const-attr test and
passes new gl-2.0-vertex-attrib-0 test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101941
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
All of the coordinates and LOD args are integers for TXF. This mostly
doesn't matter, except for converting into a levelZero=true operation by
removing an explicit zero LOD. For the comparison against zero to work
properly, the sType of the instruction has to be set correctly.
Fixes: KHR-GL45.robust_buffer_access_behavior.texel_fetch
Reported-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Useless to do that before checking errors. It's now similar to
the other bind_XXX_buffers() helpers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In the following patch we will stop writing the pointer to cache.
Unfortunately adding empty strings to that cache seems to be the
only thing we can do here once we no longer have the pointers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is so we always create reproducible cache entries. Consistency
is required for verification of any third party distributed shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For gfx9 the addressing for images has changed, so we need to
provide the hw with the level0, however we still need to scale
for format block differences (so our compressed upload paths still
work).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Avoid passing the vulkan image creation into the image view descriptor
setup. This cleans up the usage of range inside the init, instead
using the properly inited values in the image view.
This is just a cleanup but some future vega changes will depend on it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GFX9 needs the SX MRT blend registers programmed, port over
the code from radeonsi to work out the values from the blend
state, and program the registers on rbplus systems.
This fixes lots of:
dEQP-VK.pipeline.blend.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Seems like we actually enabled it already, but did not implement
the shader part. With this patch we do.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Render target surfaces always start at binding table index 0.
This is required for us to use headerless FB writes, which we
really want to do. So, we'll never change that.
Given that, it's not necessary to look up a wm_prog_data field
which we already know contains 0. We can drop the dependency in
brw_renderbuffer_surfaces (Gen4-5)...which was already confusingly
missing from gen6_renderbuffer_surfaces.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We either want the framebuffer dimensions or 1x1x1. Passing fb and
falling back to 1x1x1 lets us shorten some calls.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We don't need yet another set of flags. The function already has access
to both brw and the unit, so it can check brw->draw_aux_buffer_disabled
itself in one line of code. The layered flag was only used to assert
that Gen4-5 doesn't do layered rendering, which isn't that useful.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Also rename it to gen6_update_renderbuffer_surface, as this is the
function for Gen6+. Having functions named "brw_*" and "gen4_*"
is confusing...if we're using gens, let's stick with those.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
BLORP invalidates the binding tables, but it doesn't destroy any of the
existing SURFACE_STATE entries in the statebuffer. We can reuse those.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When changing fast clear colors, we need to emit new SURFACE_STATE
with the updated color at the next draw call.
Most things work today because the atoms that handle SURFACE_STATE
for images (mutable images, textures, render targets) also listen to
BRW_NEW_BLORP, causing us to re-emit these on every BLORP operation.
However, this is overkill - most BLORP operations don't require us
to re-emit SURFACE_STATE.
One case where this is broken today is a fast clear to a different
color followed by a non-coherent framebuffer fetch. The renderbuffer
read atom doesn't listen to BRW_NEW_BLORP, and would not get the new
fast clear color.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
brw_ff_gs.c is about using the geometry shader to implement things
that the fixed function ought to do, but doesn't on old hardware.
Gen7+ does not need this. We should drop the misleading comment
about Gen7 not using geometry shaders.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
All shader stages do the exact same thing, so we don't need the switch
statement, or the redundant FS case. I believe these used to be
different before Tim eliminated the (e.g.) brw_vertex_program
subclasses.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Since the encoder only supports de-interlaced buffers.
v2: move to parameter call to tell dec/enc
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Only copy this value when in restart drawing mode.
Eliminates valgrind errors when running trivial programs.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
They are only used for debug info.
Together with making tgsi_opcode_info::opcode a bitfield, this reduces
the size of tgsi_opcode_info on 64-bit systems from 24 bytes to 4 bytes,
and makes the whole data structure a bit more linker friendly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So we can easily re-arrange members of tgsi_opcode_info, and readers of
the code don't have to guess what all the 0s mean.
Mostly done with regex search&replace.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's not clear why they were ever 2 bits to begin with. Perhaps
the original intent was to use signed values, but that doesn't
seem to have ever been the case in master.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Various index-related fields are only initialized when required, so
they should only be dumped in those cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When assertions were disabled, the compiler removed
the call to util_idalloc_alloc() and the first allocated
bindless slot was 0 which is invalid per the spec.
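The failure mode is easy to reproduce in isolation. In the sketch below, DEMO_RELEASE_ASSERT mimics how assert() expands when NDEBUG is defined; the allocator and both helpers are hypothetical.

```c
#include <assert.h>

/* Same shape as assert(expr) under NDEBUG: the expression is discarded
 * without being evaluated. */
#define DEMO_RELEASE_ASSERT(expr) ((void)0)

static unsigned next_slot;

static unsigned demo_alloc_slot(void)
{
   return next_slot++;
}

/* Buggy: the reservation of slot 0 lives inside the assertion, so in a
 * release build it never happens and the first handle is 0. */
static unsigned first_handle_buggy(void)
{
   next_slot = 0;
   DEMO_RELEASE_ASSERT(demo_alloc_slot() == 0);
   return demo_alloc_slot();
}

/* Fixed: allocate unconditionally, then assert on the stored result. */
static unsigned first_handle_fixed(void)
{
   next_slot = 0;
   unsigned reserved = demo_alloc_slot();
   assert(reserved == 0);
   (void)reserved;
   return demo_alloc_slot();
}
```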
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Pass the dri.sym version script to the linker. This ensures only
explicitly exported symbols are exported and shrinks the library by up
to 60KB.
HAVE_DLADDR also needs to be set so that __driDriverExtensions is defined.
We need to pass "--undefined-version" because the Android build system
sets --no-undefined-version by default and we get an error on
driver specific symbols if those drivers are disabled without the option.
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Probably harmless, but will overwrite errno with a failure status
code. Reported by coverity.
CID 1416600: Argument cannot be negative (NEGATIVE_RETURNS)
Fixes: 5c4e4932e0 (anv: Implement support for exporting semaphores as FENCE_FD)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The anv_execbuf_add_bo() call can actually fail in practice, which
should cause the QueueSubmit operation to fail. Reported by Coverity.
CID: 1416606: Unchecked return value (CHECKED_RETURN)
Fixes: 017cdb10cf (anv: Submit a dummy batch when only semaphores are provided.)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want the type of the field, not of the struct.
This fixes a regression in the following piglit test:
arb_bindless_texture/compiler/images/arrays-of-struct.frag
Fixes: 49d9286a3f ("glsl: stop copying struct and interface member names")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
+ 4 piglit regressions, but it's correct according to the GL spec and
performance is more important than piglit.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
- don't precompile LS and ES (they don't exist on GFX9), compile as VS instead
- don't precompile HS and GS (we don't have LS and ES parts)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
platform_drm, platform_wayland and platform_android have similar local buffer
allocation routines. To deduplicate them, this unifies the local buffer
allocation routines in dri2_egl_surface. It also cleans up inconsistent indentation.
Note that because dri2_wl_get_buffers_with_format() does not create a __DRI_BUFFER_BACK_LEFT
attachment buffer for local_buffers, the new helper function, dri2_egl_surface_free_local_buffers(),
drops the __DRI_BUFFER_BACK_LEFT check.
So if other platforms use the new helper functions, we have to ensure that they do not create a
__DRI_BUFFER_BACK_LEFT attachment buffer for local_buffers.
v2: Fixes from Emil's review:
a) Make local_buffers variable, dri2_egl_surface_alloc_local_buffer() and
dri2_egl_surface_free_local_buffers() unconditionally.
b) Preserve the original codeflow for error_path and normal_path.
c) Add note on commit messages for dropping of __DRI_BUFFER_BACK_LEFT check.
d) Rollback the unrelated whitespace changes.
e) Add a missing blank line.
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
They should not be exposed when the extension is unsupported.
Note that ARB_direct_state_access is always exposed and
EXT_semaphore is not supported at all.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
From the EXT_external_objects_fd spec:
"If the GL_EXT_memory_object_fd string is reported, the following
commands are added:
void ImportMemoryFdEXT(uint memory,
uint64 size,
enum handleType,
int fd);"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently, when the array is full it is resized but it can grow
over and over because we don't try to re-use descriptor slots.
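One simple way to re-use slots is to hand out the lowest free index first. The sketch below uses a fixed-size hypothetical pool for brevity and is not the allocator used by the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NUM_SLOTS 8

struct demo_slot_pool {
   bool used[DEMO_NUM_SLOTS];
};

/* Returns the lowest free slot, or -1 when the pool is full (the point
 * at which a real implementation would resize). */
static int demo_slot_alloc(struct demo_slot_pool *pool)
{
   for (int i = 0; i < DEMO_NUM_SLOTS; i++) {
      if (!pool->used[i]) {
         pool->used[i] = true;
         return i;
      }
   }
   return -1;
}

/* Marks a slot as free so that a later allocation can pick it up again. */
static void demo_slot_free(struct demo_slot_pool *pool, int slot)
{
   assert(slot >= 0 && slot < DEMO_NUM_SLOTS && pool->used[slot]);
   pool->used[slot] = false;
}
```

With re-use, the array only has to grow when every slot is live at the same time.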
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Using VRAM address as bindless handles is not a good idea because
we have to use LLVMIntToPtr and the LLVM CSE pass can't optimize
because it has no information about the pointer.
Instead, use slot indexes like the existing descriptors. Note
that we use fixed 16-dword slots for both samplers and images.
This doesn't really matter because no real apps use image handles.
This improves performance with DOW3 by +7%.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Looks like it's useless to initialize that field when CE is
unused. This will also allow declaring more than 64 elements
for the array of bindless descriptors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The number of bindless descriptors is dynamic and we definitely
have to support more than 256 slots.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A new pair of user SGPR is needed for loading the bindless
descriptors from shaders. Because the descriptors are global for
all stages, there is no need to add separate indices for GFX9.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use the auto logger facility, so that CS chunks will be interleaved
with other log info.
v2:
- fix some crashes when not using CE
- fix skipping "previous" chunks of current (unflushed) IB
- fix error handling in si_begin_cs_debug
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We'll add radeonsi-specific code to set_log_context in later patches,
but we may want to log from common code. Hence keep the log pointer
in r600_common_context.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
copy_constant_to_storage, set_uniform_initializer,
populate_consumer_input_sets, and get_matching_input are all used by
tests in src/compiler/glsl/tests:
glsl/tests/varyings_test.o: In function `link_varyings_single_simple_input_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:131: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_gl_ClipDistance_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:159: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_gl_CullDistance_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:186: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_single_interface_input_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:208: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_one_interface_and_one_simple_input_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:241: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o:src/compiler/glsl/tests/varyings_test.cpp:272: more undefined references to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)' follow
glsl/tests/varyings_test.o: In function `link_varyings_interface_field_doesnt_match_noninterface_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:289: undefined reference to `linker::get_matching_input(void*, ir_variable const*, hash_table*, hash_table*, ir_variable**)'
glsl/tests/varyings_test.o: In function `link_varyings_interface_field_doesnt_match_noninterface_vice_versa_Test::TestBody()':
src/compiler/glsl/tests/varyings_test.cpp:314: undefined reference to `linker::populate_consumer_input_sets(void*, exec_list*, hash_table*, hash_table*, ir_variable**)'
src/compiler/glsl/tests/varyings_test.cpp:328: undefined reference to `linker::get_matching_input(void*, ir_variable const*, hash_table*, hash_table*, ir_variable**)'
Fixes: ca73c3358c ("glsl: Mark functions static")
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In f9fd976e8a we changed the clear value to be stored as an
isl_color_value. This had the side-effect that the same-clear-value check is now
happening directly between the f32[0] field of the isl_color_value and
ctx->Depth.Clear. This isn't what we want for two reasons. One is that
the comparison happens in floating point even for Z16 and Z24 formats.
Worse than that, ctx->Depth.Clear is a double so, even for 32-bit float
formats, we were comparing as doubles and not floats. This means that
the test basically always fails for anything other than 0.0f and 1.0f.
This caused a slight performance regression in Lightsmark 2008 because
it was using a depth clear value of 0.999 which can't be stored in a
32-bit float so we were doing unneeded resolves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/101678
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
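A minimal sketch of the comparison problem described above, not the actual
i965 code. The helper names are hypothetical: `stored` stands in for the
f32[0] field of the isl_color_value and `requested` for ctx->Depth.Clear.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the broken check: the float operand is promoted
 * to double, so any value that is not exactly representable as a float
 * compares unequal to the double it was converted from. */
static bool
same_clear_value_naive(float stored, double requested)
{
   return stored == requested;
}

/* Hypothetical stand-in for a check done at the precision that is actually
 * stored (32-bit float only; Z16 and Z24 are not modelled here). */
static bool
same_clear_value_as_float(float stored, double requested)
{
   return stored == (float)requested;
}
```

With a clear value of 0.999 the naive check reports a mismatch even though
nothing changed, while 0.0 and 1.0 pass because they are exact in both types.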
Here we also make use of the UseSTD430AsDefaultPacking constant
and call the new get_internal_ifc_packing() helper.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used to enable the STD430 layout as the default for
UBOs and SSBOs with layouts of shared/packed rather than STD140.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
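The selection rule described in the two messages above could be sketched
roughly as follows. The enum and function names are hypothetical and only
illustrate the idea; they are not the real get_internal_ifc_packing()
signature.

```c
#include <stdbool.h>

/* Hypothetical names for the four GLSL interface block layouts. */
enum ifc_packing {
   IFC_PACKING_STD140,
   IFC_PACKING_SHARED,
   IFC_PACKING_PACKED,
   IFC_PACKING_STD430
};

/* Map the declared layout to the layout used internally. Blocks declared
 * shared or packed fall back to std140 unless the driver asked for std430
 * as the default packing. */
static enum ifc_packing
internal_ifc_packing(enum ifc_packing declared, bool use_std430_as_default)
{
   if (declared == IFC_PACKING_STD140)
      return IFC_PACKING_STD140;
   if (declared == IFC_PACKING_STD430)
      return IFC_PACKING_STD430;

   /* shared or packed: the implementation is free to choose. */
   return use_std430_as_default ? IFC_PACKING_STD430 : IFC_PACKING_STD140;
}
```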
The CL CTS queries CL_DEVICE_MEM_BASE_ADDR_ALIGN for a device and
then allocates user pointers aligned to that value for its tests.
The minimum value is defined as:
the size (in bits) of the largest OpenCL built-in data type supported
by the device (long16 in FULL profile, long16 or int16 in EMBEDDED
profile) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.
At the moment, all known devices that support user pointers require
CPU page alignment for buffers created from user pointers, so just
query that from sysconf.
v3: Use std::max instead of MAX2 (Francisco)
Add missing unistd include
v2: Use system page size instead of a new pipe cap
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by (v2): Jan Vesely <jan.vesely@rutgers.edu>
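A rough C sketch of the query described above (the actual change is C++ and
uses std::max). The function name is hypothetical; the only real API used is
POSIX sysconf(). The result is in bits, as the quoted definition requires.

```c
#include <unistd.h>

/* Size in bytes of long16: sixteen 64-bit integers. */
#define LONG16_BYTES (16u * 8u)

/* Hypothetical helper: alignment in bits, taken as the larger of the CPU
 * page size and the spec minimum (the size of long16). */
static unsigned
mem_base_addr_align_bits(void)
{
   long page = sysconf(_SC_PAGESIZE);
   unsigned align_bytes = LONG16_BYTES;

   /* sysconf() returns -1 on error; keep the spec minimum in that case. */
   if (page > 0 && (unsigned long)page > align_bytes)
      align_bytes = (unsigned)page;

   return align_bytes * 8u;
}
```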
After the context is initialized, the API and context flags won't
change. So, we can compute whether vertex attribute 0 aliases
vertex position just once.
This should make the glVertexAttrib*() functions a little quicker.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
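The idea of computing the aliasing once can be sketched as below. All names
are hypothetical, and the exact rule for which APIs alias attribute 0 with
the vertex position is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical API enum, not Mesa's. */
enum sketch_api {
   SKETCH_API_GL_COMPAT,
   SKETCH_API_GL_CORE,
   SKETCH_API_GLES,
   SKETCH_API_GLES2
};

struct sketch_context {
   enum sketch_api api;
   bool forward_compatible;
   /* Cached at init time; read directly on the glVertexAttrib*() path. */
   bool attr_zero_aliases_vertex;
};

/* The API and context flags never change after init, so the result can be
 * stored once. Assumed rule: aliasing applies to GLES 1.x and to
 * non-forward-compatible compatibility contexts. */
static void
sketch_context_init(struct sketch_context *ctx, enum sketch_api api,
                    bool forward_compatible)
{
   ctx->api = api;
   ctx->forward_compatible = forward_compatible;
   ctx->attr_zero_aliases_vertex =
      api == SKETCH_API_GLES ||
      (api == SKETCH_API_GL_COMPAT && !forward_compatible);
}
```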
This is an unofficial, unmaintained driver; we don't really want
people wasting effort trying to improve it.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This code was separated from the validation code so it could
be used with KHR_no_error paths. The return values were inverted
to reflect the name of the helper, but here the condition was
mistakenly inverted rather than the return value.
Fixes: 4df2931a87 (mesa/vbo: move some Draw checks out of validation)
Reported-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The INTEL_performance_query spec says
"Performance counter id 0 is reserved as an invalid counter."
GLuint counterid_to_index(GLuint counterid) just returns counterid - 1,
so with unsigned overflow rules, it will generate 0xFFFFFFFF given an
input of 0. 0xFFFFFFFF will trigger the counterIndex >= queryNumCounters
check, so the code worked as is. It just contained a useless comparison.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
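A small standalone sketch of the wrap-around argument above, using uint32_t
in place of GLuint. counter_id_is_valid() is a hypothetical wrapper added
only to show that the range check alone rejects id 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as described above: id 0 wraps to 0xFFFFFFFF. */
static uint32_t
counterid_to_index(uint32_t counterid)
{
   return counterid - 1u;
}

/* Hypothetical wrapper: no separate "id == 0" test is needed, because the
 * wrapped index is always >= num_counters. */
static bool
counter_id_is_valid(uint32_t counterid, uint32_t num_counters)
{
   return counterid_to_index(counterid) < num_counters;
}
```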
Previously clang would warn about redefinition of typedef EGLDisplay. Avoid
this by adding preprocessor guards to mesa_glinterop.h and including it
after EGL.h is indirectly included.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
brw_hw_type_to_reg_type() needs to know only whether the file is
BRW_IMMEDIATE_VALUE or not, which is not a valid file for the
destination. gcc and clang will evaluate __builtin_strcmp() at compile
time, so we can use it to pass a constant file for the destination.
   text    data     bss     dec     hex filename
7816214  346248  420496 8582958  82f72e i965_dri.so before
7816070  346248  420496 8582814  82f69e i965_dri.so after
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
   text    data     bss     dec     hex filename
7816886  346248  420496 8583630  82f9ce i965_dri.so before
7816214  346248  420496 8582958  82f72e i965_dri.so after
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Previously the brw_inst{,_set}_{dst,src0,src1}_reg_type() functions
provided access to the hardware encodings for the register types. We
often mixed these with the logical BRW_REGISTER_TYPE_* enums (which
themselves used to be the hardware format!) with bad results.
With that functionality now available with the hw_ versions (see
previous commit), we now add functions that take the logical
BRW_REGISTER_TYPE_* enums and convert into the hardware format and vice
versa. To do the conversion we also have to provide the file.
Note the asymmetry between the two functions: the new getter reads the
file from the instruction word, and to ensure that is always set the
setter writes both the file and the type.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
I'm going to encapsulate all of the logic dealing with register types in
this file.
Rename the parameters for the hardware encodings from type -> hw_type at
the same time.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
After the last patch converted things into enums, I helpfully got a
compiler warning about these missing from the switch statement.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The hardware encodings often mean different things depending on whether
the source is an immediate.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
These vaguely corresponded to the hardware encodings, but that is purely
historical at this point. Reorder them so we stop making things "almost
work" when mixing enums.
The ordering has been chosen so that no enum value is the same as a
compatible hardware encoding.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
UB and B type encodings are the same as UV and VF. Noticed when writing
the following patch.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The destination stride must be equivalent to a dword if VF is used.
Also, since the only compaction table entries with "i:vf" have the
destination as "r:f", specifically check that the destination is of type
float.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Note that there's no point in testing on G45, since its compaction is
the same as Gen5. Same logic applies to Gen7 variants and low-power
parts.
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Both statically linking libLLVMCore and dynamically linking libLLVM causes
duplicated symbols in gallium_dri.so and it fails to dlopen. We don't
really need to link libLLVMCore, but just need generated headers to be
built first. Dynamically linking to libLLVM instead is enough to do
that. Thanks to Qiang Yu for finding the root cause.
With this change, we can align all versions and just have libLLVM as a
shared lib dependency.
This also requires changes in the M and N versions of LLVM to export the
include paths for libLLVM. AOSP master is okay.
Fixes: 26aee6f4d5 ("Android: rework LLVM build support")
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Use the sampling factor embedded in the bitstream to find out whether
the format is YUYV, so that we can use this info to reallocate the
buffer with the correct format.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
So we have to detect it for reallocation of de-interlaced buffers
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The current tier 1 mjpeg firmware only supports at the bitstream
level, the later tier 2 support will be at the buffers level with
newer hardware.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
There is no need of dpb buffer for mjpeg codec
v2: check dpb_size instead of format
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
"Alloc for render" is a terrible name for a flag because it means
basically nothing. What the flag really does is allocate a busy BO
which someone theorized at one point in time would be more efficient if
you're planning to immediately render to it. If the flag really means
"alloc a busy BO" we should just call it that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In 76e2f390f9, when Topi switched num_samples from 0 to 1 for
single-sampled, he accidentally switched the last parameter in the call
to miptree_create_for_teximage from 0 to 1 thinking it was num_samples
when it was actually layout_flags. Switching from 0 to 1 added the
MIPTREE_LAYOUT_ACCELERATED_UPLOAD flag which causes us to allocate a
busy BO instead of an idle one. This caused the subsequent CPU upload
to consistently stall. The end result was a 15% performance drop in the
SynMark v7 DrvRes microbenchmark. This restores the old behavior and
fixes the performance regression.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 76e2f390f9
Bugzilla: https://bugs.freedesktop.org/102260
Cc: mesa-stable@lists.freedesktop.org
We handle the Sandybridge multisampled 2D surface hack here, rather
than in ISL, because it requires allocating a BO, and is kind of messy.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ISL already offers functions to fill out most kinds of SURFACE_STATE,
so why not handle null surfaces too?
Null surfaces are simple, so we can just take the dimensions, rather
than an entire fill structure.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This little optimization improves the performance of SynMark v7
TexFilterTri by almost 10% on Sky Lake GT4 among other improvements.
We've been doing it for some time but somehow it got dropped during
the miptree refactoring.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/102258
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Looking at NewDriverState is not safe in general. The state atom system
is set up to ensure that new bits that get added to NewDriverState get
accumulated into the set of bits used when emitting atoms but it doesn't
go the other way. If we read NewDriverState, we may not get the full
picture because the per-pipeline state (3D or compute) does not get
added to NewDriverState before state emit is done. It's especially
dangerous to do this from BLORP (either explicitly or implicitly when
BLORP calls gen7_upload_urb) because that does not happen during one of
the normal state upload paths.
This commit solves the problem by whacking all of the per-shader-stage
URB sizes to zero whenever we change the total URB size. We still have
to flag BRW_NEW_URB_SIZE to ensure that the gen7_urb atom triggers but
the actual decision in gen7_upload_urb can now be based entirely on URB
sizes rather than on state atoms. This also makes BLORP correct because
it just asks for a new URB config whenever the vsize is too small and so
any change to the total URB size will trigger blorp to re-emit as well
because 0 < vs_entry_size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/102289
Cc: mesa-stable@lists.freedesktop.org
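The invalidation scheme described above can be sketched as follows. The
struct and function names are hypothetical and model only the VS entry size;
this is not the real gen7_upload_urb logic.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced URB bookkeeping. */
struct sketch_urb {
   unsigned total_size;
   unsigned vs_entry_size;
};

/* Changing the total URB size zeroes the per-stage size, so the next
 * size comparison is guaranteed to request a new configuration. */
static void
sketch_set_total_urb_size(struct sketch_urb *urb, unsigned total_size)
{
   if (urb->total_size == total_size)
      return;

   urb->total_size = total_size;
   urb->vs_entry_size = 0;
}

/* The decision is based purely on sizes, not on dirty-state bits. */
static bool
sketch_needs_new_urb_config(const struct sketch_urb *urb,
                            unsigned needed_vs_entry_size)
{
   return urb->vs_entry_size < needed_vs_entry_size;
}
```

Because any non-zero requested size is larger than 0, a caller outside the
normal state upload path needs no extra flag to notice the change.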
EGLimages are shared with external users, and we don't know what they're
going to do with them. They might scan them out. They might access
them in a way that doesn't work with our explicit clflushing.
It's safest to simply mark them non-coherent.
Chris Wilson caught this problem and wrote a similar (though less
aggressive) patch to solve it; the miptree code has since undergone
a lot of refactoring so I had to rewrite it.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
VC5 introduces packet variants where the same opcode has behavior that is
decided by a sub-id field in the early bits of the packet. Keep iterating
over packets until we find the one with the matching sub-id.
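A minimal sketch of the lookup described above. The descriptor layout and
the position of the sub-id (low four bits of the second byte) are
assumptions for illustration only; the real field location comes from the
packet XML.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor. */
struct packet_desc {
   const char *name;
   uint8_t opcode;
   bool has_sub_id;
   uint8_t sub_id;
};

/* Assumed sub-id location: low four bits of the byte after the opcode. */
#define SKETCH_SUB_ID(data) ((data)[1] & 0x0f)

/* Keep iterating past entries whose opcode matches but whose sub-id does
 * not, instead of stopping at the first opcode match. */
static const struct packet_desc *
find_packet(const struct packet_desc *table, size_t count,
            const uint8_t *data)
{
   for (size_t i = 0; i < count; i++) {
      if (table[i].opcode != data[0])
         continue;
      if (table[i].has_sub_id && table[i].sub_id != SKETCH_SUB_ID(data))
         continue;
      return &table[i];
   }
   return NULL;
}
```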
In the vc5 NIR backend, I want to use the XML code-generation to set up
pack/unpack of structs for the texture uniforms, and setting up the
unpacked copy needs a default header.
make[4]: Entering directory '/wip/mesa/build/src/gallium/targets/dri'
CXXLD gallium_dri.la
../../../../src/gallium/auxiliary/pipe-loader/.libs/libpipe_loader_static.a(libpipe_loader_static_la-pipe_loader.o): In function `pipe_loader_get_driinfo_xml':
/mesa/build/src/gallium/auxiliary/pipe-loader/../../../../../src/gallium/auxiliary/pipe-loader/pipe_loader.c:117: undefined reference to `pipe_loader_drm_get_driinfo_xml'
b4ff5e90 uses pipe_loader_get_driinfo_xml() unconditionally in
pipe_loader.c, but its definition in pipe_loader_get_driinfo_xml() is only
built if HAVE_LIBDRM.
Arrange to always use the default XML if HAVE_LIBDRM isn't defined.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The driver supported this since way before the GL spec for it existed.
Just need to support both the per-stream and for all streams variants
(which are identical due to only supporting 1 stream).
Passes piglit arb_transform_feedback_overflow_query-basic.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The driver was supposed to support this since way before the GL spec for it
existed, albeit it was apparently broken, so fix and enable it.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
https://bugs.llvm.org/show_bug.cgi?id=6823 still affects current LLVM.
llvm-config --libs only reports the single shared library if LLVM was
built with -DLLVM_LINK_LLVM_DYLIB=ON. llvm-config --shared-mode reports
"shared" in that case, "static" otherwise (even if LLVM was built with
-DLLVM_BUILD_LLVM_DYLIB=ON).
v2: Keep the LLVM < 4.0 test. (llvm-config --shared-mode is actually
available since LLVM 3.8, but that would make the test too
complicated :)
Fixes: 3d8da1f678 ("configure: Trust LLVM >= 4.0 llvm-config --libs
for shared libraries")
Bugzilla: https://bugs.freedesktop.org/102247
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
With GLX_SWAP_COPY_OML and GLX_SWAP_EXCHANGE_OML it may happen in situations
when glXSwapBuffers() is immediately followed by for example another
glXSwapBuffers() or glXCopyBuffers() or back buffer age querying, that we
haven't yet allocated and initialized a new back buffer because there was
no GL rendering in between.
Make sure that we have a back buffer in those situations.
v2: Eliminate the drawable have_back_format member.
v3: Make sure we re-initialize the back even if it exists.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Add support for the exchange swap method. Since we're now forcing a fake front
buffer and we exchange the back and fake front on swaps, we don't need to add
much code.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Eliminate the back-to-fake-front copy by exchanging the previous back buffer
and the fake front buffer. This is a gain except when we need to preserve
the back buffer content but in that case we still typically gain by replacing
a server-side blit by a client side non-flushing blit.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's not used anywhere and now that we're about to exchange back- and
fake fronts it doesn't serve a purpose.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Support the GLX_SWAP_COPY_OML method. When this method is requested, we use
the same swapbuffer code path as EGL_BUFFER_PRESERVED.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
EGL uses the force_copy parameter to loader_dri3_swap_buffers_msc() to indicate
that it wants to preserve back buffer contents across a buffer swap.
While the loader then turns off server-side page-flipping there's nothing to
guarantee that a new backbuffer isn't chosen when EGL starts to render again,
and that buffer's content is of course undefined.
So rework the functionality:
If the client supports local blits, allow server-side page flipping and when
a new back is grabbed, if needed, blit the old back's content to the new back.
If the client doesn't support local blits, disallow server-side page-flipping
to avoid a client deadlock and then, when grabbing a new back buffer, sleep
until the old back is idle, which may take a substantial time depending on
swap interval.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The code was relying on us always having a current context for client local
image blit operations. Otherwise the blit would be skipped. However,
glXSwapBuffers, for example, doesn't require a current context and that was a
common problem in the dri1 era. It seems the problem has resurfaced with dri3.
If we don't have a current context when we want to blit, try creating a private
dri context and maintain a context cache of a single context.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's not very usable since in the rare, but definitely existing case that
we don't have a current context, it will return NULL.
Presumably it will always be safe to use the dri screen the drawable was
created with for operations on that drawable.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This affects which inputs are marked as used. In a situation where only
the texture instruction uses an input, it might have been ignored as
unused due to input masks.
Affects subtests of KHR-GL45.texture_cube_map_array.sampling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
I see no evidence that opengl32.dll's wglSwapBuffers calls glFinish.
It looks like Jose removed that dependency years ago, but this hack
remained.
Removing this code also fixes the Piglit sync_api test since commit
eceb671002.
No piglit regressions. No glretrace regressions, per Charmaine.
Fixes VMware bug 1937990.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
LLC platforms are magic in that reads from the CPU are always cache
coherent, or rather GPU writes that bypass LLC do still invalidate the
appropriate cache line.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VCE firmware interface should now be stable; all firmwares with a
major version equal to 53 are supported.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig at amd.com>
Improves performance of 3DMark "Ice Storm Unlimited" benchmark
by 1-2% on Apollolake (on Android-IA using clang 3.8.256229).
Change is based on the performance profiling work and results
by Aravindan Muthukumar and Yogesh Marathe.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Yogesh Marathe <yogesh.marathe@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We continue in the code to do some more things with the rhs, including
setting a constant initializer. If the type is wrong, this causes some
confusion down the line, leading to assertions. This makes sure that the
rhs processing continues to flow as-if the type was correct to start
with (even though the state has been marked as an error state).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101766
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Vulkan allows you to do a submit whose only job is to wait on and
trigger semaphores. The easiest way for us to support that right
now is to insert a dummy execbuf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This patch adds an implementation based on DRM BOs. We don't actually
advertise the extension yet because we want to add a couple more paths
first.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From CL 2.0 Section 5.11 (Event Objects):
clSetEventCallback returns CL_SUCCESS if the function is executed successfully. Otherwise, it
returns one of the following errors:
...
CL_INVALID_VALUE if pfn_event_notify is NULL or if command_exec_callback_type is
not CL_SUBMITTED , CL_RUNNING or CL_COMPLETE .
Fixes: OpenCL CTS test_conformance/events/test_events callbacks
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
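The argument check quoted above can be sketched in plain C as follows. The
SKETCH_-prefixed constants mirror the values in the OpenCL headers so the
block compiles without them; the function names and the simplified callback
type are hypothetical, not the clover implementation.

```c
#include <stddef.h>

/* Local copies of the relevant CL/cl.h values. */
enum {
   SKETCH_CL_SUCCESS = 0,
   SKETCH_CL_INVALID_VALUE = -30,
   SKETCH_CL_COMPLETE = 0x0,
   SKETCH_CL_RUNNING = 0x1,
   SKETCH_CL_SUBMITTED = 0x2,
   SKETCH_CL_QUEUED = 0x3
};

/* Simplified callback type (the real one also receives the event and the
 * execution status). */
typedef void (*sketch_event_notify_fn)(void *user_data);

/* Trivial callback used only to exercise the check. */
static void
sketch_noop(void *user_data)
{
   (void)user_data;
}

/* Reject a NULL callback and any callback type other than
 * CL_SUBMITTED, CL_RUNNING or CL_COMPLETE. */
static int
sketch_validate_event_callback(int callback_type, sketch_event_notify_fn fn)
{
   if (fn == NULL)
      return SKETCH_CL_INVALID_VALUE;

   if (callback_type != SKETCH_CL_SUBMITTED &&
       callback_type != SKETCH_CL_RUNNING &&
       callback_type != SKETCH_CL_COMPLETE)
      return SKETCH_CL_INVALID_VALUE;

   return SKETCH_CL_SUCCESS;
}
```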
Changed all register and instruction names, works the same.
v2: Rebase on build system changes (by anholt)
v3: Fix build on clang (by anholt, reported by Rob)
Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Tested-by: Rob Herring <robh@kernel.org>
If you don't pass this, the compiler refuses to compile the assembly for
pre-v7 CPUs. This also keeps us from building identical, non-NEON code on
aarch64 and x86.
Fixes: a373f77662 ("vc4: Use a wrapper file to set VC4_BUILD_NEON instead of CFLAGS.")
v2: Fix Android build by just appending NEON_C_SOURCES when
ARCH_ARM_HAVE_NEON.
Tested-by: Rob Herring <robh@kernel.org>
I've been trying to get away without these conditionals in vc4's NEON
code, but it meant compiling extra unused code on x86, and build failing
on ARMv6.
v2: Use the _arm/_arm64 flags to simplify detection (suggested by Rob),
but hide the _arm version under ARCH_ARM_HAVE_NEON to keep from trying
to build this stuff for armv5te.
Tested-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need to link librt for u_thread.h's clock_gettime() call.
Fixes: b822d9dd67 ("gallium/util: move u_queue.{c,h} to src/util")
Reviewed-by: Matt Turner <mattst88@gmail.com>
BLEND_STATE packing was modified to be variable-length in:
9670124e31 genxml: Make BLEND_STATE command support variable length array.
The initial gen10.xml still had the old, fixed-length style
definition for BLEND_STATE. So gen10_upload_blend_state would
overwrite the packed BLEND_STATE_ENTRYs with its own fixed array
of all-zero entries when packing BLEND_STATE. This caused
BLEND_STATE upload to not work at all.
Fixes: aa416f515a ("i965/genxml: Add gen10.xml")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Gallium drivers use this code path so we need to account for
bindless after all.
Fixes: 365d34540f ("mesa: correctly calculate the storage offset for i915")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For memobj imports we weren't setting the surface to 0, which
meant sometimes we'd end up with tile_swizzle garbage, which
would corrupt rendering.
This seems to fix the image corruption on the imported memory
objects in vrdashboard for me.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When generating the storage offset for struct members we need
to skip opaque types as they no longer have backing storage.
Fixes: fcbb93e860 ("mesa: stop assigning unused storage for non-bindless opaque types")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101983
Reviewed-by: Dave Airlie <airlied@redhat.com>
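A minimal sketch of the offset calculation described above, assuming a flat
list of struct members where each entry records how many storage slots it
needs. The types and names are hypothetical, not Mesa's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one struct member. */
struct sketch_member {
   const char *name;
   unsigned slots;     /* storage slots the member would occupy */
   bool is_opaque;     /* samplers, images, ...: no backing storage */
};

/* Offset of members[index]: the sum of the slots of all preceding members,
 * skipping opaque ones since they have no backing storage. */
static unsigned
sketch_member_storage_offset(const struct sketch_member *members,
                             size_t index)
{
   unsigned offset = 0;

   for (size_t i = 0; i < index; i++) {
      if (!members[i].is_opaque)
         offset += members[i].slots;
   }
   return offset;
}
```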
When generating the storage offset for struct members we need
to skip opaque types as they no longer have backing storage.
Fixes: fcbb93e860 ("mesa: stop assigning unused storage for non-bindless opaque types")
V2: simplify since bindless will never be supported in this code
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101983
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Rename modifier to be more smart (Jason)
FINISHME: Use the kernel's final choice for the fb modifier
bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube none
Read bandwidth: 603.91 MiB/s
Write bandwidth: 615.28 MiB/s
bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube ytile
Read bandwidth: 571.13 MiB/s
Write bandwidth: 555.51 MiB/s
bwidawsk@norris2:~/intel-gfx/kmscube (modifiers $) ~/scripts/measure_bandwidth.sh ./kmscube ccs
Read bandwidth: 259.34 MiB/s
Write bandwidth: 337.83 MiB/s
v2: Move all references to the new fourcc code(s) to this patch.
v3: Rebase, remove Yf_CCS (Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Instead of always doing a full resolve, only resolve the bits that are
needed. This means that we only do a partial resolve when the miptree
modifier is I915_FORMAT_MOD_Y_TILED_CCS.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
v2: move is_aux into if block. (Jason)
Use else block instead of goto (Jason)
v3: Fix up logic for is_aux (Ben)
Fix up size calculations and add FIXME (Ben)
v4 (Jason Ekstrand):
Use the aux_pitch in the image instead of calculating it
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This code will disable actually creating these buffers for the scanout,
but it puts the allocation in place.
Primarily this patch is split out for review, it can be squashed in
later if preferred.
v2:
assert(mt->offset == 0) in ccs creation (as requested by Topi)
Remove bogus is_scanout check in miptree_release
v3:
Remove is_scanout assert in intel_miptree_create. It doesn't work with
latest codebase - not sure it ever should have worked.
v4:
assert(mt->last_level == 0) and assert(mt->first_level == 0) in ccs setup
(Topi)
v5 (Jason Ekstrand):
- Base the decision to allocate a CCS on the image modifier
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Previously images did not support any auxiliary compression surfaces
(CCS, MCS, or HiZ). That's about to change. This patch just adds the
fields to __DRIimageRec to make auxiliary surfaces possible.
v2 (Jason Ekstrand):
- Add an aux_pitch parameter as well as aux_offset
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
intel_miptree_texture_aux_usage() takes an isl_format, but we are
passing a mesa_format. clang warns:
brw_blorp.c:305:52: warning: implicit conversion from enumeration
type 'mesa_format' to different enumeration type
'enum isl_format' [-Wenum-conversion]
intel_miptree_texture_aux_usage(brw, src_mt, src_format);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~
Fixes: fc1639e46d ("i965/blorp: Use texture/render_aux_usage for blits")
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The picture_id was assumed to be a frame number, and thus in the
range 0-31. But the vaapi client gstreamer-vaapi uses the surface
handles as identifiers, which are unsigned ints.
This bug can happen when using a lot of vaapi surfaces within
the same process. Indeed Mesa/st/va increments a counter for the
surface ID: mesa/util/u_handle_table.c::handle_table_add, which
starts from 0 and is incremented by 1 at each call.
So creating more than 32 surfaces was a problem.
The following bug contains a test that reproduces the problem
by running a couple of vaapih264enc in the same process. The
above also explains why there was no problem when running them in
separate processes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102006
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Tested-by: Tomas Rataj <rataj28@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-and-tested-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
No need to manually look for the library files anymore with current
LLVM. This sidesteps the manual method failing when LLVM was built with
-DLLVM_APPEND_VC_REV=ON.
(This might already work with older versions of LLVM)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Since we don't iterate to a fixed point, we can end up in situations
where we have a SAT instruction + a long immediate. This is not legal.
However since it's immediately computable, just run unary straight away
to handle the situation.
Fixes: 24a799ad35 ("nv50/ir: fix ConstantFolding with saturation")
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
While technically correct, this can lead to e.g. getImmediate assuming
that it can walk up the value chain. It could be fixed to not do this,
but it seems easier and less error-prone to just not link the two values
to save on one LValue object.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
otherwise there is corruption in most apps.
Fixes: 0fe0320 radeonsi: use optimal packet order when doing a pipeline sync
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes corrupted shadows in Unigine Valley.
The corruption disappeared when I stopped setting IMG_DATA_FORMAT_24_8
for depth.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When using dmabuf import, make sure that the modifier is actually
allowed to add planes to the base format, as implied by the comment.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Mesa will map user defined vertex input attributes to slots
starting at VERT_ATTRIB_GENERIC0 which gives us room for only 16
slots (up to GL_VERT_ATTRIB_MAX). This is sufficient for GL, where
we expose exactly 16 vertex attributes for user defined inputs, but
in Vulkan we can expose up to 28 (which are also mapped from
VERT_ATTRIB_GENERIC0 onwards) so we need to account for this when
we scope the size of the array of attribute workaround flags
that is used during the brw_vertex_workarounds NIR pass. This
prevents out-of-bounds accesses in that array for NIR shaders
that use more than 16 vertex input attributes.
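
The sizing rule can be sketched as follows. This is only an illustration, not the i965 code: the constant names, the flag type and the helper are hypothetical, and only the counts 16 and 28 come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: only the counts (16 and 28) come from the text. */
#define SKETCH_GL_MAX_GENERIC_ATTRIBS 16
#define SKETCH_VK_MAX_GENERIC_ATTRIBS 28

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Size the workaround-flag array for the larger of the two APIs. */
#define SKETCH_WA_FLAGS_LEN \
   SKETCH_MAX(SKETCH_GL_MAX_GENERIC_ATTRIBS, SKETCH_VK_MAX_GENERIC_ATTRIBS)

struct sketch_wa_flags {
   uint8_t flags[SKETCH_WA_FLAGS_LEN];
};

/* Return the flags for a generic attribute slot, or 0 if out of range. */
static uint8_t
sketch_get_wa_flags(const struct sketch_wa_flags *wa, unsigned slot)
{
   assert(wa != 0);
   if (slot >= SKETCH_WA_FLAGS_LEN)
      return 0;
   return wa->flags[slot];
}
```

With an array of only 16 entries, a lookup for slot 27 would read past the end; sizing by the larger limit keeps every Vulkan slot in bounds.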
Fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.*
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The cloning was introduced in f81ede4699 to fix a problem with
shaders including IR that was owned by builtins.
However the approach of cloning the whole function each time we
reference a builtin led to a significant reduction in the GLSL
IR compiler's performance.
The previous patch fixes the ownership problem in a more precise
way. So we can now remove this cloning.
Testing on a Ryzen 7 1800X shows a ~15% decrease in compile time for the
Deus Ex: Mankind Divided shaders on radeonsi (which take 5min+ on
some machines). Looking just at the GLSL IR compiler the speed up
is ~40%.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The main motivation for this is that threaded compilation can fall
over if we were to allocate IR inside constant_expression_value()
when calling it on a builtin. This is because builtins are shared
across the whole OpenGL context.
f81ede4699 worked around the problem by cloning the entire
builtin before constant_expression_value() could be called on
it. However cloning the whole function each time we referenced
it led to a significant reduction in the GLSL IR compiler
performance. This change along with the following patch
helps fix that performance regression.
Other advantages are that we reduce the number of calls to
ralloc_parent(), and for loop unrolling we free constants after
they are used rather than leaving them hanging around.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Deus Ex: Mankind Divided shaders go from spending ~20 seconds
in the GLSL IR compiler's front-end down to ~18.5 seconds on a
Ryzen 1800X.
Tested by compiling once with shader-db then deleting the index file
from the shader cache and compiling again.
v2:
- fix rebasing issue in v1
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This function differs from ralloc_strcat() and ralloc_strncat()
in that it does not do any strlen() calls which can become
costly on large strings.
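
A minimal sketch of the idea, assuming the caller already tracks both lengths. It uses plain realloc() rather than ralloc, and the helper name and signature are hypothetical, not the function this commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append str_len bytes of str to *dest, whose current
 * length (excluding the terminator) is dest_len. No strlen() is called. */
static bool
sketch_str_append(char **dest, const char *str,
                  size_t dest_len, size_t str_len)
{
   assert(dest != NULL && str != NULL);

   char *both = realloc(*dest, dest_len + str_len + 1);
   if (both == NULL)
      return false;   /* *dest is left untouched on failure */

   memcpy(both + dest_len, str, str_len);
   both[dest_len + str_len] = '\0';
   *dest = both;
   return true;
}
```

The caller keeps a running length and passes it back in on each call, so appending many fragments to a large buffer never rescans the existing contents.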
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
We are currently copying the name for each member dereference
but we can just share a single instance of the string provided
by the type.
This change also stops us recalculating the field index
repeatedly.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Also add a comment that this should only be used by the ir_reader
interface for testing purposes.
v2:
- fix grammar in comment
- use unreachable rather than assert
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Extra validation is added to ir_validate to make sure this is
always updated to the correct number of operands, as passes like
lower_instructions modify the instructions directly rather than
generating a new one.
The reduction in time is so small that it is not really
measurable. However callgrind was reporting this function as
being called just under 34 million times while compiling the
Deus Ex shaders (just pre-linking was profiled) with 0.20%
spent in this function.
v2:
- make num_operands a uint8_t
- fix unsigned/signed mismatches
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Also, silence an obnoxious finishme that started occurring for all
GL applications which use stencil after the i965 ISL conversion.
v2: Check against 3DSTATE_STENCIL_BUFFER's pitch bits when using
separate stencil, and 3DSTATE_DEPTH_BUFFER's bits when using
combined depth-stencil.
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we have an invalid display fed into the functions, the display lookup
will return NULL. Thus as we attempt to get the platform type, we'll
dereference it, leading to a crash.
Keep in mind that this will not happen if Mesa is built without X11 or
when the legacy eglCreate*Surface codepaths are used.
A similar check was added with earlier commit 5e97b8f5ce ("egl: Fix
crashes in eglCreate*Surface"), although it was only applicable when the
surfaceless platform is built.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The function can be called only when the type is EGL_WINDOW_BIT.
Remove the unneeded switch statement.
v2: Rename the local variable window to surface (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
The next patch is going to stop passing XCB_WINDOW_NONE (of type
xcb_window_enum_t) as an argument where these functions expect a void *,
which clang does not appreciate.
This patch cleans things up to better convince me and reviewers that
it's safe to do that.
v2: Emil Velikov: rebase/integrate with series
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The basic (null) check is identical across all backends.
Just move it to the top.
v2:
- Split the WINDOW vs PIXMAP into separate patches
- Move check after the dpy and config - dEQP expects so
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The code in _eglCreateWindowSurfaceCommon() already has a NULL check
which handles the condition. There's no point in checking again further
down the stack.
v2: Split the WINDOW vs PIXMAP into separate patches
v3: Resolve typos, s/EGL_PIXMAP_BIT_BIT/EGL_PIXMAP_BIT/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The current two implementations - X11 and Wayland were identical,
barring the upper limit.
Instead of having the same code twice - introduce a helper and pass the
limit as an argument.
Thus as Android/DRM/others get support - they only need to call the
function ;-)
v2: Rebase on top of keeping ::swap_available
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
spirv_info.c existed as a static file until commit 2dd4e2ece3 began
generating it as part of the build process. autotools is incapable of
coping, and so a build-tree from before this commit would then fail with
it:
[4]: *** No rule to make target '../../../mesa/src/compiler/spirv/spirv_info.c', needed by 'spirv/spirv_info.lo'. Stop.
Add a few lines to configure.ac to update the broken build files.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
UE4Editor has this issue.
This commit prevents hangs (release build) or assertion failures (debug
build). It doesn't fix the editor, but catastrophic scenarios are
prevented.
Cc: 17.1 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We shouldn't be using GLX tokens in the dri subsystem, so define dri
SWAP_METHOD tokens and translate when necessary. Unfortunately the X server
uses the dri swap method value untranslated as the GLX fbconfig swapMethod,
so we can't enumerate these tokens arbitrarily, but rather need to make them
have the same values as the corresponding GLX tokens.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Due to bugs in dri swap method reporting, neither the fbconfigs received from
the server nor the value reported from driconfigs were correct. Now that's been
fixed and we can enable config swapmethod matching again.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Due to the recently fixed bug where dri drivers didn't report a correct
__DRI_ATTRIB_SWAP_METHOD value, and the fact that X servers just forward this
incorrect value (from the AIGLX dri driver) untranslated as
GLX_SWAP_METHOD_OML, the latter value might be undefined when old dri AIGLX
drivers are used, which breaks client fbconfig matching with server fbconfigs.
So work around this by assuming GLX_SWAP_METHOD_UNDEFINED when a bogus value
is read.
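
The workaround amounts to a small validation step, sketched below. The token values are those defined by the GLX_OML_swap_method extension; the macro and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Token values from the GLX_OML_swap_method extension (names are local). */
#define SKETCH_GLX_SWAP_EXCHANGE_OML  0x8061
#define SKETCH_GLX_SWAP_COPY_OML      0x8062
#define SKETCH_GLX_SWAP_UNDEFINED_OML 0x8063

/* Map anything that is not a known swap method to "undefined". */
static int
sketch_sanitize_swap_method(int value)
{
   switch (value) {
   case SKETCH_GLX_SWAP_EXCHANGE_OML:
   case SKETCH_GLX_SWAP_COPY_OML:
   case SKETCH_GLX_SWAP_UNDEFINED_OML:
      return value;
   default:
      /* Bogus value forwarded by an old server-side driver. */
      return SKETCH_GLX_SWAP_UNDEFINED_OML;
   }
}
```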
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The attribMap had two entries for this attribute, and
driGetConfigAttribIndex didn't return a proper value for this attribute.
Fix this, and also make sure we return SWAP_UNDEFINED for single-buffer
configs as required by the GLX_OML_swap_method spec.
Finally bump the dri core extension version to 2, indicating that we
correctly report __DRI_ATTRIB_SWAP_METHOD.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This seems like a workaround, but we don't see the bug on CIK/VI.
On SI with the dEQP-VK.memory.pipeline_barrier.host_read_transfer_dst.*
tests, when one test completes, the first flush at the start of the next
test causes a VM fault as we've destroyed the VM, but we end up flushing
the compute shader then, and it must still be in the process of doing
something.
Could also be a kernel difference between SI and CIK.
v2: hit this with a bigger hammer. This fixes a bunch of hangs
in the vk cts with the robustness tests.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101334
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For mul(a, +-1) codegen can generate OP_MOV with a saturation flag
set which is ignored at emission. The same can happen with add(a, 0),
and others.
Adding an assert for detecting more of such issues.
Fixes wrongly rendered water in Hitman Absolution running under wine.
Also a few shaders in Mad Max and Alien Isolation produce such MOVs.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: generalize the fix for other cases]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit e794f8bf8b ("gallium: move loading of drirc to pipe-loader")
moved the option cache to the pipe_loader_device. However, the
screen->dev pointer is not set when dri_init_options() is called. Move
the call to after the pipe_loader_sw_probe_kms() call so screen->dev is
set. This mirrors the code flow for dri2_init_screen().
Fixes: e794f8bf8b ("gallium: move loading of drirc to pipe-loader")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Previous behavior was inconsistent with other texture targets so this has been
fixed in OpenGL 4.6.
Fixes:
KHR-GL45.direct_state_access.textures_storage_errors
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The OpenGL 4.6 specs have been updated so that GetTextureParameter*
with a texture object with an incompatible TEXTURE_TARGET should now
report INVALID_OPERATION instead of INVALID_ENUM.
Fixes:
KHR-GL45.direct_state_access.textures_parameter_errors
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Currently swrastGetDrawableInfo always initializes w and h. This patch
refactors the function into x11_get_drawable_info, which returns success
and sets the values only if no error happened. Add a swrastGetDrawableInfo
wrapper function as expected by the DRI extension.
v2: init x,y,w,h in swrastGetDrawableInfo (Eric)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reported-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
After a successful wait, we know the buffer ought to be idle.
Chris points out that: "The only caveat here is that bo is global, and
we have a very unlikely (and probably unnoticeable) race condition with
multiple contexts."
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
RELOC_NEEDS_GGTT is only meaningful on Sandybridge - it's skipped on
other generations - so this has no purpose. Just use rw_bo().
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
With the reloc domains gone, most of these are basically the same,
and the names don't make much sense anymore. Simplify them to ro_bo(),
rw_bo(), and ggtt_bo().
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The GPU reads the shader kernel from the program cache BO. It never
writes it, so using a read-write BO reference makes no sense.
Just make KSP read-only, and drop KSP_ro.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The .f32 was already getting added by emit_intrin_2f_param(). Noticed
when enabling LLVM module verification.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Before, we ended up always calling miptree_create_for_planar_image in
almost all cases because most images have image->planar_format != NULL.
This commit makes us only take that path if we have a multi-planar
format.
Reviewed-by: Daniel Stone <daniels@collabora.com>
This will allow us to call this function from
_mesa_alloc_shared_state() in the case that we run out of memory
part way through allocating the state.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We already expose glMultiDrawElementsBaseVertexEXT as part of the
EXT_draw_elements_base_vertex chunk, so this one can just be removed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
There was a previous error in the gl.xml and generated files that
referenced glMultiDrawElementsBaseVertexOES. This function should not
exist, only the EXT-suffixed version should.
Leaving the other headers alone to avoid conflicts with GL 4.6 work.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Also adds an assert because you never know how the winsys changes, and
multiprocess format differences are annoying.
Fixes: 1e696b962b "radv: add separate fmask tile swizzle counter."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Process most new SET packets in parallel with previous draw calls, then
flush caches and wait, start the draw, and do L2 prefetches last.
This decreases the [CP busy / SPI busy] ratio (verified with GRBM perf
counters). In other words, the time window when shaders are idle (between
the wait and the draw) is much shorter now.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that we don't rely on si_pm4_state_enabled_and_changed, allowing us
to move prefetches after draw calls.
v2: clear the dirty mask after unbinding shaders
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Other ones are either unsupported or don't have any helper
function checks.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Otherwise, this extension is not visible to the EGL users who
use the swrast driver.
This will allow the swrast driver to use eglCreateImageKHR,
provided the target is EGL_GL_TEXTURE_2D_KHR or
EGL_GL_RENDERBUFFER_KHR. Note we still have to implement the
create from render buffer path.
v2: add it to optional_core_extensions instead of swrast_core_extensions,
so it's not a requirement (Emil)
v3: Merge egl/dri2 changes together, also add support for
platform_wayland (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Since the relevant functions have been moved to dri_helpers,
drisw.c can make use of the extension. Note we have version 6
of the extension, since we want to support createImageFromTexture.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These functions will be used both by drisw.c and
dri2.c. This patch also moves some headers that can
be shared.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Although it doesn't seem like a strict requirement of the
code base, we do it when possible and it looks nice.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These files provide helper structs and functions for dri2.c and drisw.c,
and name change better conveys that.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The make_shareable function deletes the aux buffer and then whacks
aux_usage to ISL_AUX_USAGE_NONE but does not unset supports_fast_clear.
Since we only look at supports_fast_clear to decide whether or not to do
fast clears, this was causing assertion failures.
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101925
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The only one of the three remaining flags that has anything whatsoever
to do with layout is TILING_NONE. This commit renames them to
MIPTREE_CREATE_*, documents the meaning of each flag, and makes the
create functions take an actual enum type so GDB will print them nicely.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The only force tiling flag we really care about is LAYOUT_TILING_NONE.
The others don't actually do anything but add confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Originally, I had moved it to the caller to make some things easier when
adding the CCS modifier. However, this broke DRI2 because
intel_process_dri2_buffer calls intel_miptree_create_for_bo but never
calls intel_miptree_alloc_aux. Also, in hindsight, it should be pretty
easy to make the CCS modifier stuff work even if create_for_bo allocates
the CCS when DISABLE_AUX is not set.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
The flag hasn't affected actual surface layout for some time. The only
purpose it served was to set bo->cache_coherent = false on the BO used
to create the miptree. This is fairly silly because we can just set
that directly from the caller where it makes much more sense.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We rename it to intel_miptree_supports_mcs and make the function
signature match intel_miptree_supports_ccs/hiz. We also move the sample
count check into the function so it returns false for single-sampled
surfaces.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We were calculating the total height of 2D surfaces by multiplying the
row pitch by the number of slices. This means that we actually request
slightly more space than actually needed since the padding on the last
slice is unnecessary. For tiled surfaces this is not likely to make a
difference. For linear surfaces, on the other hand, this means we may
require additional memory. In particular, this makes the i965 driver
reject EGL imports of buffers which do not have this extra padding.
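
The difference can be illustrated with a small calculation. This is a sketch under assumptions: the function and parameter names are hypothetical, and the "tight" formula is one reading of the statement that the padding after the last slice is unnecessary, not a copy of the ISL code.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: every slice, including the last, is charged a full
 * slice pitch (rows from the start of one slice to the start of the next). */
static uint32_t
sketch_total_rows_padded(uint32_t slice_pitch_rows, uint32_t num_slices)
{
   return slice_pitch_rows * num_slices;
}

/* Assumed fix: the last slice only contributes its own rows. */
static uint32_t
sketch_total_rows_tight(uint32_t slice_pitch_rows, uint32_t slice_rows,
                        uint32_t num_slices)
{
   assert(num_slices > 0 && slice_rows <= slice_pitch_rows);
   return slice_pitch_rows * (num_slices - 1) + slice_rows;
}
```

For example, with 4 slices of 64 rows each and a slice pitch of 72 rows, the padded total is 288 rows while the tight total is 280. A linear buffer allocated for exactly 280 rows would be rejected by a size check based on the padded figure.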
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
The docs contain a bunch of commentary about the need to pad various
surfaces out to multiples of something or other. However, all of those
requirements are about avoiding GTT errors due to missing pages when the
data port or sampler accesses slightly out-of-bounds. However, because
the kernel already fills all the empty space in our GTT with the scratch
page, we never have to worry about faulting due to OOB reads. There are
two caveats to this:
1) There is some potential for issues with caches here if extra data
ends up in a cache we don't expect due to OOB reads. However,
because we always trash the entire cache whenever we need to move
anything between cache domains, this shouldn't be an issue.
2) There is a potential issue if a surface gets placed at the very top
of the GTT by the kernel. In this case, the hardware could
potentially end up trying to read past the top of the GTT. If it
nicely wraps around at the 48-bit (or 32-bit) boundary, then this
shouldn't be an issue thanks to the scratch page. If it doesn't,
then we need to come up with something to handle it.
Up until some of the GL move to ISL, having the padding code in there
just caused us to harmlessly use a bit more memory in Vulkan. However,
now that we're using ISL sizes to validate external dma-buf images,
these padding requirements are causing us to reject otherwise valid
images due to the size of the BO being too small.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
This ports the workaround from radeonsi, that was missing in radv.
This fixes Talos rendering when MSAA is enabled on my Tahiti card.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Signed-off-by: Dave Airlie <airlied@redhat.com>
The configuration option --with-sha1 is no longer required for the
MESA_SHADER_READ_PATH, MESA_SHADER_DUMP_PATH environment variables
to take effect.
1- removed the "--with-sha1" sentence from docs/shading.html
2- added an extra note: that the corresponding dumped and replacement
shaders must have the same filenames for the feature to take effect.
Acked-by: Tapani Pälli <tapani.palli@intel.com>
This mirrors what Marek has done for radeonsi, and uses
a separate counter to handle the fmask surface for MSAA
MRTs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just copies the code from the -pro shaders,
and fixes the tests on CIK.
With this CIK passes the same set of conformance
tests as VI.
Fixes: 83e58b03 (radv: flush f32->f16 conversion denormals to zero. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch adds support for large shaders on GC3000. For example the "terrain"
glmark benchmark with a large fragment shader will work after this.
If the GPU supports ICACHE, shaders larger than the available state area will
be uploaded to a bo of their own and instructed to be loaded from memory on
demand. Small shaders will be uploaded in the usual way. This mimics the
behavior of the blob.
On GPUs that don't support ICACHE, this patch should make no difference.
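
The upload decision described above can be sketched as a single predicate. All names here are hypothetical, and the sketch does not model what happens when a shader is too large on hardware without ICACHE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_upload_path {
   SKETCH_UPLOAD_STATE,   /* usual path: write the shader into the state area */
   SKETCH_UPLOAD_BO       /* ICACHE path: shader lives in its own bo */
};

/* Pick the upload path for a shader of shader_size bytes. */
static enum sketch_upload_path
sketch_choose_upload(bool has_icache, size_t shader_size,
                     size_t state_area_size)
{
   if (has_icache && shader_size > state_area_size)
      return SKETCH_UPLOAD_BO;

   /* Small shaders, and all shaders on GPUs without ICACHE. */
   return SKETCH_UPLOAD_STATE;
}
```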
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
GC3000 has changed from a separate store for VS and PS uniforms
to a single, unified one. There is backwards compatibility functionality,
however this does not work correctly together with ICACHE.
This patch adds explicit support, although in the simplest way possible:
the PS/VS uniforms split is still fixed and hardcoded. It should
make no difference on hardware that does not have unified uniform
memory.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The argument here is a bitmask, so the old code selected .xy, which
got silently truncated to .x when constructing the vec4 from components,
instead of using .w.
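
The mix-up between a component index and a component mask can be shown with a small stand-alone sketch. The names are hypothetical and this is not the NIR builder API; it only demonstrates why passing the index 3 as a mask selects x and y rather than w.

```c
#include <assert.h>

/* Hypothetical mask bits: one bit per vector component. */
enum {
   SKETCH_CHAN_X = 1u << 0,
   SKETCH_CHAN_Y = 1u << 1,
   SKETCH_CHAN_Z = 1u << 2,
   SKETCH_CHAN_W = 1u << 3
};

/* Copy the components of src selected by mask into dst, in order.
 * Returns the number of components selected. */
static unsigned
sketch_select_channels(const float src[4], unsigned mask, float dst[4])
{
   unsigned n = 0;
   for (unsigned i = 0; i < 4; i++) {
      if (mask & (1u << i))
         dst[n++] = src[i];
   }
   return n;
}
```

Passing 3 (binary 0011) yields the two components x and y, whereas the intended selection of w needs the mask 1 << 3.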
Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
It just works with the fragment shader resolve, so no need to do
a custom conversion. In fact with SRGB dest, it actually gives
wrong results.
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
These seem to store very bogus results. Luckily there is some code
that converts srgb->linear already, so just making the descriptor
format UNORM should work.
Fixes: 588185eb6b "radv/meta: add srgb conversion to end of resolve shader."
Reviewed-by: Dave Airlie <airlied@redhat.com>
These need to match for interop compatibility queries.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is required for interop use cases. The same device must report
identical UUIDs through the GL and Vulkan APIs so that users can
identify when it is safe to perform a memory object import.
v2: use ac helpers to calculate the uuid
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We need vulkan and gl to produce the same UUIDs. Therefore we should
keep the mechanism to compute these in a common location to guarantee
they are updated in lockstep.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These are used by EXT_external_objects to present UUIDs for the device
and the driver.
v2 (Timothy Arceri):
- remove extra break
- use _mesa_problem() rather than _mesa_error() for unimplemented
support for value types
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: use PIPE_CAP_MEMOBJ to guard the extension
v3 (Timothy Arceri):
- expose extensions via the cap_mappings array
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Include no_error variants as well.
v2 (Timothy Arceri):
- reduced code churn by squashing some changes into
previous commits
v3 (Timothy Arceri):
- drop unused function declaration
v4 (Timothy Arceri):
- fix Driver function assert()
- add missing GL errors
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Instead of allocating memory to back a texture, use the provided memory
object.
v2: split off extension exposure logic
v3: de-duplicate code with st_AllocTextureStorage
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: pass dedicated flag
v3 (Timothy Arceri):
- remove unrequired _mesa_init_memory_object_functions()
call in the state tracker.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
V2 (Timothy Arceri):
- fix copy and paste error with error message
V3 (Timothy Arceri):
- drop the Protected field for now as it's unused
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Used by EXT_external_objects and EXT_external_objects_fd
V2 (Timothy Arceri):
- Throw GL_OUT_OF_MEMORY error if CreateMemoryObjectsEXT()
fails.
- C99 tidy ups
- remove void cast (Constantine Kharlamov)
V3 (Timothy Arceri):
- rename mo -> memObj
- check that the object is not NULL before initializing
- add missing "EXT" in function error message
V4 (Timothy Arceri):
- remove checks for (memory object id == 0) and catch in
_mesa_lookup_memory_object() instead.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The device version is the maximum CL version that the device supports.
device_version and device_clc_version are not necessarily the same for
devices that support CL 1.0, but have a 1.1 compiler and the necessary
extensions.
Eventually, this will be based on the features/extensions of the actual
device, but for now move it a bit closer to its eventual destination.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesey <jan.vesely@rutgers.edu>
This is a bug in the app, but I'd rather avoid hanging the GPU,
esp if someone is running in validation and it takes out their
development environment.
v2: get it right, reverse the polarity.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Having two callbacks to manage a single int seems like overkill.
Use a cached copy and update that when needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
Might want to look if the dimensions dance in .query_surface ...
speaking of which close to nobody implements that ...
Currently xmlconfig is conditionally used, only when --enable-dri is
available.
As the library has moved to src/util and has a wider userbase, this guard
is no longer correct. Strictly speaking - it wasn't since the
introduction of xmlconfig into st/nine a while ago.
Unconditionally enable xmlconfig and drop the linking. As said before
there are other users of the library, so depending on the configure
options we will get multiple definitions of said symbols.
NOTE: To avoid breaking other combinations, this commit adds the
xmlconfig link to the required places - throughout gallium and the DRI
loaders.
Cc: Aaron Watry <awatry@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The kernel only cares about whether the object is to be written to or
not; it reduces (reloc.read_domains, reloc.write_domain) down to just
!!reloc.write_domain. When we use NO_RELOC, the kernel doesn't even read
those relocs and instead userspace has to pass that information in the
execobject.flags. We can simplify our reloc api by also removing the
unused read/write domains and only pass the resultant flags.
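
The reduction described above can be sketched as follows. The struct, the flag bit and the function name are hypothetical stand-ins; they are not the i915 uAPI definitions or the i965 helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the execobject write flag. */
#define SKETCH_OBJECT_WRITE (1u << 0)

/* Old-style relocation description with separate domains. */
struct sketch_old_reloc {
   uint32_t read_domains;
   uint32_t write_domain;
};

/* Collapse the domain pair to the single bit the kernel cares about. */
static unsigned
sketch_reloc_flags(const struct sketch_old_reloc *reloc)
{
   /* read_domains is deliberately ignored: only "written or not" matters. */
   return reloc->write_domain ? SKETCH_OBJECT_WRITE : 0;
}
```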
The caveat to the above is when we need to make the kernel aware that
certain objects need to take into account different workarounds.
Previously, this was done using the magic (INSTRUCTION, INSTRUCTION)
reloc domains. NO_RELOC requires this to be passed in the execobject
flags as well, and now we push that up the callstack.
The API is more compact, more expressive of what happens underneath, but
unfortunately requires more knowledge of the system at the point of use.
Conversely it also means that knowledge is specific and not generally
applied and so not overused.
   text    data     bss     dec     hex filename
8502991  356912  424944 9284847  8dacef lib/i965_dri.so (before)
8500455  356912  424944 9282311  8da307 lib/i965_dri.so (after)
v2: (by Ken) Rebase.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Based on a patch by Chris Wilson (who also wrote this commit message).
Passing the index of the target buffer via the reloc.target_handle is
marginally more efficient for the kernel (it can avoid some allocations,
and can use a direct lookup rather than a hash or search). It is also
useful for ourselves as we can use the index into our exec_bos for other
tasks.
v2: Only enable HANDLE_LUT if we can use BATCH_FIRST and thereby avoid
a post-processing loop to fixup the relocations.
v3: Move kernel probing from context creation to screen init.
Use batch->use_exec_lut as it is more descriptive of what's going on (Daniel)
v4: Kernel features already exist, use it for BATCH_FIRST
Rename locals to preserve current flavouring
v5: Squash in "always insert batch bo first"
v6: (by Ken) Split out BATCH_FIRST from HANDLE_LUT.
Extracted from a patch by Chris Wilson.
Now that the batch is always at the front of the validation list,
we don't need to special case it - the usual "go find an existing BO"
code will work just fine.
To avoid a forward declaration in the next patch, move the definition of
add_exec_bo() earlier.
v2: (by Ken) redo move.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since before the kernel supported I915_EXEC_NO_RELOC, long before our
minimum kernel requirement, the kernel unconditionally invalidated all
GPU TLBs before a batch and flushed all GPU caches after a batch. At
that moment, the only use for read/write domain was for activity
tracking, ensuring that future reads waited for the last writer and
future writes waited for all reads. This only requires a single bit in
the execbuf interface which can be supplied via the NO_RELOC interface,
making the use of relocation domains entirely redundant.
Trimming the excess writes into the array allows the compiler to be much
more frugal:
   text    data     bss     dec     hex filename
8493790  357184  424944 9275918  8d8a0e i965_dri.baseline
8493758  357184  424944 9275886  8d89ee i965_dri.so
(This text improvement really does come from dropping domains, not from
the new use of C99 initializers.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we correctly fill the batch with the right relocation value, and that
matches the expected location of the object, we can then tell the kernel
it can forgo checking each individual relocation by only checking
whether the object moved.
v2: Rebase to apply ahead of I915_EXEC_HANDLE_LUT
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Borrow a trick from anv, and use the last known index for the bo to skip
a search of the batch->exec_bo when adding a new relocation. In defence
against the bo being used in multiple batches simultaneously, we check
that this slot exists and points back to us.
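A minimal sketch of the cached-index lookup described above, assuming a simplified model in which a batch holds a plain list of buffer objects. The class and method names are hypothetical stand-ins, not the driver's structures.

```python
class Bo:
    def __init__(self, name):
        self.name = name
        self.index = -1  # last known slot; -1 means "never added to a batch"


class Batch:
    def __init__(self):
        self.exec_bos = []

    def add_exec_bo(self, bo):
        i = bo.index
        # Trust the cached index only if the slot exists in *this* batch and
        # points back at the same bo; another batch may have overwritten it.
        if 0 <= i < len(self.exec_bos) and self.exec_bos[i] is bo:
            return i
        # Cached index was stale: fall back to the linear search.
        for i, other in enumerate(self.exec_bos):
            if other is bo:
                bo.index = i
                return i
        # Not in the list yet: append and remember where it landed.
        self.exec_bos.append(bo)
        bo.index = len(self.exec_bos) - 1
        return bo.index
```

The back-pointer check is what makes the shortcut safe when one bo is shared between batches: a stale index can only cost a search, never return the wrong slot.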
v2: Also update brw_batch_references()
v3: Reset bo->index on creation (Daniel)
v4: Improved explanation of bo->index (Kenneth)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We must be careful to only compute the address once based on the
per-context information (rather than accessing the unlocked global
bo->offset64) so that the value in the batch does match the
reloc.presumed_offset we declare to the kernel. Otherwise, although it is
highly unlikely, we may see GPU hangs with multithreaded users.
The only real complication here is isl_surf_fill_state() which needs to
adjust the reloc.delta to both generate a tile offset and to encode state
into the lower 12 bits.
(Rebased on ISL changes by Ken.)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use the .pc file, as provided from version 2.1.0 onward, and drop
the manual header/library check.
Version 2.1.0 was released back in Mar 2012 and all major distributions
use it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (IRC)
Earlier commits moved the xmlconfig library to a wider userbase.
Thus having the check within --enable-dri is insufficient.
Upon closer look, nine needed it from its early days - 948e6c5228
("nine: Add drirc options (v2)")
Fixes: 601093f95d ("xmlconfig: move into src/util")
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (IRC)
Android build changes to avoid the following build error:
target C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_pipe.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.c:38:
external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
^
1 error generated.
Fixes: da62a31c5b "radeonsi: add nir include paths"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For buffer objects, where we primarily expect to be writing to them and
so already have a WC mmap (for !llc access) reusing the existing mmap
and keeping the buffer out of the CPU cache seems preferable.
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Missed updating this caller of pipe_loader_find_module.
Fixes: 0d7d60b7ea ("pipe-loader: pass only the driver_name to pipe_loader_find_module")
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The config passed into the screen should be independent from the state
tracker, because at least in the case of radeonsi, the screen structure
can be shared between different state trackers.
Incidentally, this also fixes crashes that were recently introduced.
Fixes: a35a9e7c ("gallium: add driconf options to pipe_screen_config")
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
They were set only by the DRI state tracker, which is problematic
when radeonsi is used with different state trackers in the same
process.
Also, we don't need them anymore.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also, access the options directly, allowing us to get rid of the
PIPE_SCREEN_xxx flags.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Commit 0ab04ba979 (anv: Use python to generate ICD json files) changed
the way ICD json files are created.
Remove the old .in files from extra dist, and add the python script.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This fixes the image descriptors for mipmapped tile swizzle.
Fixes: 2b7e8556 (ac/surface: enable tile swizzle for mipmapped textures)
Signed-off-by: Dave Airlie <airlied@redhat.com>
When Marek enabled mipmapped swizzle, radv didn't
have the code in place to handle it. This fixes the
regression.
I'll look more into GFX9 once I have a vega card (soon).
Fixes: 2b7e8556 (ac/surface: enable tile swizzle for mipmapped textures)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Note that dcc_alignment = pipe_interleave_bytes * num_pipes * num_banks,
which is greater than the previous open-coded alignment.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The tile swizzle computation was done after the whole miptree was computed,
but that was too late, because at that point AddrSurfInfoOut contained
information about the smallest miplevel, which is never 2D-tiled.
The correct way is to do the computation before the second level is computed.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
In Mesa we use the convention that if gl_renderbuffer::NumSamples
or gl_texture_image::NumSamples is zero, it's a non-MSAA surface.
Otherwise, it's an MSAA surface. But in gallium nr_samples=1 is a
non-MSAA surface.
Before, if the user called glRenderbufferStorageMultisample() or
glTexImage2DMultisample() with samples=1 we skipped the search for the
next higher number of supported samples and asked the gallium driver to
create a surface with nr_samples=1. So we got a non-MSAA surface.
This failed to meet the expectation of the user making those calls.
This patch changes the sample count checks in st_AllocTextureStorage()
and st_renderbuffer_alloc_storage() to test for samples > 0 instead of > 1.
And we now start querying for MSAA support at samples=2 since gallium has
no concept of a 1x MSAA surface.
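The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: choose_sample_count, the is_supported callback, and the max_samples bound are hypothetical, and the real state tracker queries the driver rather than a Python callable.

```python
def choose_sample_count(requested, is_supported, max_samples=16):
    """Map a GL sample request to a surface sample count (sketch)."""
    if requested == 0:
        return 0  # Mesa convention: NumSamples == 0 means non-MSAA
    # Any request > 0 asks for MSAA. Start the search at 2 because there is
    # no such thing as a 1x MSAA surface on the gallium side.
    for n in range(max(requested, 2), max_samples + 1):
        if is_supported(n):
            return n
    return None  # no supported sample count at or above the request
```

With a driver that supports only 4x and 8x, a request for samples=1 now resolves to 4 instead of silently producing a non-MSAA surface.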
A specific example of this problem is the Piglit arb_framebuffer_srgb-blit
test. It calls glRenderbufferStorageMultisample() with samples=1 to
request an MSAA renderbuffer with the minimum supported number of MSAA
samples. Instead of creating a 4x or 8x, etc. MSAA surface, we wound up
creating a non-MSAA surface.
Finally, add a comment on the gl_renderbuffer::NumSamples field.
There is one piglit regression with the VMware driver:
ext_framebuffer_multisample-blit-mismatched-formats fails because
now we're actually creating 4x MSAA surfaces (the requested sample
count is 1) and we're hitting some sort of bug in the blitter code. That
will have to be fixed separately. Other drivers may find regressions
too now that MSAA surfaces are really being created.
v2: start querying for MSAA support with samples=2 instead of 1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Both the GLSL 4.00 specs and DX10.1 specs specify that if a fragment
shader uses the sample ID or sample position inputs, the shader is
automatically run at per sample frequency. Document that expectation
for gallium fragment shaders.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The default values for GL_SAMPLE_SHADING and GL_MIN_SAMPLE_SHADING_VALUE
are missing from the state tables in the GL spec, but they're supposed
to be GL_FALSE and 0.0, per the GL_ARB_sample_shading spec.
Add code for that, just to be explicit.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Note that the Cray flags (-target-cpu=) need to come first since the
Cray programming environment uses wrappers around other compilers. By
checking the wrapper flags first, you can be sure to match the wrapper
flag instead of the underlying compiler (gcc, intel, pgi, etc.) flags.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
v2: add libxmlconfig.la to the dynamic pipe_radeonsi driver
v3: add libxmlconfig.la to targets/opencl build
v4: add EXPAT_LIBS to opencl build
(note: for only-opencl builds, Emil's configure.ac changes
are also needed)
Fixes: bc7f41e11d ("gallium: add pipe_screen_config to screen_create functions")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102014
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Fixes build error with anv_extensions.c not found for
libmesa_anv_entrypoints.
Fixes: d62063c "anv: Autogenerate extension query and lookup"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Android build changes to avoid the following build error:
In file included from external/mesa/src/gallium/targets/dri/target.c:1:
external/mesa/src/gallium/auxiliary/target-helpers/drm_helper.h:185:10:
fatal error: 'radeonsi/si_driinfo.h' file not found
^
1 error generated.
Fixes: 0f8c5de869 "radeonsi: prepare for driver-specific driconf options"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Android build changes to avoid the following build error:
external/mesa/src/gallium/drivers/radeonsi/si_shader_nir.c:505:
error: undefined reference to 'ac_nir_translate'
Fixes: 86d4b46d66 "ac/common: always build NIR translation"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
These were only here to keep building without needing to update libdrm.
Now that we include i915_drm.h in Mesa, we don't need this - our copy
is new enough and has the #define.
Trivial.
Fixes:
CXXLD addrlib/libamdgpu_addrlib.la
ar: `u' modifier ignored since `D' is the default (see `U')
../../../../src/amd/common/ac_nir_to_llvm.c:33:27: fatal error:
ac_shader_abi.h: No such file or directory
#include "ac_shader_abi.h"
^
compilation terminated.
Makefile:985: recipe for target
'common/common_libamd_common_la-ac_nir_to_llvm.lo' failed
When running `make distcheck`
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
If dual object compile fails (as seems to happen with virgl a
fair bit, and does piglit even have any tests for it?), we end up
not restarting the pull params, so we call
vec4_visitor::move_uniform_array_access_to_pull_constant
a second time and it runs over the ends of the alloc.
Fixes: tests/spec/glsl-1.50/execution/geometry/max-input-components.shader_test
running inside virgl on ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Implement the state tracker manager drawable interface flush_swapbuffer
method by plumbing it through to dri3 if available.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Add a state tracker interface method to flush outstanding swapbuffers, and
add a call to it from the mesa state tracker during glFinish().
This doesn't strictly mean the outstanding swapbuffers have actually finished
executing but is sufficient for glFinish()
to be able to be used as a replacement for glXWaitGL().
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
This method may be used by dri drivers to make sure all outstanding
buffer swaps have been flushed to hardware.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This can be used to guard support for EXT_memory_object and related
extensions.
v2: update gallium docs
v3 (Timothy Arceri):
- add cap to nv50
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This reduces the number of BOs that we need for the BO lists during
a submission.
Currently uses a fairly simple linear search for finding free space,
which could eventually be improved to a binary tree, which with some
per-node info could make a check for space O(1) and finding it O(log n),
in the number of buffers in that slab.
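For illustration, a first-fit linear search of the kind mentioned above might look like the sketch below. It assumes the slab tracks its allocations as (offset, size) pairs sorted by offset; find_free_offset and its parameters are hypothetical and not taken from the driver.

```python
def find_free_offset(allocs, slab_size, size, align=1):
    """First-fit search over a slab (sketch).

    allocs: list of (offset, size) tuples, sorted by offset.
    Returns the offset of a free range, or None if nothing fits.
    """
    cursor = 0  # end of the last allocation seen so far
    for off, sz in allocs:
        start = (cursor + align - 1) // align * align  # round up to alignment
        if start + size <= off:
            return start  # the gap before this allocation is large enough
        cursor = max(cursor, off + sz)
    # Finally try the tail of the slab, after the last allocation.
    start = (cursor + align - 1) // align * align
    return start if start + size <= slab_size else None
```

The cost is linear in the number of buffers in the slab, which is the behaviour the tree-based approach would improve on.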
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
As of 4.11, the kernel isn't bothering to set the subslice hashing mode
on Apollolake, leaving it at the default of 8x8. (It initializes it to
16x4 on most platforms.)
Performance data for GPUTest Triangle on Apollolake at 1024x640:
X-tiled RT:
-----------
8x8 -> 16x4: 2.4325% +/- 0.383683% (n=107)
8x8 -> 8x4: -3.75105% +/- 0.592491% (n=40)
8x8 -> 16x16: 6.17238% +/- 0.67157% (n=30)
Y-tiled RT:
-----------
8x8 -> 16x4: 1.30307% +/- 0.297292% (n=205)
8x8 -> 8x4: -0.769282% +/- 0.729557% (n=35)
8x8 -> 16x16: 3.00254% +/- 0.715503% (n=40)
8x MSAA RT (INTEL_FORCE_MSAA=8):
--------------------------------
8x8 -> 16x4: 1.38889% +/- 0.93729% (n=7)
8x8 -> 8x4: -2.10643% +/- 1.15153% (n=3)
8x8 -> 16x16: 3.87183% +/- 1.08851% (n=5)
Based on this, we choose 16x16 for Apollolake.
Skylake GT2 with X-tiled buffers appears to be a toss-up between 16x4
and 16x16, and with Y-tiled buffers it doesn't seem to really matter.
So we'll leave Skylake alone for now.
The hashing mode doesn't seem to make a measurable impact on more
complex benchmarks.
Acked-by: Matt Turner <mattst88@gmail.com>
One could have vX+1 which introduces another entrypoint without
implementing older ones.
v2: Rebase, while keeping loaderPrivate
Fixes: 1bf703e4ea ("dri_interface,egl,gallium: only expose RGBA visuals
on Android")
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The cacheline alignment restriction is on the base address; the pitch
can be anything.
Fixes assertion failures when using primus (say, on glxgears, which
creates a 300x300 linear BGRX surface with a pitch of 1200):
intel_blit.c:190: get_blit_intratile_offset_el: Assertion `mt->surf.row_pitch % 64 == 0' failed.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Move AVX512BW specific intrinsics to be Core-only.
Move some AVX512F intrinsics back to common implementation file.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Disable an optimization which implemented sse/avx operations on avx512
using avx512 intrinsics (to avoid switching between lane widths).
Compile with SIMD_OPT_128_AVX512 / SIMD_OPT_256_AVX512 defined to enable
these optimizations.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Fix problems found when enabling USE_SIMD16_FRONTEND, mostly related to
vMask / movemask_ps(pd).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This is more lines of code but the python is far easier to read than the
sed expressions we were using before. Also, this allows us to pull the
API version from anv_entrypoints.py so it never gets out-of-sync.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The VkVersion class is probably overkill but it makes it really easy to
compare versions in a way that's safe without the caller having to think
about patch vs. no patch.
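To make the "patch vs. no patch" point concrete, here is a minimal sketch of such a comparison class. The name Version and its details are assumptions for illustration; it is not the class added by this commit.

```python
import functools


@functools.total_ordering
class Version:
    """Compare 'major.minor' and 'major.minor.patch' strings safely (sketch)."""

    def __init__(self, string):
        parts = [int(p) for p in string.split(".")]
        assert len(parts) in (2, 3)
        self.major, self.minor = parts[0], parts[1]
        # A missing patch level is remembered as None ...
        self.patch = parts[2] if len(parts) == 3 else None

    def _key(self):
        # ... but compares as 0, so callers never need to special-case it.
        return (self.major, self.minor, self.patch or 0)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()
```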
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This way we can use "from anv_extensions import *" in the entrypoint
generator without worrying too much about pollution
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When building sandboxed, we may encounter additional errors. Ignore the errors,
as we are in a constrained environment.
This can be observed when building latest git with OBS.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This implements a wait for glXWaitGL, glXCopySubBuffer, dri flush_front and
creation of fake front until all pending SwapBuffers have been committed to
hardware. Among other things this fixes piglit glx-copy-sub-buffers on dri3.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
There is already get_shader_source(), and shader_source() will
be used for adding KHR_no_error support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The mesa state tracker was needlessly flushing the front buffer even if it
hadn't been drawn to since the last flush. This was happening during
glXSwapBuffers if we at some point previously had set that frontbuffer as
a read- or draw renderbuffer, or at glFlush() or glFinish() if we at some
point previously had rendered to the front buffer. Since the frontbuffer
flush typically means a full drawable copy, it's a pretty big waste.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Check if shaders have transform feedback varyings also after the
post-link step.
This fixes:
KHR-GL45.enhanced_layouts.xfb_vertex_streams
piglit/spec/arb_enhanced_layouts/gs-stream-location-aliasing
v2: add clarifying comments (Timothy)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We will switch to the pipe_loader loading the configuration options,
so that they can be passed to the driver independently of the state
tracker.
Put the description into its own file so that it can be merged easily
with driver-specific options in future commits.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows a more generic mechanism for passing user configurations
into drivers by accessing the dri options directly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This tool merges driinfo XML that is built using DRI_CONF_xxx macros.
The intention is to merge together state-tracker options with
driver-specific options.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Most of the change is concerned with avoiding memory leaks, since v2 of
the DRI extension returns a malloc'ed string. This also allows us to
resolve the long-standing issue of keeping drivers loaded when returning
from glXGetDriverConfig.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new function is defined to return a malloc'ed pointer. In the
following patches, this helps avoid leaking library handles when pipe
drivers are linked dynamically.
It also allows us to generate the XML string on the fly in the future.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The QBO workaround compute grid launch emits the render condition atom
when dirty, so install the render condition in the context only after
launching the compute grid. This avoids a redundant SET_PREDICATION.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is a firmware regression that causes failures. Work around it by
using the compute shader for query_buffer_objects to summarize the query
results.
v2: rename to PREDICATION_OP_BOOL64 (consistent with sid.h)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The predication bits are "visible or no overflow" and "not visible or
overflow", so we need to invert the check relative to the GL and Gallium
interface semantics.
Also, predication by the other streamout-related queries is not allowed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The issue here is that the immediate is treated as a 64-bit value,
and fetching it does not work reliably with swizzles that are different
from xy and zw.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is mostly mechanical search-and-replace, plus touching up the
macros in u_dump_defines.c manually a bit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: rename cap to PIPE_CAP_QUERY_SO_OVERFLOW and be a bit more explicit
in the documentation
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes following build issues:
In file included from vendor/intel/external/android_ia/mesa/src/mesa/drivers/dri/common/dri_util.c:45:
vendor/intel/external/android_ia/mesa/src/util/xmlpool.h:103:10: fatal error: 'xmlpool/options.h' file not found
...
In file included from vendor/intel/external/android_ia/mesa/src/mesa/drivers/dri/i965/intel_screen.c:44:
vendor/intel/external/android_ia/mesa/src/util/xmlpool.h:103:10: fatal error: 'xmlpool/options.h' file not found
Fixes: 601093f9 (xmlconfig: move into src/util)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
LLVM complained about passing an i32 to a float clamp.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 0f9e32519b "ac/nir: clamp shadow texture comparison value on VI"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since make_surface() can fail, if the format isn't supported by hw or
on a similar error, we need to check the result before dereferencing it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reported by valgrind at:
glsl_to_tgsi_visitor::visit(ir_expression*) (st_glsl_to_tgsi.cpp:1560)
When compiling the Deus Ex shaders.
Fixes: 28a5e7104 ("st/glsl_to_tgsi: handle precise modifier")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This looks like it's supported since llvm 3.9 at least,
so switch radeonsi and radv over to using it; -pro also
uses this. We can now drop creating lds for these operations
as the ds_swizzle operation doesn't actually write to lds at all.
Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This makes it match radeonsi. The LLVM backend itself will emit the
correct instruction, but LLVM might do incorrect optimizations since it
thinks the output is undefined when the input is 0, even though it's not
supposed to be. We really need a new intrinsic, or for the backend to
become smarter and recognize this pattern.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
As time goes on, extension advertising is going to get more complex.
Today, we either implement an extension or we don't. However, in the
future, whether or not we advertise an extension will depend on kernel
or hardware features. This commit introduces a python codegen framework
that generates the anv_EnumerateFooExtensionProperties functions as well
as a pair of anv_foo_extension_supported functions for querying for the
support of a given extension string. Each extension has an "enable"
predicate that is any valid C expression. For device extensions, the
physical device is available as "device" so the expression could be
something such as "device->has_kernel_feature". For instance
extensions, the only option is VK_USE_PLATFORM defines.
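A rough sketch of the codegen idea, assuming a much smaller generator than the real one. The extension names, the Extension class, the example_device struct, and the emitted function name are all made up for illustration; only the notion of an "enable" predicate holding a C expression comes from the text above.

```python
class Extension:
    def __init__(self, name, enable):
        self.name = name
        # `enable` is either a Python bool or a string holding a C expression.
        if isinstance(enable, bool):
            enable = "true" if enable else "false"
        self.enable = enable


# Hypothetical one-line-per-entry table; these extension names are invented.
DEVICE_EXTENSIONS = [
    Extension("VK_EXAMPLE_always_on", True),
    Extension("VK_EXAMPLE_needs_kernel", "device->has_kernel_feature"),
]


def emit_supported_fn(extensions):
    """Return C source for a lookup function (illustrative output only)."""
    out = [
        "bool",
        "example_device_extension_supported(struct example_device *device,",
        "                                   const char *name)",
        "{",
    ]
    for ext in extensions:
        out.append('   if (strcmp(name, "%s") == 0)' % ext.name)
        out.append("      return %s;" % ext.enable)
    out += ["   return false;", "}"]
    return "\n".join(out)
```

Calling print(emit_supported_fn(DEVICE_EXTENSIONS)) shows the generated C, where each predicate is pasted verbatim into a return statement so the C compiler checks it.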
This mechanism also means that we have a single one-line-per-entry table
for all extension declarations instead of the two tables we had in
anv_device.c and the one we had in anv_entrypoints_gen.py. The Python
code is smart and uses the XML to determine whether an extension is an
instance extension or device extension.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will allow us to keep everything in one place when it comes to
declaring what extensions are supported.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
GLES/gl.h has historically provided some typedefs that are not
used in the API itself. Restore these typedefs that were lost to
avoid breaking applications.
These seem to be the only typedefs removed in the update.
Fixes: 7fd0817 "Update Khronos-supplied headers"
[Eric: added a big warning to revert this patch when pulling the updated header]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Turn comments into actual code, that the compiler can check for us :)
(Speaking of, one of the comments had a typo. Challenge: find it)
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
When we have an interface block like:
layout (xfb_buffer = 0, xfb_offset = 0) out Block {
vec4 var1;
layout (xfb_stride = 48) vec4 var2;
vec4 var3;
};
According to ARB_enhanced_layouts spec:
"The *xfb_stride* qualifier specifies how many bytes are consumed by
each captured vertex. It applies to the transform feedback buffer
for that declaration, whether it is inherited or explicitly
declared. It can be applied to variables, blocks, block members, or
just the qualifier out. [ ...] While *xfb_stride* can be declared
multiple times for the same buffer, it is a compile-time or
link-time error to have different values specified for the stride
for the same buffer."
This means xfb_stride actually applies to the buffer, and not to the
individual components.
In the above example, it means that var2 consumes 16 bytes, and var3 is
at offset 32.
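The offsets in this example can be checked with a small sketch. It assumes each vec4 occupies 16 bytes and that implicit offsets simply accumulate member sizes; implicit_xfb_offsets is a hypothetical helper, not the compiler's code.

```python
def implicit_xfb_offsets(members, xfb_offset=0):
    """Assign implicit xfb offsets to block members (sketch).

    members: list of (name, size_in_bytes) in declaration order.
    """
    offsets = {}
    cursor = xfb_offset
    for name, size in members:
        offsets[name] = cursor
        # Advance by the member's own size. xfb_stride is a property of the
        # buffer, so it does not pad individual members.
        cursor += size
    return offsets


VEC4 = 16  # bytes
block = [("var1", VEC4), ("var2", VEC4), ("var3", VEC4)]
offsets = implicit_xfb_offsets(block)
```

Here offsets maps var1 to 0, var2 to 16 and var3 to 32, while the xfb_stride of 48 only describes how many bytes each captured vertex consumes in the buffer.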
This has also been confirmed by John Kessenich, the main contact for the
ARB_enhanced_layouts spec, and by the fact that this commit fixes:
GL45.enhanced_layouts.xfb_block_member_stride
This commit is in practice a revert of 598790e856 (glsl: apply
xfb_stride to implicit offsets for ifc block members).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I don't know the condition for the flush, but we'd better turn this off.
The sL1 flush is used when CE dumps stuff into a ring buffer and the ring
buffer wraps.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Python is the scripting language we've been using for scripts that need
to run across all supported platforms.
Shell is *not* a portable language for scripts.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
It's a single atomic add, so it makes sense to inline it.
Improves performance in Piglit's drawoverhead microbenchmark's
"DrawArrays ( 1 VBO, 0 UBO, 0 ) w/ no state change" subtest by
0.400922% +/- 0.310389% (n=350) on my i7-7700HQ.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 3008161d28,
which caused a regression for VMware.
The initial code had some recursion in it, which I removed by accident.
Trying to add the recursion back broke lots of things, so take the high
road and revert for now.
Fixes: 3008161d (st_glsl_to_tgsi: rewrite rename registers to use array fully.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*
for a2r10g10b10 formats as destination on SI/CIK hardware.
This adds support to the meta program for emitting 10-bit
outputs, and adds 10-bit support to the fragment shader key.
It also only does the int8/10 on SI/CIK.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In some APU situations the reported visible size can be larger than
VRAM size. This properly clamps the value.
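A minimal sketch of the clamp, assuming the heaps are derived from two reported sizes. The function name and the heap labels are hypothetical, and splitting VRAM into an invisible and a visible part is an assumption about how the sizes are used rather than something stated in this message.

```python
def split_vram_heaps(vram_size, visible_size):
    """Derive heap sizes from reported memory sizes (sketch)."""
    # Clamp: the CPU-visible part can never exceed total VRAM.
    visible = min(visible_size, vram_size)
    return {
        "vram": vram_size - visible,  # may legitimately end up as 0
        "visible_vram": visible,
    }
```

On an APU that reports a visible size larger than VRAM, this yields a zero-sized invisible heap instead of a negative or wrapped size.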
Surprisingly both CTS and spec seem to allow a heap type with size 0,
so this seemed like the easiest option to me.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 4ae84efbc5 "radv: Use enum for memory heaps."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Commit 601093f95d ("xmlconfig: move into src/util") broke the Android
build due to missing libexpat dependency:
external/mesa3d/src/util/xmlconfig.c:34:10: fatal error: 'expat.h' file not found
Fixes: 601093f95d ("xmlconfig: move into src/util")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
ARB_polygon_offset_clamp and ARB_texture_filter_anisotropic look like
they'd be pretty trivial to wire up.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
When this GL call is a no-op, it should be a little faster in
the errors path only.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When this GL call is a no-op, it should be a little faster in
the errors path only.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When this GL call is a no-op, it should be a little faster in
the errors path only.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Otherwise, code generation fails. This has become necessary since some
shaders are wrapped in control flow.
Fixes: 081ac6e5c6 ("radeonsi/gfx9: always wrap GS and TCS in an if-block (v2)")
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also, disable geometry and tessellation shaders. Mixing and matching NIR
and TGSI shaders should work (and I've tested it for the VS/PS interface),
but geometry and tessellation require VS-as-ES/LS, which isn't implemented
yet for NIR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Needed for TC-compatible HTILE in radeonsi for test cases like
piglit spec/arb_texture_rg/execution/fs-shadow2d-red-01.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This simplifies a bunch of places that no longer need special treatment
of value_count == 1. We rely on LLVM to optimize away the 1-element vector
types.
This fixes a bunch of bugs where 1-element arrays are indexed indirectly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The scanning phase sets the driver_location, because it is part of the
ABI: radeonsi does the assignment differently.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The eventual goal is to hide all radv-specific details behind
ac_nir_context::abi, so that the NIR->LLVM code can be re-used by
radeonsi.
During development, we live with a partial split, where some of the
NIR->LLVM code still relies on linking back to the nir_to_llvm_context
(which should ultimately be renamed to reflect that it's radv-specific).
The idea is to get rid of these backlinks over time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows drivers more freedom in how exactly they want to lower I/O,
e.g. first lowering I/O to temporaries.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a further lowering of default-block uniform loads that transforms
load_uniform intrinsics into load_ubo intrinsics. This simplifies the rest
of the backend.
v2: transform from load_uniform instead of straight from variables
Reviewed-by: Eric Anholt <eric@anholt.net>
This pass is a replacement for the nir_lower_samplers pass, which has the
advantage of keeping sampler references as derefs. This allows a unified
treatment of texture instructions and image intrinsics in the backend.
Some drivers prefer to treat gl_FragCoord as a system value rather than
a fragment shader input, see Const.GLSLFragCoordIsSysVal.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are just no-op because we don't actually do anything
useful in the errors path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
X/GLX can't handle them. This removes almost 500 GLX visuals that were
incorrectly exposed.
Add an optional getCapability callback for querying what the loader can do.
I'm not splitting this patch, because it's already too small.
v2: also add the callback to __DRIimageLoaderExtension
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
It's useless to clamp the same values for all viewports.
+7% in the "viewport change" test (drawoverhead benchmark).
v2: - call clamp_viewport() in all callers of set_viewport_no_notify()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
In commit 8771285054, José replaced the
Tungsten Graphics copyright notices with VMware, as Tungsten is gone.
I later imported brw_bufmgr.c, reintroducing a Tungsten copyright.
This commit does the equivalent of José's change to the new file.
This reformats the copyright header to match what we use in most of the
newer parts of the driver. There are a few minor alterations: we change
"COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS" to the standard
"AUTHORS OR COPYRIGHT HOLDERS", and move the permission notice to the
proper place (it should be in the middle, so "next paragraph" actually
refers to something).
Both of these changes match the OSI's MIT License text:
https://opensource.org/licenses/MIT
I copied this from genX_state_upload.c.
This fixes corruption with bindless textures in Dawn Of War 3.
The do_update_surf_dirtiness mechanism was complicated and dirty_level_mask
was only updated after the first draw call. The problem is bindless textures
are checked for decompression every draw call and we would only decompress
after the first draw call. The solution is to set dirtiness after the last
draw call to the framebuffer, so the (unconditional) decompression of
bindless textures happens at the right time.
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since array splitting for AoA is disabled, we have to retrieve
the type of the first non-array type when an array of images is
declared inside a structure. Otherwise, it will hit an assert
in glsl_type::sampler_index() because it expects either a sampler
or an image type.
This fixes a regression in the following piglit test:
arb_bindless_texture/compiler/images/arrays-of-struct.frag
Fixes: 57165f2ef8 ("glsl: disable array splitting for AoA")
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The slower convert-and-copy process performs a bad conversion
because it converts the value to signed 64-bit integer, but
bindless uniform handles are considered unsigned 64-bit.
This fixes "Check glUniform*() with mixed texture units/handles"
from arb_bindless_texture-uniform piglit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On SI this was causing a hang in
dEQP-VK.pipeline.render_to_image.core.2d_array.mipmap.r16g16_sint_s8_uint
This was due to not handling the tile mode index for depth like
I fixed previously for new GPUs.
Fixes: 01d0c5a9 (radv: fix stencil regression since new addrlib import)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The host doesn't understand this yet, so drop it for now.
Fixes: virgl regressions.
Fixes: af22adee4f (tgsi: add precise flag to tgsi_instruction)
Signed-off-by: Dave Airlie <airlied@redhat.com>
With merged ESGS shaders, the GS part of a wave may be empty, and the
hardware gets confused if any GS messages are sent from that wave. Since
S_SENDMSG is executed even when EXEC = 0, we have to wrap even
non-monolithic GS shaders in an if-block, so that the entire shader and
hence the S_SENDMSG instructions are skipped in empty waves.
This change is not required for TCS/HS, but applying it there as well
simplifies the logic a bit.
Fixes GL45-CTS.geometry_shader.rendering.rendering.*
v2: ensure that the TCS epilog doesn't run for non-existing patches
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The shader that is used to copy vertex data out of the vs/gs shaders to
the user-specified buffer (streamout or SO shader) was not using the
correct offsets.
Adjust the offsets that are used just for the SO shader:
- Make sure that position is handled in the same special way
as in the vs/gs shaders
- Use the correct offset to be passed in the core
- consolidate register slot mapping logic into one function, since it's
been calculated in 2 different places (one for calculating the slot mask,
and one for the register offsets themselves)
Also make room for all attributes in the backend vertex area.
Fixes:
- all vtk GL2PS tests
- 18 piglit tests (16 ext_transform_feedback tests,
arb-quads-follow-provoking-vertex and primitive-type gl_points)
v2:
- take care of more SGV slots in slot mapping logic
- trim feState.vsVertexSize
- fix GS interface and incorporate GS while calculating vsVertexSize
Note that vsVertexSize is used in the core as the one parameter that
controls vertex size between all stages, so it has to be adjusted appropriately
for the whole vs/gs/fs pipeline.
Also note that GS and SO is not fully implemented. This will be addressed
later.
Fixes:
- a total of 20 piglit tests
CC: 17.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This ports 72e46c988 to radv.
radeonsi: apply a TC L1 write corruption workaround for SI
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were adding pad to size after creating the object, so we could
submit a CS bigger than the bo created for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports: da7453666a
radeonsi: don't apply the Z export bug workaround to Hainan
to radv.
Just noticed in passing.
Fixes: f4e499ec7 (radv: add initial non-conformant radv vulkan driver)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We already have this little optimization for color clears. Now that
we're actually tracking whether or not a slice has any fast-clear
blocks, it's easy enough to add for depth clears too.
Improves performance of GFXBench 4 TRex at 1920x1080 by:
- Skylake GT4: 0.905932% +/- 0.0620197% (n = 30)
- Apollolake: 0.382434% +/- 0.1134730% (n = 25)
v2: (by Ken) Rebase and drop intel_mipmap_tree.c changes, as they're
no longer necessary (other patches already landed to do that part)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When changing the clear value, we need to resolve any fast cleared data.
Previously, we were performing resolves on every slice with HiZ enabled.
We only need to resolve slices that a) have fast clear data, and b)
aren't about to be cleared to the new color. In the latter case, we
were actually doing a resolve, and then a fast clear - when we could
skip both, causing the existing fast cleared area to be updated to the
new clear value for no additional work.
This patch stops using intel_miptree_prepare_access in favor of a more
optimal open coded loop that knows about our clear operation.
v2: (by Ken) Rebase on islification, write a real commit message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We'll fail to flag an error if the context flags appear after the
no-error attribute in the context attribute list.
Delay the check to after attribute parsing to fix this.
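A minimal sketch of the deferred check, in C. The attribute tokens, flag bits and the `parse_context_attribs` helper are hypothetical placeholders, not the real EGL enum values or Mesa functions, and the rule that no-error conflicts with the debug/robust flags is assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tokens; the real EGL enums have different values. */
enum { ATTR_NONE = 0, ATTR_FLAGS = 1, ATTR_NO_ERROR = 2 };
enum { FLAG_DEBUG = 0x1, FLAG_ROBUST = 0x4 };

struct ctx_attribs {
   unsigned flags;
   bool no_error;
};

/* Returns true on success, false when the attribute list is rejected. */
static bool
parse_context_attribs(const int *list, struct ctx_attribs *out)
{
   assert(out != NULL);
   out->flags = 0;
   out->no_error = false;

   /* Pass 1: only record values, in whatever order they appear. */
   for (size_t i = 0; list && list[i] != ATTR_NONE; i += 2) {
      switch (list[i]) {
      case ATTR_FLAGS:
         out->flags = (unsigned) list[i + 1];
         break;
      case ATTR_NO_ERROR:
         out->no_error = list[i + 1] != 0;
         break;
      default:
         return false; /* unknown attribute */
      }
   }

   /* Pass 2: validate the combination once everything has been seen. */
   if (out->no_error && (out->flags & (FLAG_DEBUG | FLAG_ROBUST)))
      return false;

   return true;
}
```

Because the combination is checked after the loop, the result no longer depends on whether the flags come before or after the no-error attribute.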
Fixes: 4909519a66 ("egl: Add EGL_KHR_create_context_no_error support")
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: add fixes/stable tags, commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The number of supported waves per thread group has been reduced to 16
with gfx9. Trying to use 32 waves causes hangs, and barriers might
not work correctly with > 16 waves.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The firmware version numbers for SI were wrong. The new numbers are probably
too conservative (we don't have a definitive answer from the firmware team),
but DRAW_INDIRECT_MULTI has been confirmed to work with these versions on
Tahiti (by Gustaw) and on Verde (by myself).
While this is technically adding a feature, it's a feature we thought we had
for a long time. The change is small enough and we're early enough in the 17.2
release cycle that it should still go in.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: 17.2 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The EU limit of 128 GRFs should allow 32 vertex elements of 4 GRFs.
However, the maximum allowed value of "Vertex URB Entry Read Length"
in SIMD8 is 15. And 15 * 8 = 120 gives us a limit of 30 vertex elements.
Because we also need to reserve a vertex buffer to upload
VertexIndex/InstanceIndex and another to upload DrawID when needed,
we can only expose 28.
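The arithmetic above, written out as a small C sketch. The constant names are made up for illustration; the numbers come straight from this message.

```c
#include <assert.h>

enum {
   GRFS_PER_ELEMENT = 4,          /* one vertex element takes 4 GRFs */
   MAX_READ_LENGTH_SIMD8 = 15,    /* max "Vertex URB Entry Read Length" */
   READ_LENGTH_TO_GRFS = 8,       /* factor used above: 15 * 8 = 120 */
   RESERVED_ELEMENTS = 2,         /* VertexIndex/InstanceIndex and DrawID */
};

static unsigned
max_exposed_vertex_elements(void)
{
   unsigned grfs = MAX_READ_LENGTH_SIMD8 * READ_LENGTH_TO_GRFS; /* 120 */
   unsigned hw_limit = grfs / GRFS_PER_ELEMENT;                 /* 30 */

   assert(hw_limit > RESERVED_ELEMENTS);
   return hw_limit - RESERVED_ELEMENTS;                         /* 28 */
}
```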
Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Here the AUX_USAGE_* mode indicates that we have HiZ, so we will have
a HiZ buffer. But Coverity doesn't know that, so it thinks it might
be NULL because we checked hiz_buf != NULL earlier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NewBufferObj() is called when the shared state is allocated so we
wouldn't get this far if it was NULL.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This keeps the flags out of v3d_decode.c's output. In the generated code,
only the unpack functions see any change (where they now get the
restricted start value), and vc4 doesn't use the unpack functions yet.
I was writing the XML such that the address field overlapped various flags
in the alignment bits, which caused pain when trying to unpack for decode.
Instead, keep the XML matching the docs (address fields don't overlap),
and just infer the appropriate shift value during decode.
During pack, the address is just applied to the appropriate bits
already, ignoring the sub-byte start/end fields.
We simply pick r4 if available (anything else would force a MOV), then
round-robin through accumulators (avoids physical regfile RAW delay
slots), then round-robin through the physical regfile.
The effect on instruction count is pretty impressive:
total instructions in shared programs: 76563 -> 74526 (-2.66%)
instructions in affected programs: 66463 -> 64426 (-3.06%)
and we could probably do better with a little heuristic of "if we're going
to choose a physical reg, and other operands of instructions using this as
a src have the same physical regfile, then use the other regfile".
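A minimal sketch of that selection order, in C. The register numbering, `struct select_state` and `select_reg` are hypothetical and only illustrate "r4 first, then round-robin accumulators, then round-robin physical regfile"; this is not the vc4 code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register numbering, for illustration only. */
enum {
   ACC_FIRST = 0, ACC_COUNT = 4,      /* accumulators */
   REG_R4 = 4,
   PHYS_FIRST = 5, PHYS_COUNT = 64,   /* physical regfile */
   REG_TOTAL = PHYS_FIRST + PHYS_COUNT,
};

struct select_state {
   unsigned next_acc;   /* where the next accumulator search starts */
   unsigned next_phys;  /* where the next physical-reg search starts */
};

/* Pick the first available register at or after *next, wrapping around. */
static int
round_robin(const bool *avail, unsigned first, unsigned count, unsigned *next)
{
   for (unsigned i = 0; i < count; i++) {
      unsigned idx = (*next + i) % count;
      if (avail[first + idx]) {
         *next = (idx + 1) % count;
         return (int) (first + idx);
      }
   }
   return -1;
}

/* Returns the chosen register, or -1 if nothing is available. */
static int
select_reg(struct select_state *s, const bool *avail)
{
   assert(s != NULL && avail != NULL);

   if (avail[REG_R4])
      return REG_R4;

   int reg = round_robin(avail, ACC_FIRST, ACC_COUNT, &s->next_acc);
   if (reg >= 0)
      return reg;

   return round_robin(avail, PHYS_FIRST, PHYS_COUNT, &s->next_phys);
}
```

Keeping a separate cursor per register class is what makes consecutive picks land on different registers of the same cost.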
VC4 has had a tension, similar to pre-Sandybridge Intel, where we want to
use low-numbered registers (more parallelism on Intel, fewer delay slots
on vc4), but in order to give instruction scheduling the most freedom to
avoid delays we want to round-robin between registers of the same cost.
Our two heuristics so far have chosen one end or the other of that
tradeoff.
The callback, instead, hands the driver the set of registers that are
available, and the driver gets to make its own choice. This will be used
in vc4 to round-robin between registers of the same cost, and might be
used in the future for improving bank selection.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the paths looping over adjacency had guards against considering
themselves (the non-obvious one was ra_any_neighbors_conflict(), which has
in_stack set).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I was going to indent this code another level, and decided it would be
easier to read as a helper.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without this, a BlitFramebuffer would mark the whole framebuffer as being
changed (so we emit loads/stores of all of it) rather than just the
modified subset.
I don't know how I managed to leave this here for so long. Found when
working on a 1:1 overlapping blit extension for X11.
Cc: mesa-stable@lists.freedesktop.org
This gets us automatic CL decoding to a floating-point value, and drops a
magic number from the emit code. 250x250 shader runner tests now say they
have a center of 125.0 instead of 2000.
The device doesn't directly support this feature so we implement it with
additional shader code which sets the w component of the color output(s) to
1.0 (or max_int or max_uint).
Fixes 16 Piglit ext_framebuffer_multisample/*alpha-to-one* tests.
v2: only support unorm/float buffers, not int/uint, per Roland.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When we forcibly write white to FS outputs (for XOR mode emulation)
we were using a temp register. But that's not really necessary.
This also fixes the case of writing white to multiple color buffers.
Subsequent changes will build on this.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Performance delta on Core i5-4570 + Radeon R9 270:
Overlord: +20% in certain locations
Overlord II: +20% in certain locations
Oil Rush: +12% in most locations
War Thunder: +4-9% in benchmarks
Saints Row 2: +10-35% in certain locations
As Chris commented, it makes more sense to have batch buffer flushes
before the query. Usually applications like frame_retrace do a series
of queries and in that case, with flushes at the end of the queries,
we might still have the first query contained in 2 different batches.
More generally it would be quite usual to have the query contained in
2 batch buffers because we never know what's the fill rate of the
current batch buffer.
If we move the flushing to the beginning of the queries, it's pretty
much guaranteed that queries will be contained in a single batch
buffer (unless the amount of commands is huge, but then it's only fair
to include reloading request times in the measurements).
Fixes: adafe4b733 ("i965: perf: minimize the chances to spread queries across batchbuffers")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Always initialise whandle.modifier for DRIImage modifier queries, so if
the driver doesn't support it then we return false for the query.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: d33fe8b84e ("st/dri: enable DRIimage modifier queries")
In the DRIImage queryImage hook, check if resource_get_handle() failed
and return FALSE if so.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For textures we must not approximate the calculation with `stride *
height`, or `slice_stride * depth`, as that can easily lead to buffer
overflows, particularly for partial transfers.
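A minimal sketch of an exact size calculation, in C. The `transfer_box_bytes` helper and its parameters are hypothetical, and `row_bytes` is assumed to already account for the format's block size; it only illustrates why `stride * height` overshoots for a partial box.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes spanned by a box of `slices` x `rows` rows of `row_bytes` each,
 * starting at the box's first byte. Only the last row of the last slice
 * contributes `row_bytes`; everything before it contributes full strides. */
static size_t
transfer_box_bytes(size_t row_bytes, size_t rows, size_t slices,
                   size_t stride, size_t slice_stride)
{
   assert(row_bytes > 0 && rows > 0 && slices > 0);
   assert(row_bytes <= stride);

   return (slices - 1) * slice_stride + (rows - 1) * stride + row_bytes;
}
```

For a 16-byte-wide, 2-row box in a texture with a 1024-byte stride this gives 1040 bytes, whereas `stride * height` would claim 2048 and read past the mapped region.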
This should address the issue that Bruce Cherniak found and diagnosed.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Add uintptr_t cast to fix 'cast to pointer from integer of different size'
warning on 32bit build (build error on Android M).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Until we support sync fd, don't report the info.
Fixes CTS dEQP-VK.api.external.semaphore.sync_fd.* from crashing.
Fixes: eaa56eab6 (radv: initial support for shared semaphores (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
No need for all that switching when we can just assign a nice little
variable with the number of layers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With commit 5124bf9823, a framebuffer interface hash table is
created in st_gl_api_create(), which is called in
dri_init_screen_helper() for each screen. When the hash table is
overwritten with multiple calls to st_gl_api_create(), it can cause
a race condition. This patch fixes the problem by creating a
framebuffer interface hash table per state tracker manager.
Fixes crash with steam.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101876
Fixes: 5124bf9823 ("st/mesa: add destroy_drawable interface")
Tested-by: Christoph Haag <haagch@frickel.club>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the underlying driver does not support modifiers, dmabuf will still
advertise formats through the 'modifier' event, but send them with an
invalid modifier. Ignore them if this is the case, rather than passing
them through to the driver.
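A minimal sketch of the filtering, in C. The `modifier_from_event` helper is hypothetical; `MOD_INVALID` is assumed to mirror the value of DRM_FORMAT_MOD_INVALID from drm_fourcc.h, and the hi/lo split follows the two 32-bit halves carried by the 'modifier' event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match DRM_FORMAT_MOD_INVALID in drm_fourcc.h. */
#define MOD_INVALID UINT64_C(0x00ffffffffffffff)

/* Combine the two halves of the event; return false if the result is the
 * invalid modifier and should not be recorded for the driver. */
static bool
modifier_from_event(uint32_t hi, uint32_t lo, uint64_t *out)
{
   assert(out != NULL);

   uint64_t mod = ((uint64_t) hi << 32) | lo;
   if (mod == MOD_INVALID)
      return false;

   *out = mod;
   return true;
}
```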
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Otherwise we'll attempt to generate the header even if we don't need to.
In that case the dependencies may not be met, leading to build failure.
Fixes: 166852e "configure.ac: rework wayland-protocols handling"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We need wl_egl_window to be a versioned struct in order to keep track of
ABI changes.
This change makes the first member of wl_egl_window the version number.
A heuristic in the wayland driver is added so that we don't break
backwards compatibility:
- If the first field (version) is an actual pointer, it is an old
implementation of wl_egl_window, and version points to the wl_surface
proxy.
- Else, the first field is the version number, and we have
wl_egl_window::surface pointing to the wl_surface proxy.
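A minimal sketch of the heuristic, in C. The two struct layouts, `get_surface` and `above_first_page` are hypothetical simplifications, not the real wl_egl_window definitions; in particular, `above_first_page` is a crude stand-in for a real "is this a dereferenceable pointer?" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct surface { int id; };

/* Hypothetical old layout: first member is the surface pointer. */
struct window_old { struct surface *surface; int width, height; };

/* Hypothetical new layout: first member is a small version number. */
struct window_new { intptr_t version; int width, height; struct surface *surface; };

typedef bool (*is_pointer_fn)(const void *);

/* Crude stand-in: small integers such as version numbers fall in the
 * first page, which is assumed never to be a valid object address. */
static bool
above_first_page(const void *p)
{
   return (uintptr_t) p >= 4096;
}

static struct surface *
get_surface(const void *window, is_pointer_fn looks_like_pointer)
{
   assert(window != NULL && looks_like_pointer != NULL);

   /* Read the first pointer-sized word without assuming either layout. */
   intptr_t first;
   memcpy(&first, window, sizeof(first));

   if (looks_like_pointer((const void *) first))
      return ((const struct window_old *) window)->surface; /* old ABI */

   return ((const struct window_new *) window)->surface;    /* versioned */
}
```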
Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Reviewed-by: James Jones <jajones@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
mincore() returns 0 on success, and -1 on failure. The last parameter
is a vector of bytes with one entry for each page queried. mincore
returns page residency information in the first bit of each byte in the
vector.
Residency doesn't actually matter when determining whether a pointer is
dereferenceable, so the output vector can be ignored. What matters is
whether mincore succeeds. See:
http://man7.org/linux/man-pages/man2/mincore.2.html
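A minimal sketch of such a check, in C, assuming Linux and its `mincore(void *, size_t, unsigned char *)` prototype. The helper name `pointer_is_mapped` is hypothetical; the address is rounded down to a page boundary because mincore() requires a page-aligned start.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* True if the page containing `p` is mapped in this process. */
static bool
pointer_is_mapped(const void *p)
{
   long page_size = sysconf(_SC_PAGESIZE);
   assert(page_size > 0);

   if (p == NULL)
      return false;

   /* mincore() needs a page-aligned address. */
   uintptr_t addr = (uintptr_t) p & ~((uintptr_t) page_size - 1);

   /* One byte per queried page; its contents (residency) are ignored. */
   unsigned char residency = 0;

   /* Only success vs. failure matters here. */
   return mincore((void *) addr, (size_t) page_size, &residency) == 0;
}
```

The residency byte is required by the interface but never inspected, which matches the point above that only the return value matters.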
Signed-off-by: Miguel A. Vico <mvicomoya@nvidia.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
At dist/distcheck time we need to ensure that all the files and their
respective dependencies are handled.
At the moment we'll bail out as the linux-dmabuf rules are guarded in a
conditional. Move them outside of it and drop the sources from
BUILT_SOURCES.
Thus the files will be generated only as needed, which will happen only
after the wayland-protocols dependency is enforced in configure.ac.
v2: add dependency tracking for the header
Cc: Andres Gomez <agomez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
This calculates ps_iter_samples from the minSampleShading input
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is an alternate fix for the buffer export dedicated interaction.
Fixes CTS dEQP-VK.api.external.memory.opaque_fd.dedicated.buffer.info
Fixes: b70829708a (radv: Implement VK_KHR_external_memory)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the layer base was > 0, it wasn't getting passed as the start
instance or getting added in the shaders.
Fixes CTS dEQP-VK.api.image_clearing.core.clear_color_attachment.2d_r8_uint_multiple_layers
Fixes: 7e0382fb (radv: add support for layered clears (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec says we should return VK_ERROR_FEATURE_NOT_PRESENT.
Ported from anv.
Fixes CTS test dEQP-VK.api.device_init.create_device_unsupported_features
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we get an fd, we need to close it before returning.
Fixes CTS test dEQP-VK.api.external.memory.opaque_fd.dedicated.device_only.import_multiple_times
Fixes: b70829708a (radv: Implement VK_KHR_external_memory)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The image is set on Memory allocation already, but the image doesn't
have to have the BindImageMemory called yet. Luckily, we know offset
within a BO has to be 0 for dedicated allocations, so we can just
use the dummy 0 in the address calculations.
Fixes CTS test dEQP-VK.api.external.memory.opaque_fd.dedicated.image.export_bind_import_bind
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: b70829708a "radv: Implement VK_KHR_external_memory"
Reviewed-by: Dave Airlie <airlied@redhat.com>
This just sets them to INVALID COLOR, instead of shifting the
attachments together.
This also fixes a number of cases where we use it first and only
then check if it is VK_ATTACHMENT_UNUSED.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fill the entire array instead of just a quarter. This avoids
crashes with large shaders.
(currently this never causes a problem because shaders larger than 2048/4
instructions are not supported by this driver on any hardware, but it will
cause problems in the future)
Fixes: ec43605189 ("etnaviv: fix shader miscompilation with more than 16 labels")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We already have a helper for doing this in BLORP, this just moves the
logic into ISL where we can share it with other components.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The set of formats which supports CCS_E is actually fairly small on
gen9. However, everything that supports fast-clears on gen8 also
supports fast-clears on gen9+. The one very annoying exception is
that blending is broken for non-0/1 clear colors with sRGB formats.
In order to solve that problem, we do a resolve to get rid of the
clear color. Another option would be to just not fast-clear with
non-0/1 clear colors; however, non-0/1 + blending + sRGB is uncommon
enough that this shouldn't be a significant performance problem.
This appears to help gl_manhattan31_off by about 2%.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes it much easier to edit the template and doesn't really dirty
the python all that much.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit replaces the generic "flags" parameter with a more explicit
aux usage parameter. This leads to a lot of duplicated code at the
moment but this will all get cleaned up directly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This requires us to start using the partial clear state. It makes
things quite a bit more complicated but it's still a fairly
straightforward exercise in diagram following.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have this field, it's much easier to switch on it than to
walk an if ladder that checks different things.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We also simplify the way we handle stencil since we know a priori that
it will have ISL_AUX_USAGE_NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only real change here is that we now reject clear colors for MCS
with certain formats on gen < 9 because we can't trust that the
reinterpretation will work. This may cause some MCS partial resolves.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Our attempts to do it automatically are problematic at best. In order
to really be precise, we need to know both the desired aux usage and
whether or not clear is supported. The current automatic mechanism
doesn't cover this. This commit itself is not a functional change since
it just reworks everything to be in terms of a silly helper. Later
commits will switch things over to more sensible ways of choosing usage.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We keep the old and possibly broken method of determining aux usage
intact for now. Therefore, the only functional change here is that we
may call finish_render a bit more accurately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Multisample surfaces only have a single miplevel so there's no reason to
be passing the extra parameters around. It only leads to confusion.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit changes layer_range_length to return logical layers and also
changes the way we allocate the aux_state field to not allocate extra
layers for MCS. This will be important as we're about to start doing
significantly more detailed tracking of MCS state.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
intel_miptree_supports_ccs_e should handle the gen >= 9 requirement and
there's no reason why we can't do CCS_E on window system buffers so long
as we resolve.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nothing created through intel_miptree_create_for_renderbuffer will ever
be exposed externally so there's no need to set FOR_SCANOUT.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It turns out that if you have rendering in-flight with CCS_E enabled and
you go to do a depth resolve without flushing, the CCS data may never
hit the memory.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Image layouts only let us know that an image *may* be fast-cleared. For
this reason we can end up with redundant resolves. Testing has shown
that such resolves can measurably hurt performance and that predicating
them can avoid the penalty.
v2:
- Introduce additional resolve state management function (Jason Ekstrand).
- Enable easy retrieval of fast clear state fields.
v3: Use more descriptive field enums (Jason)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With an earlier patch from this series, resolves are additionally
performed on layout transitions. Remove the now unnecessary implicit
resolves within render passes.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Expound on comment for the pipe controls (Jason Ekstrand).
v3:
- Cast base_layer to uint64_t to avoid overflow.
- Remove "seems" from the pipe control comment.
- Fix clamp of layer_count (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use the performance warning infrastructure to provide helpful
information when testing applications.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For readability, bring the assignment of CCS closer to the assignment of
NONE and MCS.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The lifespan of the fast-clear data will surpass the render pass scope.
We need CCS_D to be enabled in order to invalidate blocks previously
marked as cleared and to sample cleared data correctly.
v2: Avoid refactoring.
v3: Allow CCS_D for subpass resolves.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The next patch enables the use of CCS_D even when the color attachment
will not be fast-cleared. Catch the gen7 case early to simplify the
changes required.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.
v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Remove ::first_subpass_layout assertion (Jason Ekstrand).
v3: Allow some fast clears in the GENERAL layout.
v4: Remove extra '||' and adjust line break (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Don't pass in the command buffer (Jason Ekstrand).
v3: Remove an incorrect assertion and an if condition for gen7.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used to load and store clear values from surface state
objects.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It may technically be possible to enable some sort of fast-clear support
for at least the base slice of a 2D array texture on gen7. However,
it's not documented to work, we've never tried to do it in GL, and we
have no idea what the hardware does if you turn on CCS_D with arrayed
rendering. Let's just play it safe and disallow it for now. If someone
really cares that much about gen7 performance, they can come along and
try to get it working later.
The non-LLC story was a horror show. We uploaded data via pwrite
(drm_intel_bo_subdata), which would stall if the cache BO was in
use (being read) by the GPU. Obviously, we wanted to avoid that.
So, we tried to detect whether the buffer was busy, and if so, we'd
allocate a new BO, map the old one read-only (hopefully not stalling),
copy all shaders compiled since the dawn of time to the new buffer,
upload our new one, toss the old BO, and let the state upload code
know that our program cache BO changed. This was a lot of extra data
copying, and flagging BRW_NEW_PROGRAM_CACHE would also cause a new
STATE_BASE_ADDRESS to be emitted, stalling the entire pipeline.
Not only that, but our rudimentary busy tracking consisted of a flag
set at execbuf time, and not cleared until we threw out the program
cache BO. So, the first shader upload after any drawing would hit this
"abandon the cache and start over" copying path.
This is largely unnecessary - it's just ancient and crufty code. We can
use the same persistent mapping paths on all platforms. On non-ancient
kernels, this will use a write combining map, which should be reasonably
fast.
One aspect that is worse: we do occasionally grow the program cache BO,
and copy the old contents to the newer BO. This will suffer from UC
readback performance now. To mitigate this, we use the MOVNTDQA based
streaming memcpy on platforms with SSE 4.1 (all Gen7+ atoms). Gen4-5
are unfortunately going to be penalized.
v2: Add MOVNTDQA path, rebase on other map flag changes.
v3: Drop cache->bo_used_by_gpu too (caught by Chris Wilson).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Chris Wilson pointed out that this mapping really is persistent.
Shouldn't actually have any effect today, but best to set it anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using a read-only mapping is completely bogus - we use this mapping to
write all new shaders to the cache.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Write-combine mappings give much better performance on writes than
uncached access through the GTT.
Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).
v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
updates, potential for CPU/WC maps failing, and other changes.
v3: (by Ken and Chris Wilson)
(Ken): Rebase on set_domain -> gem_wait
(Chris): Fix up a failed CPU/WC mmapping with a GTT mapping
Not all objects will be mappable for direct access by the CPU
(either using WC/CPU or WC paths), for example, a dmabuf wrapping an
object on a foreign device or an object wrapping access to stolen
memory. Since either the physical pages are not known or even do not
exist, we need to use the mediated, indirect access via the GTT. (If
one day, the kernel does suddenly start providing mediated access
via a regular WB/WC mmapping, we no longer need the fallback.)
v4: Avoid falling back for MAP_RAW (Chris).
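The fallback order from v3 and v4 can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the driver's code:
struct sketch_bo, SKETCH_MAP_RAW and the sketch_try_*() helpers are
hypothetical stand-ins for the real BO, the MAP_RAW flag and the mmap paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for MAP_RAW. */
#define SKETCH_MAP_RAW (1u << 0)

enum sketch_map_kind { SKETCH_MAP_NONE, SKETCH_MAP_DIRECT, SKETCH_MAP_GTT };

/* Hypothetical buffer object; a real BO carries far more state. */
struct sketch_bo {
   bool directly_mappable; /* false for e.g. a foreign dmabuf or stolen memory */
   char storage[64];
};

/* Stand-in for a CPU or WC mmap, which the kernel may refuse. */
static void *sketch_try_direct_map(struct sketch_bo *bo)
{
   return bo->directly_mappable ? bo->storage : NULL;
}

/* Stand-in for the mediated mapping through the GTT. */
static void *sketch_try_gtt_map(struct sketch_bo *bo)
{
   return bo->storage;
}

static void *sketch_bo_map(struct sketch_bo *bo, unsigned flags,
                           enum sketch_map_kind *kind)
{
   assert(bo != NULL && kind != NULL);

   void *ptr = sketch_try_direct_map(bo);
   if (ptr) {
      *kind = SKETCH_MAP_DIRECT;
      return ptr;
   }

   /* v4: a raw mapping is never replaced by a GTT mapping. */
   if (flags & SKETCH_MAP_RAW) {
      *kind = SKETCH_MAP_NONE;
      return NULL;
   }

   *kind = SKETCH_MAP_GTT;
   return sketch_try_gtt_map(bo);
}
```

A caller that passes SKETCH_MAP_RAW has to cope with a NULL result, while
every other caller gets a usable pointer in this sketch.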
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the buffer is idle, I915_GEM_WAIT will return immediately,
so we may as well skip the ioctl altogether. We can't trust the
"idle" flag for external buffers, but for most, it should be fine.
Reviewed-by: Matt Turner <mattst88@gmail.com>
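The idle shortcut for I915_GEM_WAIT described above might look roughly like
this. It is a sketch only: struct sketch_bo, its fields and
sketch_gem_wait() are hypothetical, and the "ioctl" merely counts calls so
the effect is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object with only the fields this sketch needs. */
struct sketch_bo {
   bool idle;      /* userspace's belief that the GPU is done with the BO */
   bool external;  /* shared with another process or device */
   int wait_calls; /* number of simulated ioctls, for illustration only */
};

/* Stand-in for the I915_GEM_WAIT ioctl; it only records the call. */
static void sketch_gem_wait(struct sketch_bo *bo)
{
   bo->wait_calls++;
   bo->idle = true;
}

static void sketch_bo_wait_rendering(struct sketch_bo *bo)
{
   assert(bo != NULL);

   /* The idle flag cannot be trusted for external buffers, because
    * someone else may have queued work on them behind our back. */
   if (bo->idle && !bo->external)
      return;

   sketch_gem_wait(bo);
}
```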
With the advent of asynchronous maps, domain tracking doesn't make a
whole lot of sense. Buffers can be in use on both the CPU and GPU at
the same time. In order to avoid blocking, we stopped using set_domain
for asynchronous mappings, which means that the kernel's tracking has
lies. We can't properly track it in userspace either, as the kernel
can change domains on us spontaneously (for example, when un-swapping).
According to Chris Wilson, I915_GEM_SET_DOMAIN does the following:
1. pins the backing storage (acquiring pages outside of the
struct_mutex)
2. waits either for read/write access, including inter-device waits
3. updates the domain, clflushing as required
4. marks the object as used (for swapping)
5. turns off FBC/PSR/fancy scanout caching
Item (1) is not terribly important. Most BOs are recycled via the
BO cache, so they already have pages. Regardless, we fixed this
via an initial set_domain in the previous patch.
We implement item (2) with I915_GEM_WAIT. This has one downside:
we'll stall unnecessarily if we do a read-only mapping of a buffer
that the GPU is reading. I believe this is pretty uncommon. We
may want to extend the wait ioctl at some point.
Mesa already does item (3) itself. For cache-coherent buffers (most on
LLC systems), we don't need to do any clflushing - the CPU and GPU views
are coherent. For non-coherent buffers (most on non-LLC systems), we
currently only use the CPU for read-only maps, and we explicitly clflush
when necessary.
We don't care about item (4)...swapping has already killed performance.
Plus, with async maps, the kernel's domain tracking is already bogus,
so it can't do this accurately regardless.
Item (5) should be okay because we avoid cached maps of scanout buffers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
From the ARB_uniform_buffer_object spec:
""shared" uniform blocks, the default layout, ..."
This doesn't fix anything as the default layout is already applied
at this point but fixes the misleading code/comment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was added in 2d03f48a65 and seems like it was intended
as a TODO comment in a function stub rather than a useful
code comment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When we validate the texture sample count, pass the correct
pipe_texture_target for the texture, rather than PIPE_TEXTURE_2D.
Also add more comments about MSAA.
No piglit regressions with VMware driver.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The NumSamples and FixedSampleLocation fields are set again later at
the end of the function so these earlier assignments aren't needed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
imm {128.0, -128.0, 2.0, 3.0} is used for lit instruction which
is not used very frequently. So allocate it only if lit instruction is used.
Tested with mtt piglit and mtt glretrace
v2: As per Charmaine's comment
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch fixes the ordering of the constant indices for texcoord scale
factor and texture buffer size to match the order they were added to the
constant buffer in svga_get_extra_constants_common().
Tested with MTT piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
Sometimes, converting unnormalized coordinates to normalized
coordinates requires an epsilon value to produce the right texels with
nearest filtering. Adding 0.0001 to the coordinates when the min/mag
filter is nearest fixes the issue.
Fixes piglit test fbo-blit-scaled-linear
Tested with mtt-piglit, mtt-glretrace
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows us to override contexts to use no_error functionality
even if the applications themselves do not.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have a very specific row pitch that we want and we don't want ISL to
be changing it on us so just be explicit about it.
Fixes: a40f043034
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
fixes
misrendering in TombRaider
KHR-GL44.gpu_shader5.precise_qualifier
KHR-GL45.gpu_shader5.precise_qualifier
v4: disable opt only for MAD, it's fine for SAD
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
v2: use str_match_no_case to fix _SAT_PRECISE detection
v4: use is_digit_alpha_underscore to match end of mods
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Only implemented for glsl->tgsi. Other converters just set precise to 0.
v2: remove precise parameter from ureg_tex_insn and ureg_memory_insn
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All subexpressions inside an ir_assignment need to be tagged as precise.
v2: make precise handling more global inside the visitor
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When I ported from libdrm, I forgot to add the line to reset
the sem, we just need to reset the context.
This fixes a regression in DOOM.
Fixes: 9ac1432a57 ("radv: port to new libdrm API.")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With this patch, the st manager will maintain a hash table for
the active framebuffer interface objects. A destroy_drawable interface
is added to allow the state tracker to notify the st manager to remove
the associated framebuffer interface object from the hash table,
so the associated framebuffer and its resources can be deleted
at framebuffers purge time.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101829
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Tested-by: Brad King <brad.king@kitware.com>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The two generators forked from each other, and they remain basically the
same. This rebases the radv version on the anv version, but with the
radv changes ported over. The result is that we get rid of the "cat |"
madness and gain mako, correct "generated by" attributions, and write
files out directly.
The only differences in the output are whitespace and comments.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
This was only needed for checking gen6 stencil which is already
using isl. One could delete GEN6_HIZ_STENCIL layout altogether
but that will be gone with the rest after a while anyway.
The dim_layout converter is needed even after transition to isl
when setting up surface states - see brw_emit_surface_state().
Hence dropping the unneeded argument separately.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
allowing graceful failure instead of crash on assert later on.
This can be hit, for example, on SNB when trying to allocate
8kx8k CUBE_MAP against isl: x-tiled buffer size becomes
2421161984 exceeding the maximum of 1 << 31 == 2147483648.
Another way to hit this on SNB is with multisampling of over
64-bit formats.
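The size check amounts to a comparison against 1 << 31, done in 64 bits so
that a value such as 2421161984 cannot wrap a signed 32-bit integer first.
A minimal sketch; the names are hypothetical and whether the limit itself
is accepted (<= versus <) is an assumption here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Maximum surface size quoted above, kept as a 64-bit constant. */
#define SKETCH_MAX_SURFACE_BYTES (UINT64_C(1) << 31)

/* Returns false so the caller can fail gracefully instead of asserting. */
static bool sketch_surface_size_ok(uint64_t size_bytes)
{
   return size_bytes <= SKETCH_MAX_SURFACE_BYTES;
}
```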
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Otherwise init_teximage_fields_ms() (called by
_mesa_init_teximage_fields()) will always assert as it can't
find valid base format.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There is the same constraint later on as an assert in
isl_gen7_choose_image_alignment_el() so catch it earlier in order
to return error instead of crash.
Needed to avoid crashes with piglits on IVB and HSW:
arb_internalformat_query2.image_format_compatibility_type pname checks
arb_internalformat_query2.all internalformat_<x>_type pname checks
arb_internalformat_query2.max dimensions related pname checks
arb_copy_image.arb_copy_image-formats --samples=2/4/6/8
arb_texture_float.multisample-fast-clear gl_arb_texture_float
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.
There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try 16I/32I in addition to GL_RGBA8I.
IvyBridge passed all tests with all sample numbers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These formats are already allowed by the i965 GL driver, and the
feature seems to work just fine.
There are tests for multisampled rendering in piglit:
tests/spec/ext_framebuffer_multisample which can be patched to
try GL_RGBA16F/32F/16I/16UI/32I/32UI in addition to GL_RGBA/8I.
IvyBridge passed all tests with all sample numbers and even
with 128-bit formats.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We had some caller using LLVMAddInstrAttributes, which couldn't be
converted to lp_add_function_attr, because attributes were only handled
for functions in this case, so fix this.
For llvm >= 4.0, this already works correctly.
(radeonsi seems to avoid setting call site attributes prior to llvm 4.0,
the patch then citing it doesn't work when calling intrinsics. But at
least for calling external functions we always used that, albeit only
for actual call attributes, not call parameter attributes, though some
quick test shows llvm seems to handle that as well. The attribute index
is sort of iffy though, since attribute 0 of the call is the actual function,
attribute 1 corresponds to the first parameter of the called function.)
(Verified with GALLIVM_DEBUG=dumpbc plus llvm-dis that the correct
attributes are shown for calls, both for llvm 4.0 and 3.3.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We can also use storage images internally for resolves, which don't
require TRANSFER_DST usage on the image, so currently we may not create
the needed descriptors.
Just create these descriptors unconditionally.
Fixes: 0e1886efb9 ("radv: Fix descriptors for cube images with VK_IMAGE_USAGE_STORAGE_BIT")
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linux-specific gettid() syscall shouldn't be used in portable code.
Fix does assume a 1:1 thread:LWP architecture, but works for our
current target platforms and can be revisited later if needed.
Fixes unresolved symbol in linux scons builds.
v2: add comment in code about the 1:1 assumption.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This adds support for sharing semaphores using kernel syncobjects.
Syncobj backed semaphores are used for any semaphore which is
created with external flags, and when a semaphore is imported,
otherwise we use the current non-kernel semaphores.
Temporary imports from syncobj fd are also available, these
just override the current user until the next wait, when the
temp syncobj is dropped.
v2: allocate more chunks upfront, fix off by one after
previous refactor of syncobj setup, remove unnecessary null
check.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds syncobj create/destroy/export/import paths into
the winsys interface.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Remove the following duplicates from the formats table:
- R8G8B8A8_UNORM (V_,_T)
- R8G8B8X8_UNORM (_T,_T)
- DXT3_RGBA (_T,_T)
Only the first has an effect because the _T overrides the V_ initializer,
the latter two were harmless duplications of the same.
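Why only the first duplicate had an effect follows from C's designated
initializer rule: when the same array element is initialized twice, the
later initializer wins. A minimal sketch with a made-up table (the enum,
struct and values are hypothetical, not the real formats table):

```c
#include <assert.h>

enum sketch_format { SKETCH_FMT_A, SKETCH_FMT_B, SKETCH_FMT_COUNT };

struct sketch_entry {
   int vertex;  /* hypothetical "usable as vertex format" value */
   int texture; /* hypothetical "usable as texture format" value */
};

/* SKETCH_FMT_A is listed twice on purpose; the second entry replaces the
 * first. GCC can report this with -Woverride-init. */
static const struct sketch_entry sketch_table[SKETCH_FMT_COUNT] = {
   [SKETCH_FMT_A] = { .vertex = 1, .texture = 0 },
   [SKETCH_FMT_A] = { .vertex = 0, .texture = 1 },
   [SKETCH_FMT_B] = { .vertex = 0, .texture = 2 },
};

static const struct sketch_entry *sketch_lookup(enum sketch_format fmt)
{
   assert(fmt < SKETCH_FMT_COUNT);
   return &sketch_table[fmt];
}
```

Looking up SKETCH_FMT_A yields the second entry, so the vertex value from
the first one is silently lost.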
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Add support for ETC2 compressed textures in the etnaviv driver.
One step closer towards GL ES 3 support.
For now, treat SRGB and RGB formats the same. It looks like these are
distinguished using a different bit in sampler state, and not part of
the format, but I have not yet been able to confirm this for sure.
(Only enabled on GC3000+ for now, as the GC2000 ETC2 decoder
implementation is buggy and we don't work around that)
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
It's incorrect to use $(LOCAL_PATH) in makefile recipes since it's
changing. The typical way to handle it is to use private variable.
Fortunately in this case we can just simplify them to $^.
See further:
https://patchwork.freedesktop.org/patch/167718/
Also simplify LOCAL_GENERATED_SOURCES.
Fixes: 2dd4e2ec (spirv: Generate spirv_info.c)
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Two of the ARB_shader_ballot piglit tests hit the find_lsb case,
removing some of the noise allowed me to better debug the test when it
was failing.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some hardware, like i965, doesn't support group sizes greater than 32.
In that case, we can reduce the destination size of the ballot
intrinsic, which will simplify our code generation.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementation of ballotARB() will start by zeroing the flags
register. So, doing something like

    if (gl_SubGroupInvocationARB % 2u == 0u) {
        ... = ballotARB(true);
        [...]
    } else {
        ... = ballotARB(true);
        [...]
    }

(like fs-ballot-if-else.shader_test does) would generate identical MOVs
to the same destination (the flag register!), and we definitely do not
want to pull that out of the control flow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementations of the ARB_shader_ballot intrinsics will explicitly
read the flag as a source register.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We already had a channel_num system value, which I'm renaming to
subgroup_invocation to match the rest of the new system values.
Note that while ballotARB(true) will return zeros in the high 32-bits on
systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB
variables do not consider whether channels are enabled. See issue (1) of
ARB_shader_ballot.
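The difference between the two kinds of values can be illustrated with a
CPU-side emulation. This is a sketch, not how the hardware or the compiler
computes them; the function names and the per-channel arrays are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ballot: bit i is set only when channel i is enabled and its value is
 * true, so disabled channels always contribute zero. */
static uint64_t sketch_ballot(const bool *value, const bool *enabled,
                              unsigned subgroup_size)
{
   assert(subgroup_size <= 64);

   uint64_t mask = 0;
   for (unsigned i = 0; i < subgroup_size; i++) {
      if (enabled[i] && value[i])
         mask |= UINT64_C(1) << i;
   }
   return mask;
}

/* "Less than" mask in the style of gl_SubGroupLtMaskARB: it depends only
 * on the invocation index, not on which channels are enabled. */
static uint64_t sketch_lt_mask(unsigned invocation)
{
   assert(invocation < 64);
   return (UINT64_C(1) << invocation) - 1;
}
```

With a subgroup size of 32 the loop never touches bits 32..63, which is why
the high half of the ballot result is zero in that case.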
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementations of the ARB_shader_group_vote intrinsics will
explicitly write the flag as the destination register.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I don't expect anyone is going to care about using this in vec4 programs
(vertex/tessellation/geometry on Gen6/7), no one has come up with a good
way to implement it much less test it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Specifically, constant fold intrinsics from ARB_shader_group_vote, but I
suspect it'll be useful for other things in the future.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are intrinsics rather than opcodes, because they operate across
channels.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Within i965, we have many different objects and confusingly when
submitting an execbuf we have lists of both our internal objects and a
list of the kernel's drm_i915_gem_exec_object with very similar names.
Rename the kernel's validation list to avoid the collision as it is only
used for interfacing with the kernel and so a peripheral use of
"object".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit b7153c3e9f.
The point of that commit was to ensure intel_prepare_render() occurred
before color resolves on the current framebuffer. In 0673bbfd9b
(i965: Move surface resolves back to draw/dispatch time), Jason moved
brw_predraw_resolve_framebuffer back to draw time, which is already
after an intel_prepare_render() call. So, this is no longer necessary.
Furthermore, it caused problems. "mpv" would only display a small
corner of movies, and Android started failing camera CTS tests.
This is because intel_prepare_render() ended up handling DRI2 events
which caused the drawable to be resized at an inopportune time, flagging
ctx->NewState |= _NEW_BUFFERS, but at a point where we've already copied
ctx->NewState, and failed to notice the newly set flag.
The lack of _NEW_BUFFERS caused us to skip 3DSTATE_DRAWING_RECTANGLE,
so the drawing ended up being clipped to an outdated framebuffer size.
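The ordering problem can be reduced to a few lines. This is a sketch with
hypothetical names (struct sketch_ctx, SKETCH_NEW_BUFFERS and the
sketch_*() functions stand in for the real context, _NEW_BUFFERS and
intel_prepare_render()); it only shows why a flag set after the snapshot
goes unnoticed.

```c
#include <stdbool.h>

#define SKETCH_NEW_BUFFERS (1u << 0)

struct sketch_ctx {
   unsigned NewState;
   bool resize_pending; /* a window-system event waiting to be handled */
};

/* Stand-in for intel_prepare_render(): handling the event flags state. */
static void sketch_prepare_render(struct sketch_ctx *ctx)
{
   if (ctx->resize_pending) {
      ctx->resize_pending = false;
      ctx->NewState |= SKETCH_NEW_BUFFERS;
   }
}

/* Problematic order: the flags are copied before the call that may set
 * them. Returns whether the drawing rectangle would be re-emitted. */
static bool sketch_draw_snapshot_first(struct sketch_ctx *ctx)
{
   unsigned snapshot = ctx->NewState;
   sketch_prepare_render(ctx);
   return (snapshot & SKETCH_NEW_BUFFERS) != 0;
}

/* Intended order: handle events first, then look at the flags. */
static bool sketch_draw_prepare_first(struct sketch_ctx *ctx)
{
   sketch_prepare_render(ctx);
   unsigned snapshot = ctx->NewState;
   return (snapshot & SKETCH_NEW_BUFFERS) != 0;
}
```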
Just drop the hack and go back to handling this at the proper time.
Thanks to Matti Hämäläinen (ccr), Tomasz Figa (tfiga), and Tapani Palli
for reporting these issues.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101558
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101704
Tested-by: Tapani Pälli <tapani.palli@intel.com>
No need to check if ID is not 0 because _mesa_HashFindFreeKeyBlock()
can't generate this value.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
No need to check if ID is not 0 because _mesa_lookup_vao()
already prevents this from happening.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2 (Jason):
- s/separate_stencil_surface/make_separate_stencil_surface/
- drop the check for separate stencil when wrapping an
existing buffer object with miptree. This is dead code as
the first needs_separate_stencil() checks is
MIPTREE_LAYOUT_FOR_BO-flag and says no.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Depth buffers are always Y-tiled. In brw_miptree_choose_tiling()
driver opts to use linear buffers for small and 1D but this does
not apply for depth - GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL_EXT
are considered first.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will significantly reduce churn when switching the remaining
surface types to isl. After the full transition it will be easier
to calculate on-demand and drop the helper member in miptree.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes intel_mipmap_tree::pitch and isl_surf::row_pitch
semantically equivalent.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2 (Jason):
- Don't trigger miptree re-creation in vain later on with ISL
based. Core GL uses zero to indicate single sampled while
ISL uses one - this would cause intel_miptree_match_image()
to always fail.
- Now that native miptree is already using sample number of
one, there is no need for MAX2() when converting to ISL.
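The zero-versus-one convention is easy to get wrong when comparing sample
counts. A small sketch of the conversion and of a comparison that takes it
into account; the helper names are hypothetical and SKETCH_MAX2 is a local
stand-in for MAX2():

```c
#include <stdbool.h>

#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Core GL reports 0 samples for a single-sampled surface, ISL expects 1. */
static unsigned sketch_isl_samples_from_gl(unsigned gl_samples)
{
   return SKETCH_MAX2(gl_samples, 1u);
}

/* Comparing a miptree that stores the ISL-style count against the raw GL
 * value would never match for single-sampled images; convert first. */
static bool sketch_samples_match(unsigned miptree_samples, unsigned gl_samples)
{
   return miptree_samples == sketch_isl_samples_from_gl(gl_samples);
}
```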
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Patch moves "assert(brw->num_samples <= 16)" from
emit_3dstate_multisample2() to upload_multisample_state(). Latter
is the only caller of the former and passes "brw->num_samples"
as argument. Therefore it is clearer to assert in the caller.
Possible bug fix in genX(emit_3dstate_multisample2) which
doesn't have a case for num_samples == 0 in the switch
statement.
It should be noted that intel_miptree_map()/unmap() now checks
additionally for "mt->surf.samples == 1" in order to support gen6
stencil which is already transitioned to ISL. This will go away in
next patch when native miptrees start to use isl_surf::samples as
well.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If we have a compat profile context, it means that GL_QUADS[_STRIP] are
supported so this query makes sense. It's also legal for 3.2 core profile
because of a spec bug.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This bumps the libdrm requirement for amdgpu to 2.4.82.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just introduces a central semaphore info struct, and passes it around,
and introduces some wrappers that will make porting off libdrm_amdgpu easier.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not built by default. Currently only builds with icc.
v2:
* document knl,skx possibilities for swr_archs
* merge with changed loader lib selection code
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Allow configuration of the SWR architecture-dependent libraries
we build for with --with-swr-archs. Maintains current behavior
by defaulting to avx,avx2.
Scons changes made to make it still build and work, but
without the changes for configuring which architectures.
v2:
* add missing comma for swr_archs default
* check that at least one architecture is enabled
* modify loader logic to make it clearer how to add archs
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
cpuid.7 requires cx=0 to select the extended feature leaf.
avx512 detection was using the non-indexed cpuid resulting
in random non-detection of avx512.
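For illustration, an indexed query of leaf 7 with GCC or Clang's <cpuid.h>
could look like this. It is a sketch rather than the code this patch
touches: sketch_cpu_has_avx512f() is a hypothetical name, and a complete
check would also have to confirm OS support for the ZMM state, which is
left out here.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns true when CPUID reports AVX-512 Foundation support. */
static bool sketch_cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
   unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;

   /* __get_cpuid_count() sets ECX to the requested subleaf (0) before
    * executing CPUID and returns 0 if leaf 7 is not supported. */
   if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
      return false;

   /* CPUID.(EAX=7,ECX=0):EBX bit 16 is AVX512F. */
   return ((ebx >> 16) & 1u) != 0;
#else
   return false;
#endif
}
```

Without the explicit subleaf, ECX holds whatever was left in the register,
which matches the "random non-detection" described above.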
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
> checking for WAYLAND... no
>
> configure: error: Package requirements (wayland-client >= 1.11 wayland-server >= 1.11 wayland-protocols >= 1.8) were not met:
>
> No package 'wayland-protocols' found
>
> Consider adjusting the PKG_CONFIG_PATH environment variable if you
> installed software in a non-standard prefix.
>
> Alternatively, you may set the environment variables WAYLAND_CFLAGS
> and WAYLAND_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.
Also, added extra path to PKG_CONFIG_PATH env variable.
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We incorrectly detected VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT. We looked
for the bit in VkImageCreateInfo::usage, but it's actually in
VkImageCreateInfo::flags.
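The mix-up is easy to miss because both fields are plain bitmasks. A sketch
with local stand-ins instead of the Vulkan headers: the struct and macro
names are hypothetical, and the two values are assumed to mirror
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT and VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT,
which share the same numeric value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match the Vulkan headers. */
#define SKETCH_IMAGE_USAGE_COLOR_ATTACHMENT_BIT 0x00000010u
#define SKETCH_IMAGE_CREATE_CUBE_COMPATIBLE_BIT 0x00000010u

/* Reduced stand-in for VkImageCreateInfo. */
struct sketch_image_create_info {
   uint32_t flags; /* creation flags */
   uint32_t usage; /* usage flags */
};

/* Incorrect: tests a create-flag bit against the usage mask. */
static bool sketch_is_cube_wrong(const struct sketch_image_create_info *info)
{
   return (info->usage & SKETCH_IMAGE_CREATE_CUBE_COMPATIBLE_BIT) != 0;
}

/* Correct: the cube-compatible bit lives in the flags field. */
static bool sketch_is_cube(const struct sketch_image_create_info *info)
{
   return (info->flags & SKETCH_IMAGE_CREATE_CUBE_COMPATIBLE_BIT) != 0;
}
```

Under that assumption, the wrong check reports any color attachment image
as cube compatible and misses real cube images that lack that usage.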
Found by assertion failures while enabling VK_ANDROID_native_buffer.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The LD_LIBRARY_PATH environment variable could be already defined so
we extend it and restore it rather than just overwriting it.
v2:
- Unset the __old_ld helper variable when we are done with it.
- Corrected test for and escaping of variables (Eric).
v3: Remove unneeded variable (Emil).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The "Perform basic testing" and "Use the release.sh script from xorg
util-modular" sections provide some instructions to do so. We add now
some comments in order to use a recent enough LLVM version to run
dist/distcheck and the automake generated binaries.
v2: Suggested the need to define LLVM_CONFIG also before running the
release.sh script.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Trailing space after the backslash meant the rest of the AM_CFLAGS lines
were no longer included.
This has been silently ignored because of the next line starting with
a `-` dash, instructing make to be silent about that line.
Fixes: 02cc359372 "egl/wayland: Use linux-dmabuf interface for buffers"
Cc: Daniel Stone <daniels@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Simply advertise all supported modifiers, independent of the format.
Special formats, like compressed, which don't support all those modifiers
are already culled from the dmabuf format list, as we don't support
the render target binding for them.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This allows creating buffers with a specific tiling layout, which is primarily
used by GBM to allocate the EGL back buffers with the correct tiling/modifier
for use with the scanout engines.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This allows the state trackers to know the tiling layout of the
resource and pass this through the various userspace protocols.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This implements resource import with modifier, deriving the correct
internal layout from the modifier and constructing a render compatible
base resource if needed.
This removes the special cases for DDX and renderonly scanout allocated
buffers, as the linear modifier is enough to trigger correct handling
of those buffers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
This reworks the logic in etna_update_sampler_source to select the
newest resource view for updating the texture view. This should make
the logic easier to follow and fixes texture updates from imported
dma-bufs.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If we import a dma-buf with a sampler/pixel pipe incompatible modifier,
the imported buffer will end up in an external resource view. As
resource_changed signals the change of the imported resource, we need
to update the external view seqno, instead of the base resource seqno.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This fixes failures to import the scanout buffer with screen resolutions
that don't satisfy the RS alignment restrictions, like 1680x1050.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The minimum RS alignment calculation is needed in various places.
Extract a helper to avoid open-coding the calculation at every site.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The current way of importing the resource from renderonly after allocation
is opaque and is taking away control from the driver, which it needs in
order to implement more advanced scenarios than the simple linear scanout
with matching stride alignments.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
The following changes need the modifier definitions for the Vivante tiled
formats, which are shipped with libdrm 2.4.82.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Commit 463b7d0332c5("gallium: Enable ARM NEON CPU detection.")
introduced CPU feature detection based on the Android cpufeatures library.
Unfortunately it also added an assumption that if PIPE_OS_ANDROID is
defined, the library is also available, which is not true for the
standalone build without using Android build system.
Fix it by defining HAS_ANDROID_CPUFEATURES in Android.mk and replacing
respective #ifdefs to use it instead.
v2:
- Add a comment explaining why the separate flag is needed (Emil).
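The guard pattern looks roughly like the sketch below. The function name is
hypothetical and the non-Android fallback is a placeholder; only the idea
of keying on HAS_ANDROID_CPUFEATURES instead of PIPE_OS_ANDROID is taken
from the description above.

```c
#include <stdbool.h>

/* HAS_ANDROID_CPUFEATURES is expected to come from the build system
 * (Android.mk). Being compiled for Android alone does not guarantee that
 * the cpufeatures library is available. */
#if defined(HAS_ANDROID_CPUFEATURES)
#include <cpu-features.h>
#endif

static bool sketch_detect_neon(void)
{
#if defined(HAS_ANDROID_CPUFEATURES)
   return android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
          (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
#else
   /* Library not available: this sketch simply reports no NEON; a real
    * driver would probably probe some other way. */
   return false;
#endif
}
```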
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Earlier commit refactored/split the parsing into separate hunks.
While no functional change was intended, it did not account for the fact
that a different error is set when the attribute value is incorrect.
Fixes: 3ee2be4113 ("egl: split _eglParseImageAttribList into per
extension functions")
Cc: Michel Dänzer <michel@daenzer.net>
Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The extension should be present (if applicable) in the list returned by
getExtensions(). AFAICT no loader has ever looked for it in
__driDriverExtensions/__driDriverGetExtensions.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The extension should be in the list as returned by getExtensions().
Seems to have gone unnoticed since close to nobody wants to change the
vblank mode for the software driver.
v2: Rebase
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
The option is only queried from the loader, which has access to the
dri common code in src/mesa/drivers/dri/common/.
One could grant the loader access to brw_config_options but even
then, having the same option in both places is not a good idea.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It removes unused buffer_count variable from dri2_egl_surface.
And it polishes the assert of dri2_drm_get_buffers_with_format().
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is a tiny housekeeping patch which does the following:
* Limit lines to 78 or fewer characters.
According to the mesa coding style guidelines.
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Because each of the color_buffers has a unique bo, release_buffer() can
leave the loop that searches for the buffer as soon as the designated
buffer is found.
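As an illustration only, the early exit could look like the sketch below.
The struct layout and the sketch_release_buffer() name are assumptions, not
the driver's actual definitions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the surface's color_buffers. */
struct sketch_color_buffer {
   const void *bo;
   bool locked;
};

/* Unlock the buffer backed by `bo` and stop searching immediately: every
 * entry has a unique bo, so at most one entry can match.
 * Returns the index that was released, or -1 if none matched. */
static int sketch_release_buffer(struct sketch_color_buffer *buffers,
                                 size_t count, const void *bo)
{
   for (size_t i = 0; i < count; i++) {
      if (buffers[i].bo == bo) {
         buffers[i].locked = false;
         return (int)i;   /* leave the loop on the first match */
      }
   }
   return -1;
}
```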
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Adding linux-dmabuf Wayland protocol files as generated did the right
thing, by prepending $(MKDIR_GEN) so autotools didn't try to write into
a build directory which didn't yet exist.
Unfortunately MKDIR_GEN needs to be defined in every Makefile it's used
in (which we do now), or alternately defined and substituted in
configure.ac (which we don't do), and src/egl/ didn't actually have it
from either method. As unset variables expand to nothing, it was
silently being skipped.
Copy & paste the definition to make sure drivers/dri2/ exists before we
try to generate files into it.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Nick Sarnie <commendsarnex@gmail.com>
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Fixes: 02cc359372 ("egl/wayland: Use linux-dmabuf interface for buffers")
The previous implementation of CLAMP() allowed NaN to pass through
unscathed, by failing both comparisons. NaN isn't exactly a value
between MIN and MAX, which can break the assumptions of many callers.
This patch changes CLAMP to convert NaN to MIN, arbitrarily. Callers
that need NaN to be handled in a specific manner should probably open
code something, or use a macro specifically designed to do that.
Section 2.3.4.1 of the OpenGL 4.5 spec says:
"Any representable floating-point value is legal as input to a GL
command that requires floating-point data. The result of providing a
value that is not a floating-point number to such a command is
unspecified, but must not lead to GL interruption or termination.
In IEEE arithmetic, for example, providing a negative zero or a
denormalized number to a GL command yields predictable results,
while providing a NaN or an infinity yields unspecified results."
While CLAMP may apply to more than just GL inputs, it seems reasonable
to follow those rules, and allow MIN as an "unspecified result".
This prevents assertion failures in i965 when running the games
"XCOM: Enemy Unknown" and "XCOM: Enemy Within", which call
glTexEnv(GL_TEXTURE_FILTER_CONTROL_EXT, GL_TEXTURE_LOD_BIAS_EXT,
-nan(0x7ffff3));
presumably unintentionally. i965 clamps the LOD bias to be in range,
and asserts that it's in the proper range when converting to fixed
point. NaN is not, so it crashed. We'd like to at least avoid that.
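The difference between the two behaviours can be sketched as follows. The
macro names are hypothetical and the exact form of the macro in the tree may
differ; the point is only that the order of the comparisons decides what
happens to NaN, since every comparison against NaN is false:

```c
#include <math.h>

/* NaN fails both (X) < (MIN) and (X) > (MAX), so it falls through
 * and is returned unchanged. */
#define SKETCH_CLAMP_NAN_PASSES(X, MIN, MAX) \
   ((X) < (MIN) ? (MIN) : ((X) > (MAX) ? (MAX) : (X)))

/* NaN fails the first test (X) > (MIN), so the result is MIN. */
#define SKETCH_CLAMP_NAN_TO_MIN(X, MIN, MAX) \
   ((X) > (MIN) ? ((X) > (MAX) ? (MAX) : (X)) : (MIN))

/* Hypothetical caller: clamp a LOD bias so that a later range assertion
 * holds even for NaN input. */
static float sketch_clamp_lod_bias(float bias, float lo, float hi)
{
   return SKETCH_CLAMP_NAN_TO_MIN(bias, lo, hi);
}
```

For ordinary values both macros give the same result; they differ only for
NaN input.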
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the source is an indirect register, there is ralloc'd data. Copying
with a direct assignment will copy the pointer, but the data will still
belong to the old instruction's memory context. Since we're lowering
and throwing away instructions, that could free the data by mistake.
Instead, use nir_src_copy, which properly handles this.
This is admittedly not a common case, so I think the bug is real,
but unlikely to be hit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
While it produces functioning code the pass creates worse code
for arrays of arrays. See the comment added in this patch for more
detail.
V2: skip splitting of AoA of matrices too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This can happen if, for instance, you have an array of structs and there
are both direct and wildcard references to the same struct and some
members only have direct or only have indirect.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
This guarantees that the value written in the batch matches the
value recorded in the relocation entry.
(Chris Wilson wrote an identical patch as well.)
The code doesn't get exactly a lot simpler but at least it is in a single
place, and we delete more than we add.
Another good point is that you get rid of struct brw_wm_unit_state
which was a third mechanism for encoding GEN state. We used to have
GENXML, manual packing and these bitfield structs. Now we're down to
just GENXML and some manual packing. (Khristian)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add the code into its own function and atom, since almost nothing is
shared with GEN >= 6.
v2: Split GEN <=5 and GEN >= 6 into separate functions (Ken).
v3: Minor tidying by Ken.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When available, use the zwp_linux_dmabuf_v1 interface to create buffers,
which allows multiple planes and buffer modifiers to be used.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Now create_wl_buffer is generic enough, we can use it for the
EGL_WL_create_wayland_buffer_from_image extension.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Remove surface-specific code from create_wl_buffer, so it's now just a
generic translation from DRIimage to wl_buffer.
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This was only used in create_dumb() to blacklist planar formats.
However, the start of the function already whitelists ARGB8888 (cursor)
and XRGB8888 (scanout), and nothing else. So this entire function can be
removed.
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Luckily no-one really used the is_format_supported() call, because it
only supported three formats.
Also, since buffers with alpha can be displayed on planes, stop banning
them from use.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Wayland buffers coming from wl_drm use the WL_DRM_FORMAT_* enums, which
are identical to GBM_FORMAT_*. Similarly, FD imports do not need to
convert between GBM and DRI FourCC, since they are (almost) completely
compatible.
This widens the formats accepted by gbm_bo_import() when importing
wl_buffers; previously, only XRGB8888, ARGB8888, RGB565 and YUYV were
supported.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Current logic calls intel_renderbuffer_set_draw_offset() which in
turn tries to calculate x and y offset against layer/level settings
that are against the original miptree actually having sufficient
levels/layers. This returns correctly x=0 y=0 regardless of the given
layer/level only because one calls intel_miptree_get_image_offset()
which goes and consults miptree offset table which in turn luckily
contains entries for max-mipmap levels, all initialised to zero even
in case of non-mipmapped.
This patch stops consulting the table and simply sets the draw
offsets to zero that are compatible with the single slice miptree
backing the renderbuffer.
This prepares for ISL based miptrees that calculate offsets
on-demand and do not tolerate levels beyond what the miptree has.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will falsely trigger an assert on number of layers once
isl is used for 3D layouts of Gen4 cube maps.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that image surface vertical slice calculator doesn't depend
on total_height, total dimensions are only needed when new buffer
objects are created. Therefore one can safely ignore them when
miptrees are created for already existing buffer objects.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This helps to drop dependency to miptree::total_height which is
used in brw_miptree_get_vertical_slice_pitch().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Once the driver moves to ISL both compressed and uncompressed have
the same type. One needs to tell them apart by other means. This
can be done by checking the existence of mcs_buf.
There is a short period of time within intel_miptree_create()
where mcs_buf doesn't exist yet (between calls to
intel_miptree_create_layout() and intel_miptree_alloc_mcs()).
First compute_msaa_layout() makes the decision if compression is
to be used and sets the msaa_layout type. Then based on the type
one sets aux_usage and finally decides if mcs_buf is needed.
This patch duplicates the logic in compute_msaa_layout() and uses
that to make the decision on aux_usage and mcs_buf allocation.
Most of the original logic in compute_msaa_layout() will be gone
in later patch leaving only one version.
Elsewhere only brw_populate_sampler_prog_key_data() needs to know
if compression is used based on the msaa_type. This is now
replaced with consideration for number of samples and existence
of mcs_buf. All other occurrences consider CMS || UMS which can
be represented using the single type ISL_MSAA_LAYOUT_ARRAY
without any tweaks.
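A rough sketch of the replacement check, assuming a miptree-like struct with
a sample count and an optional MCS buffer pointer. The struct and function
names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a miptree for this illustration. */
struct sketch_miptree {
   unsigned num_samples;   /* 1 (or 0 before the ISL switch) = single-sampled */
   const void *mcs_buf;    /* NULL when no MCS buffer is allocated */
};

/* Compressed multisampling is inferred from the sample count and the
 * existence of the MCS buffer rather than from a dedicated layout type. */
static bool sketch_is_compressed_msaa(const struct sketch_miptree *mt)
{
   return mt->num_samples > 1 && mt->mcs_buf != NULL;
}
```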
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
same as irb::layer_count. In case of copies and blits msaa
surfaces already fall to blorp which natively works with logical
slices.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Checking against zero currently works as single sampling is
represented with zero. Once one moves to isl single sampling
really has sample number of one.
This keeps later patches simpler.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We don't support the general version yet because that requires us to
lower shared variables up-front in SPIR-V -> NIR. This shouldn't be a
whole lot of work but it's not something we support today.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Now that vtn_type has piles of unions, we should assert sanity before
setting fields that may stomp others.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The old table based spirv_*_to_string functions would return NULL for
any values "inside" the table that didn't have entries. The tables also
needed to be updated by hand each time a new spirv.h was imported.
Generate the file instead.
v2: Make this script work more like src/mesa/main/format_fallback.py.
Suggested by Jason. Remove SCons supports. Suggested by Jason and
Emil. Put all the build work in Makefile.nir.am in lieu of adding a new
Makefile.spirv.am. Suggested by Emil. Add support for Android builds
based on code provided by Emil.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This query is not allowed in GL core profile 3.3 and later (since
GL_QUADS and GL_QUAD_STRIP are disallowed). The query was (mistakenly)
supported in GL 3.2. This fixes the glGet error test accordingly.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
This looks like a regression from df30123794 ("radv: use
ac_compute_surface"). Before that, the opt4Space addrlib flag was set
to true unless the image has FMASK (ac_compute_surface will similarly
only set that flag for images without FMASK).
This saves multiple gigabytes of VRAM on one of our games, and brings
its VRAM utilisation on RADV in line with AMDGPU-PRO and NVIDIA.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
"...and stay dead!"
Rafael deleted this file in c2b5a26dc2
(i965: Convert SF_STATE to genxml.) but Marek accidentally brought it
back in commit e7a091936f (mesa: replace
ctx->Polygon._FrontBit with a helper function) when resolving conflicts.
It's not actually even compiled, but it's still here trolling people
into thinking it still exists and needs patching.
Translate the NIR variables directly to LLVM instead of lowering to a
TGSI-style giant array of vec4's and then back to a variable. This
should fix indirect dereferences, make shared variables more tightly
packed, and make LLVM's alias analysis more precise. This should fix an
upcoming Feral title, which has a compute shader that was failing to
compile because the extra padding made us run out of LDS space.
v2: Combine the previous two patches into one, only use this for shared
variables for now until LLVM becomes smarter.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Otherwise, if a client gave us a list of modifiers that contained a
modifier we understand but which is not supported on the hardware, we
might return that one and then fail to create the image.
Reviewed-by: Daniel Stone <daniels@collabora.com>
This commit splits the mapping in half. The modifier_infos table now
only contains the modifier and the since_gen field. The tiling bits
have been moved into a table in tiling_to_modifier as that's the only
place it was ever used. The modifier_is_supported function now takes a
devinfo and does the since_gen check.
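A sketch of the reduced table and the generation check, under stated
assumptions: the modifier values and since_gen numbers are placeholders, the
hardware generation is passed as a plain integer instead of a devinfo, and
all names prefixed with sketch_ are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_modifier_info {
   uint64_t modifier;
   unsigned since_gen;   /* first hardware generation that supports it */
};

/* Placeholder entries, not real modifier values. */
static const struct sketch_modifier_info sketch_modifier_infos[] = {
   { 0x0, 1 },
   { 0x1, 1 },
   { 0x2, 6 },
};

static bool sketch_modifier_is_supported(unsigned gen, uint64_t modifier)
{
   size_t n = sizeof(sketch_modifier_infos) / sizeof(sketch_modifier_infos[0]);
   for (size_t i = 0; i < n; i++) {
      if (sketch_modifier_infos[i].modifier == modifier)
         return gen >= sketch_modifier_infos[i].since_gen;
   }
   return false;   /* unknown modifier */
}

/* Pick the first modifier from a client list that this generation can
 * actually use; UINT64_MAX stands in for "none" in this sketch. */
static uint64_t sketch_select_modifier(unsigned gen, const uint64_t *mods,
                                       size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (sketch_modifier_is_supported(gen, mods[i]))
         return mods[i];
   }
   return UINT64_MAX;
}
```

Filtering at selection time avoids returning a modifier that is understood
but cannot be used to create the image on the current hardware.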
Reviewed-by: Daniel Stone <daniels@collabora.com>
Now that we have an actual aux_usage field, we no longer need the
complex logic of is_lossless_compressed in order to figure out if a
miptree is CCS_E compressed. As a side-effect, there is no longer any
need to overload MSAA_LAYOUT_CMS for CCS_E and we can stop doing so.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
HiZ, like MCS and CCS_E, can compress more than just clear colors so we
want it turned on whenever the miptree is being used as a depth
attachment. It's theoretically possible for someone to create a depth
texture, upload data with glTexSubImage2D, and texture from it without
ever binding it as a depth target. If this happens, we would end up
wasting a bit of space by allocating a HiZ surface we never use.
However, this is rather unlikely outside of test cases, so we're better
off just allocating it up-front.
Reviewed-by: Chad Versace <chadversary@chromium.org>
We need this split for the same reason that we need the split for CCS:
intel_miptree_supports_hiz is called *before* we choose the actual
tiling. Adding a tiling_supports_hiz helper lets choose_aux_usage
more accurately decide whether or not to enable hiz. In particular,
this prevents us from enabling HiZ on linear depth buffers.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes piglit test crash when context creation fails.
v2: As suggested by Brian, move the init to st_create_context_priv()
Reviewed-by: Brian Paul <brianp@vmware.com>
Enable the capability if the DRM supports it.
Hook up mechanism to send and receive fence FD from the DRM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Implement the required functions in vmw_fence module.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Connect fence_get_fd, fence_create_fd, and fence_server_sync.
Return PIPE_CAP_NATIVE_FENCE_FD capability based on what the
winsys reports
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The new interfaces will be used to enable
EGL_ANDROID_native_fence_sync.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The timeout parameter is required to implement
EGL_ANDROID_native_fence_sync.
v2
* Changed default timeout from 0 to PIPE_TIMEOUT_INFINITE
* Add more documentation to the new timeout parameter
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
From the Vulkan 1.0.53 spec VU for vkCreateImageView:
"image must have been created with a usage value containing at least
one of VK_IMAGE_USAGE_SAMPLED_BIT, VK_IMAGE_USAGE_STORAGE_BIT,
VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT, or
VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT"
We were missing VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT from our list.
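The valid-usage rule quoted above can be sketched as a simple mask test. To
keep the example self-contained, the constants below are local copies that
mirror the VkImageUsageFlagBits values instead of including the Vulkan
headers, and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the corresponding VkImageUsageFlagBits values. */
#define SKETCH_USAGE_SAMPLED_BIT                  0x00000004u
#define SKETCH_USAGE_STORAGE_BIT                  0x00000008u
#define SKETCH_USAGE_COLOR_ATTACHMENT_BIT         0x00000010u
#define SKETCH_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT 0x00000020u
#define SKETCH_USAGE_INPUT_ATTACHMENT_BIT         0x00000080u

/* An image view may be created if the image usage contains at least one
 * of the five bits listed in the valid-usage statement. */
static bool sketch_usage_allows_image_view(uint32_t usage)
{
   const uint32_t view_usage_mask =
      SKETCH_USAGE_SAMPLED_BIT |
      SKETCH_USAGE_STORAGE_BIT |
      SKETCH_USAGE_COLOR_ATTACHMENT_BIT |
      SKETCH_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT |
      SKETCH_USAGE_INPUT_ATTACHMENT_BIT;

   return (usage & view_usage_mask) != 0;
}
```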
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
If the queue is full, util_queue_add_job will wait while bo_fence_lock is
held.
If pb_slab wants to reuse a buffer, it will lock the pb_slab mutex and
try to check BO fence busyness, but it has to wait for bo_fence_lock to get
released. Both bo_fence_lock and pb_slab mutex are locked now.
When the CS thread unreferences and releases a suballocated buffer,
it will try to lock the pb_slab mutex and has to wait. The CS thread
can't finish its job in order to free a queue slot and unblock
util_queue_add_job ==> deadlock.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Consider the following situation:
mtx_lock(mutex);
do_something();
util_queue_add_job(...);
mtx_unlock(mutex);
If the queue is full, util_queue_add_job will wait for a free slot.
If the job which is currently being executed tries to lock the mutex,
it will be stuck forever, because util_queue_add_job is stuck.
The deadlock can be trivially resolved by increasing the queue size
(reallocating the queue) in util_queue_add_job if the queue is full.
Then util_queue_add_job becomes wait-free.
radeonsi will use it.
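The idea of growing instead of waiting can be sketched with a plain ring
buffer. This is an illustration under stated assumptions, not the util_queue
implementation: all names are hypothetical, jobs are reduced to integers, and
the locking and condition variables a real threaded queue needs are omitted.

```c
#include <stdlib.h>

struct sketch_job_queue {
   int *jobs;           /* ring buffer of job ids */
   unsigned capacity;   /* must be > 0 */
   unsigned head;       /* index of the oldest queued job */
   unsigned count;      /* number of queued jobs */
};

static int sketch_queue_init(struct sketch_job_queue *q, unsigned capacity)
{
   q->jobs = malloc(capacity * sizeof(*q->jobs));
   q->capacity = capacity;
   q->head = 0;
   q->count = 0;
   return q->jobs != NULL;
}

/* Add a job. If the ring is full, reallocate it at twice the size instead
 * of waiting for a free slot. Returns 0 only on allocation failure. */
static int sketch_queue_add(struct sketch_job_queue *q, int job)
{
   if (q->count == q->capacity) {
      unsigned new_capacity = q->capacity * 2;
      int *new_jobs = malloc(new_capacity * sizeof(*new_jobs));
      if (!new_jobs)
         return 0;
      /* Unwrap the ring so the oldest job lands at index 0. */
      for (unsigned i = 0; i < q->count; i++)
         new_jobs[i] = q->jobs[(q->head + i) % q->capacity];
      free(q->jobs);
      q->jobs = new_jobs;
      q->capacity = new_capacity;
      q->head = 0;
   }
   q->jobs[(q->head + q->count) % q->capacity] = job;
   q->count++;
   return 1;
}

/* Remove the oldest job. Returns 0 if the queue is empty. */
static int sketch_queue_pop(struct sketch_job_queue *q, int *job)
{
   if (q->count == 0)
      return 0;
   *job = q->jobs[q->head];
   q->head = (q->head + 1) % q->capacity;
   q->count--;
   return 1;
}
```

Because adding never blocks on a free slot, a caller that holds a mutex while
adding cannot be stalled by a running job that wants the same mutex.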
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
During bring-up, this is often 0. Prevent automatic disablement of
ARB_timer_query and demotion of the OpenGL version to 3.2 by setting
a non-zero frequency. Print an error message instead.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For inputs and outputs, indirect indexing is lowered by the GLSL compiler.
For temporaries, use alloca and disable the "promote-alloca" pass.
In the future, we could switch all codepaths to alloca permanently and
just rely on the "promote-alloca" pass.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mesa here requires the scaling lists in diagonal scan order, but
VAAPI passes them in raster scan order. Therefore, rearrange the
elements when copying.
v2: Move scan tables to vl_zscan.c.
Fix type in size assertion.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
One can override the deviceID, by setting the INTEL_DEVID_OVERRIDE
variable. A few symbolic names or a numerical value for the actual
device ID is accepted.
At the same time we're using strtod (string to double) to convert the
string to a decimal numeral. A seeming thinko, made by the original
commit that introduces the code in libdrm_intel and got here with the
import.
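For illustration, an integer parse of the numerical form could look like the
sketch below. The helper name is hypothetical and the symbolic names
mentioned above are assumed to be handled elsewhere; strtol() with base 0
accepts decimal, octal and 0x-prefixed hexadecimal input.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a numerical device ID override.
 * Returns -1 if the string is empty or not entirely a number. */
static int sketch_parse_devid_override(const char *str)
{
   char *end;
   long value = strtol(str, &end, 0);

   if (end == str || *end != '\0')
      return -1;
   return (int)value;
}
```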
Fixes: 514db96c11 ("i965: Import libdrm_intel.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There seems to be a rounding difference with F2I vs nearest filtering.
The precise problem in the rounding is unknown.
This fixes an incorrect output with OpenMAX encoding.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Otherwise, ImmutableLevels is 0, which is an illegal value. Later,
_mesa_meta_setup_sampler will use _mesa_texture_parameteriv to set
texObj->MaxLevel = CLAMP(params[0], texObj->BaseLevel,
texObj->ImmutableLevels - 1);
which turns into a completely bogus CLAMP(value, 0, -1)...where the
upper bound is smaller than the lower bound. This ends up being -1
today due to the way CLAMP is implemented, which is a bogus MaxLevel.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Grigori recently added EGL_KHR_create_context_no_error support,
which causes EGL to pass a new __DRI_CTX_FLAG_NO_ERROR flag to
drivers when requesting an appropriate context mode.
driContextSetFlags() will already handle it properly for us, but the
classic drivers all have code to explicitly balk at unknown flags. We
need to let it through or they'll fail to create a no_error context.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
When using DCC some clear values don't require a cmask eliminate
step. This patch adds support for black and black with alpha 1,
there are other values, but I don't have access to a comprehensive list.
This works by setting the cmask eliminate predicate when doing the
fast clear, and later when doing the cmask elimination making sure
the draws are predicated.
This increases the fps on Sascha Willems deferred.
Tonga: 580fps->670fps on a Tonga PRO card.
Polaris 730->850fps
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We can only fast clear 128-bit images if the r/g/b channels
are the same, and we are using DCC.
For DCC we'll bail out on translate if this isn't true,
and we catch cmask clears explicitly.
v2: remove 64-bit block (Bas), add uint32 as well.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch uses addrlib to work out the tile swizzles according
to the surface index. It seems to produce the same values as
amdgpu-pro for the deferred test.
v2: don't apply swizzle to CMASK. the eg docs don't mention
it, and we clearly don't align cmask for that.
v3: disable surf index for dedicated images, as these will
most likely be shared, and I don't think the metadata has
space for this info in it yet.
v4: update for shareable images, rename combined_swizzle
to tile_swizzle
This gets the deferred demo from 730->950fps on my rx480.
(dcc cmask elim predication patches get it further)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some of the Sascha Willems demos pick a D32/S8 format for the depth
buffer, then do a LOAD_OP_CLEAR/LOAD_OP_DONT_CARE on it, which means
we don't get to merge the undefined->depth and clear htile transitions.
This adds the stencil aspect to the pending clears if there is a depth
clear pending and the stencil aspect is don't care.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To not confuse apps in thinking it might be faster.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
NV isn't valid for external images anymore.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Fixes: 6ddc64b93e "radv: Add support for VK_KHR_dedicated_allocation."
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
This effectively reverts commit 43a171878bb4b5aedb36a. Technically,
VK_KHR_get_memory_requirements2 and VK_KHR_dedicated_allocation are
required for the KHR version but this at least restores the removed
functionality. This patch builds but has received zero testing.
Acked-by: Dave Airlie <airlied@redhat.com>
Fished the SparseImage call out of the headers as the spec missed
the definition.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Dave Airlie <airlied@redhat.com>
We always recommend sub-allocation and don't do anything special for
dedicated allocations.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There is one small ANV change here because we used the
VK_ERROR_INVALID_EXTERNAL_HANDLE_KHX enum in the BO cache and that had
to be updated to have the _KHR suffix.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Acked-by: Dave Airlie <airlied@redhat.com>
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
These have been formally deprecated by Khronos never to be shipped
again. The KHR versions should be implemented/used instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This accidentally set __DRI_CTX_FLAG_NO_ERROR whenever any flags were
present. Just needs extra parentheses.
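The message does not show the expression itself. One way such a bug can
arise, assumed here purely for illustration, is the precedence of `|` over
`?:`. The flag value and function name below are placeholders:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CTX_FLAG_NO_ERROR 0x8u   /* placeholder value */

/* Without the inner parentheses, `flags | no_error ? X : 0u` parses as
 * `(flags | no_error) ? X : 0u`, which yields X whenever any flag is set
 * and drops the other flags. */
static uint32_t sketch_build_context_flags(uint32_t flags, bool no_error)
{
   return flags | (no_error ? SKETCH_CTX_FLAG_NO_ERROR : 0u);
}
```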
Fixes: 4909519a66 (egl: Add EGL_KHR_create_context_no_error support)
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Fixes performance regression from f50aa21456 - was forcing internal
code generation to target AVX (no gather, etc).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This only adds the EGL side, needs to be plumbed into Mesa frontend.
v2: Add check for extension availability.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add a new context flag and plumb it through the various layers of the
context creation code to set up dispatch tables for the no-error mode.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This basic extension allows usage of the __DRI_CTX_FLAG_NO_ERROR flag.
This includes support code for classic Mesa drivers to switch on the
no-error mode if the flag is set.
v2: Move to common DRI code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add async marshalling/unmarshalling for all glClearBuffer variants.
These entry points are commonly used in general and Alien Isolation
specifically uses glClearBufferiv. Slightly reduces the number of
thread synchronizations with glthread in that game.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Extract clear buffer helper functions in preparation for adding
marshal/unmarshal functions for the various glClearBuffer variants.
v2: Fix command size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The fps graph for example calculates the fps as double with small
variations based on when query_new_value() is called, which causes
many values to be truncated on the cast to uint64_t.
The HUD internally stores the values as double, so just use double
everywhere instead of fixing this with rounding. Using doubles also
allows the hud to show small variations instead of being clamped to
discrete values.
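A tiny sketch of the truncation, with hypothetical helper names: converting
a double to uint64_t discards the fractional part, so nearby samples collapse
onto integers, while keeping the double preserves them.

```c
#include <stdint.h>

/* Storing a sample through an integer type drops the fraction. */
static uint64_t sketch_sample_as_uint(double value)
{
   return (uint64_t)value;
}

/* Keeping the sample as a double preserves small variations. */
static double sketch_sample_as_double(double value)
{
   return value;
}
```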
v2: Don't print decimals in the dump file when not necessary
Signed-off-by: Christoph Haag <haagch+mesadev@frickel.club>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit d8b2ccdb88, which causes piglit regressions on GPUs
with SNORM support. We'll have another try at enabling this feature after
the 17.2 branchpoint.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
A dangling bo object would result in memory corruption while loading a
level in ioquake3_opengl2.
Fixes: 330d0607ed (gallium: remove pipe_index_buffer and set_index_buffer)
Suggested-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
GC3000 has a new LOG instruction, similar to the new SIN and COS instructions.
Generate the new instruction sequence when appropriate; there are
two occasions, as part of LIT and the generator for the LG2
instruction itself.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
If we blit from a rendertarget or a depthstencil buffer there might still
be dirty data in the TS buffer which needs to be flushed out.
Fixes missing shadow tiles in glmark2 shadow.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Before resolving a rendertarget or a depth/stencil resource into a
texture, flush both the color cache and the depth cache together.
It is unclear whether this is necessary for the following stall to
work properly, or whether the depth flush just adds enough time
for the color cache flush to finish before the resolver is started,
but this change removes artifacts that otherwise appear if a texture
is sampled directly after rendering into it.
The test case is a simple QML scene graph with a QtWebEngine based
WebView rendered on top of a blue background:
import QtQuick 2.0
import QtQuick.Window 2.2
import QtWebView 1.1
Window {
Rectangle {
id: background
anchors.fill: parent
color: "blue"
}
WebView {
id: webView
anchors.fill: parent
}
Component.onCompleted: {
webView.url = "<some animated website>"
}
}
If the website is animated, the WebView renders the site contents into
texture tiles and immediately afterwards samples from them to draw the
tiles into the Qt renderbuffer. Without this patch, a small irregular
triangle in the lower right of each browser tile appears solid blue, as
if the texture sampler samples zeroes instead of the website contents,
and the previously rendered blue Rectangle shows through.
Other attempts such as adding a pipeline stall before the color flush or
a TS cache flush afterwards or flushing multiple times, with stalls
before and after each flush, have shown no effect.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Apparently this can happen. Just bail out early, as all the called
functions return NULL in that case.
Fixes weston-terminal for me.
Fixes: 147d7fb772 ("st/mesa: add a winsys buffers list in st_context")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Use a slightly more explicit version cap for binding wl_drm, so we can
add other interfaces with different versioning schemes later.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
u_vector.h doesn't actually use anything from u_math, but it does mean
everyone has to pull in src/gallium/auxiliary/util includes.
Just remove it, adding a <string.h> include to u_vector.c to cover
memcpy.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the previous commit, forgot to apply v2 suggestions.
Fixes: 28d0c38 (anv/pipeline: use unsigned long long constant to check
enable vertex inputs)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
When initializing the ANV pipeline, one of the tasks is checking which
vertex inputs are enabled. This is done by checking the enabled bits
in inputs_read.
But the mask to use is computed doing `(1 << (VERT_ATTRIB_GENERIC0 +
desc->location))`. The problem here is that if location is 15 or
greater, the sum is 32 or greater. But C is handling 1 as a 32-bit
integer, which means the displaced bit is out of range and thus the full
value is 0.
Thus, use 1ull, which is an unsigned long long value.
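As a minimal sketch of the difference (not the driver code itself), the
mask can be built in 64-bit arithmetic as below. EXAMPLE_ATTRIB_GENERIC0
is a hypothetical stand-in for VERT_ATTRIB_GENERIC0, whose real value is
not stated here, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for VERT_ATTRIB_GENERIC0; the real value comes
 * from Mesa's headers and is an assumption in this sketch. */
#define EXAMPLE_ATTRIB_GENERIC0 17u

/* Build the mask from a 64-bit constant so bits 32..63 are reachable.
 * With a plain `1` the shift would be done on a 32-bit int. */
static uint64_t example_attrib_bit(unsigned location)
{
    return 1ull << (EXAMPLE_ATTRIB_GENERIC0 + location);
}

/* Test whether a generic attribute is set in a 64-bit inputs mask. */
static int example_attrib_enabled(uint64_t inputs_read, unsigned location)
{
    return (inputs_read & example_attrib_bit(location)) != 0;
}
```

With the assumed base of 17, location 15 lands on bit 32, which only a
64-bit constant can represent.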
This fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.16_attributes.binding_one_to_one.interleaved
v2: use 1ull instead of BITFIELD64_BIT() (Matt Turner)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
This actually takes advantage of the newly pushed UBO data, avoiding
pull loads.
Improves performance in GLBenchmark Manhattan 3.1 by:
HSW: ~1%, BDW/SKL/KBL GT2: 3-4%, SKL GT4: 7-8%, APL: 4-5%.
(thanks to Eero Tamminen for these numbers)
shader-db results on Skylake, ignoring programs with spill/fill changes:
total instructions in shared programs: 13963994 -> 13651893 (-2.24%)
instructions in affected programs: 4250328 -> 3938227 (-7.34%)
helped: 28527
HURT: 0
total cycles in shared programs: 179808608 -> 172535170 (-4.05%)
cycles in affected programs: 79720410 -> 72446972 (-9.12%)
helped: 26951
HURT: 1248
LOST: 46
GAINED: 21
Many "Deus Ex: Mankind Divided" shaders which already spilled end up
spilling a lot more (about 240 programs hurt, 9 helped). The cycle
estimator suggests this is still overall a win (-0.23% in cycle counts)
presumably because we trade pull loads for fills.
v2: Drop "PULL" environment variable left in for initial debugging
(caught by Matt).
Reviewed-by: Matt Turner <mattst88@gmail.com>
With UBOs, the answer of "have we decided to push this uniform" gets
a bit more complicated - for one, we have multiple surfaces. This
patch refactors things so we can add the new code in a single place.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch starts uploading UBO data via 3DSTATE_CONSTANT_* packets,
and updates the compiler to know that there's extra payload data, so
things continue working. However, it still issues pull loads for all
data. I wanted to separate the two aspects for greater bisectability.
v2: Update for new intel_bufferobj_buffer parameter.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is an annoyingly big hammer, but it seems less mean than disabling
UBO pushing, and I'm not sure what else to do.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we would re-upload the constant data to the batchbuffer,
then re-emit the packets. We only need to do the last step (causing
the existing data in the batchbuffer to be re-uploaded to the push
constant staging area in the L3).
Now that we've separated the two, it's pretty easy to accomplish.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I hope to upload UBO via 3DSTATE_CONSTANT_XS packets, in addition to
normal uniforms. In order to do that, I'll need to re-emit the packets
when UBOs change. But I don't want to re-copy the regular uniform data
to the batchbuffer every time.
This patch separates out the data uploading from the packet submission.
We're running low on dirty bits, so I made the new atom happen on every
draw call, and added a flag to stage_state indicating that we want the
packet for that stage emitted.
I would have preferred to do this outside the atom system, but it has
to happen between the uploading of push constant data and the binding
table upload.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now, we always upload new push constant data, and immediately
emit 3DSTATE_CONSTANT_* packets. We call intel_upload_space and store
the resulting BO pointer in brw->curbe.curbe_bo. We read that when
emitting the packets. This works today, but is fragile - it depends on
upload and packet emission being interleaved.
If we instead were to upload all the data, then emit all the packets,
then upload BO wrapping will get us into trouble. For example, the VS
constants may land in one upload BO, but the FS constants may not fit
and land in a second upload BO. Uploading FS constants would overwrite
the brw->curbe.curbe_bo pointer, so when we emitted 3DSTATE_CONSTANT_VS,
we'd get the wrong BO.
I intend to separate out this code in a future commit, so I need to fix
this. To fix it, we simply store a per-stage BO pointer.
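A rough sketch of the per-stage idea, assuming two stages and invented
type and function names (none of these are the i965 structures): each
stage remembers the buffer its own constants landed in, so a later
upload for another stage cannot overwrite it.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical and exist only for this sketch. */
enum example_stage { EXAMPLE_VS, EXAMPLE_FS, EXAMPLE_STAGE_COUNT };

struct example_bo { int id; };

struct example_stage_state {
    struct example_bo *push_const_bo;  /* buffer holding this stage's data */
    unsigned push_const_offset;        /* offset of the data in that buffer */
};

struct example_context {
    struct example_stage_state stage[EXAMPLE_STAGE_COUNT];
};

/* Record where a stage's constants were uploaded. */
static void example_record_upload(struct example_context *ctx,
                                  enum example_stage s,
                                  struct example_bo *bo, unsigned offset)
{
    ctx->stage[s].push_const_bo = bo;
    ctx->stage[s].push_const_offset = offset;
}

/* Look up the buffer to reference when emitting a stage's packet. */
static struct example_bo *example_bo_for_packet(const struct example_context *ctx,
                                                enum example_stage s)
{
    return ctx->stage[s].push_const_bo;
}
```

Uploading everything first and emitting packets afterwards then stays
correct even if the upload buffer wraps between stages.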
Reviewed-by: Matt Turner <mattst88@gmail.com>
This adds a NIR pass that decides which portions of UBOs we should
upload as push constants, rather than pull constants.
v2: Switch to uint16_t for the UBO block number, because we may
have a lot of them in Vulkan (suggested by Jason). Add more
comments about bitfield trickery (requested by Matt).
v3: Skip vec4 stages for now...I haven't finished wiring up support
in the vec4 backend, and so pushing the data but not using it
will just be wasteful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Soon, we're going to start providing UBO data to shaders as push
constants, rather than requiring them to issue pull loads. The
3DSTATE_CONSTANT_* commands require 32 byte aligned pointers.
So, we need to increase this from 16 to 32.
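For illustration only, rounding an offset up to such an alignment could
look like the helper below. The function name is hypothetical and this
is not the code touched by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static uint32_t example_align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

An offset of 48 is already 16 byte aligned, but has to move to 64 to
satisfy a 32 byte requirement.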
Reviewed-by: Matt Turner <mattst88@gmail.com>
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address. This makes it unusable for pushing UBOs. I'd like
to be able to use all four push buffers.
There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer. Setting that gives us full
flexibility.
We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance.
Reviewed-by: Matt Turner <mattst88@gmail.com>
When writing a region of a buffer via glBufferSubData(), we can write
the data asynchronously if the destination doesn't contain any data.
Even if it's busy, the data was undefined, so the new data is fine too.
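One way to picture the check is a tracked range of bytes that may hold
defined data. The sketch below assumes such a tracker and uses invented
names; it is not the i965 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker: [start, end) covers every byte that may hold
 * defined data. The range is empty when start >= end. */
struct example_valid_range { size_t start, end; };

/* Does a write of size bytes at offset touch possibly defined data? */
static bool example_range_overlaps(const struct example_valid_range *r,
                                   size_t offset, size_t size)
{
    if (r->start >= r->end || size == 0)
        return false;
    return offset < r->end && offset + size > r->start;
}

/* Grow the tracked range to include a newly written region. */
static void example_range_add(struct example_valid_range *r,
                              size_t offset, size_t size)
{
    if (size == 0)
        return;
    if (r->start >= r->end) {
        r->start = offset;
        r->end = offset + size;
        return;
    }
    if (offset < r->start)
        r->start = offset;
    if (offset + size > r->end)
        r->end = offset + size;
}

/* A write may skip synchronization if it only covers undefined bytes. */
static bool example_can_write_async(const struct example_valid_range *r,
                                    size_t offset, size_t size)
{
    return !example_range_overlaps(r, offset, size);
}
```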
Removes all stall avoidance blits on BufferSubData calls in
"Total War: WARHAMMER" on my Skylake GT4.
Decreases the number of stall avoidance blits in Manhattan 3.1:
- Skylake GT4: -18.3544% +/- 6.76483% (n=13)
- Apollolake: -12.1095% +/- 5.24458% (n=13)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't do anything yet, but soon we'll want to know whether an
access to a buffer section may write that data, or simply reads it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Merge the code with gen6+ 3DSTATE_GS, and delete brw_gs_state.c,
together with brw_gs_unit_state.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we always call brw_batch_emit anyways, we can hopefully make things
simpler by calling it only once, and then branching inside its body. This
can be helpful when bringing the gen4-5 code into this function.
Additionally, check for GEN_GEN == 6 instead of < 7 in cases that won't apply
to lower gens.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function only emits a particular case of 3DSTATE_GS. Instead, we can do
that inside genX(upload_gs_state), and later reuse part of that code for
emitting gen4-5 state.
There's the additional benefit of allowing us to remove gen6_gs_state.c, which
was only left because of this function.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use set_blend_entry_bits and set_depth_stencil_bits to fill most of the
color calc struct, and then manually update the rest.
v2:
- Always check for depth_irb (Ken)
- Always set Backface Stencil Ref (Ken)
- Always set alpha reference value (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen6+ uses _mesa_base_format_has_channel() to check for the alpha
channel, while gen4-5 use ctx->DrawBuffer->Visual.alphaBits. By using
_mesa_base_format_has_channel() here we keep the same behavior across
all gens.
While initially both ways of checking the alpha channel seemed correct
to me, this change also seems to fix fbo-blending-formats piglit test on
gen4.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a helper function to reuse code that fills blend entry related
state, and make genX(upload_blend_state) use it. This function can later
be used by gen4-5 color calc state to set the blend related bits.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen4-5 basically glue DEPTH_STENCIL_STATE, COLOR_CALC_STATE, and
BLEND_STATE together into a single COLOR_CALC_STATE structure.
By making a helper function, we'll be able to reuse it when filling
out Gen4-5 COLOR_CALC_STATE without replicating any actual logic.
We use a generation-defined typedef to handle the polymorphism.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
If we allow the size to be more than 2^32, then we should compute it
in 64bit arithmetic otherwise we might run into overflow issues.
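The general pattern, shown as a sketch with a hypothetical helper rather
than the patched code, is to widen one operand before multiplying:

```c
#include <assert.h>
#include <stdint.h>

/* Widen one operand first so the multiply is done in 64 bits.
 * Without the cast, count * stride would wrap at 2^32 before the
 * result is stored. */
static uint64_t example_total_size(uint32_t count, uint32_t stride)
{
    return (uint64_t)count * stride;
}
```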
CID: 1412892, 1412891
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The compact flag doesn't make sense on local variables, since the
packing on them is up to the driver. This fixes nir_validate assertions
in some cases, particularly when lower_io_to_temporaries is used on
per-vertex inputs/outputs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While normally we give variables whose name field is NULL a temporary
name when called from nir_print_shader(), when we were calling from
nir_print_instr() we never bothered, meaning that we just segfaulted
when trying to print out instructions with such a variable. Since
nir_print_instr() is meant to be called while debugging, we don't need
to bother too much about giving a consistent name, but we don't want to
crash in the middle of debugging.
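A small sketch of the fallback idea, using a hypothetical helper and an
assumed "@index" naming scheme (the real printer may format temporary
names differently):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return a printable name for a variable. When name is NULL, format a
 * temporary one into buf instead of dereferencing the NULL pointer. */
static const char *example_var_name(const char *name, unsigned index,
                                    char *buf, size_t len)
{
    if (name != NULL)
        return name;
    snprintf(buf, len, "@%u", index);
    return buf;
}
```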
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's a bit rare, but blorp can trigger a urb reconfiguration. When
that happens, we need to re-upload the URB config. Previously blorp
would set BRW_NEW_URB_SIZE, but this is a pretty big hammer as it
would cause back-to-back blorp operations to reconfigure both times.
Using BRW_NEW_BLORP is a smaller, more accurate hammer.
v2 (idr): Sort BRW_NEW_ tokens to match brw_recalculate_urb_fence and
gen6_urb.
v3 (idr): Don't whack BRW_NEW_URB_SIZE in blorp. Suggested by Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add support for 32-bit RGBX/RGBA formats which are required for Android.
The original patch (commit ccdcf91104) was reverted (commit
c0c6ca40a2) in mesa as it broke GLX resulting in swapped colors. Based
on further investigation by Chad Versace, moving the RGBX/RGBA configs
to the end is enough to prevent breaking GLX.
The handling of RGBA/RGBX in dri_fill_st_visual is a fix from Marek
Olšák.
Cc: Eric Anholt <eric@anholt.net>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Previous check-ins without testing with USE_SIMD16_FRONTEND have
introduced regressions. This fixes the build, not the regressions.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Core will ensure hot tiles are loaded for read and write render targets,
and will skip all output merger for read-only render targets.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Forwarding from the ES prolog to the ES just barely exceeds the current
maximum array size when 16 vertex attributes are used. Give it a decent
bump to account for merged shaders having up to 32 user SGPRs.
Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the application hasn't done any drawing since the last call, we
would reuse the same back buffer which was used for the previous swap,
which may not have completed yet. This could result in various issues
such as tearing or application hangs.
In the normal case, the behaviour is unchanged.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97957
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101683
Cc: mesa-stable@lists.freedesktop.org
[Michel Dänzer: Make Thomas' fix from bugzilla actually work as
intended, write commit log]
Any form of CCS on gen9+ only works on Y-tiled images. The only caller
of create_for_bo which uses Y-tiled BOs is create_for_dri_image.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We want to start using create_for_dri_image for all miptrees created
from __DRIimage, including those which come from a window system. In
order to allow for fast clears to still work on window system buffers,
we need to allow for creating aux surfaces.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The __DRI_FORMAT enums are all UNORM but we will frequently want sRGB
when creating miptrees for renderbuffers. This lets us specify.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Due to the wonders of autogeneration, this new version covers a few
formats that the old version was missing:
MESA_FORMAT_SRGB8_ALPHA8_ASTC_3x3x3
MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x3x3
MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4x3
MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4x4
MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x4x4
MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5x4
MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5x5
MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x5x5
MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6x5
MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6x6
Reviewed-by: Chad Versace <chadversary@chromium.org>
Later commits require intel_update_image_buffer() to have control over
the miptree creation. However, intel_update_winsys_renderbuffer_miptree()
currently creates it based on the given buffer object. This patch moves
the creation to the caller side.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
There is nothing particularly useful to do currently if the update
fails, but there is no point carrying on either. As a result, this has a
behavior change.
v2: Make the return type a bool (Topi)
v3: Don't leak the bo if update_winsys_renderbuffer fails. (Jason)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This does make a tiny functional change in that we now also test for
whether or not the format supports texturing and not just rendering.
However, this should have no practical effect as all renderbuffers use
texturable formats.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is what we do in intel_image_target_renderbuffer_storage and it
makes more sense than stomping them. Because the image gets created as
a 2D image with one miplevel, they should already be equal to the
provided width/height. Adding the tile offset makes some sense
depending on how you interpret the fields.
The only place these fields are used for in state setup is to set up the
image parameters we pass into shaders. There may be issues here if you
try to use image_load_store on something pulled in from EGL but that's
probably broken already. This just makes it consistently broken.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is mostly a direct port. The only bit of refactoring that was done
was to make creating a planar miptree be an early return from the
non-planar case. Alternatively, we could have three functions: two
helpers and a main function to just call the right helper. Making the
planar case an early return seemed cleaner.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We were using the "cp" union fields, which are only valid for compute
shaders. The threads calculation affects the available GPRs, so just
pick a small number for other shader types to avoid limiting available
registers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The comments are correct - we get -1 and 0. However by adding 1, we
convert this into 0,1. This mostly works for conditionals, but when
negated, this will yield the wrong result. Instead just negate the
values (as they are backwards -- -1 means back instead of front).
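To see why adding 1 breaks under negation, here is a sketch that assumes
booleans are 0 / all-ones, that a logical "not" is a bitwise inversion,
and that "negate" above means inverting the bits. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed hardware convention from the message: 0 = front, -1 = back. */

/* Old approach: front -> 1, back -> 0. */
static int32_t example_face_add_one(int32_t hw)
{
    return hw + 1;
}

/* Sketch of the fix: front -> -1 (all ones), back -> 0. */
static int32_t example_face_invert(int32_t hw)
{
    return ~hw;
}

/* Boolean "not", assumed to be a bitwise inversion. */
static int32_t example_bool_not(int32_t b)
{
    return ~b;
}
```

With the old approach, "not front-facing" on a front face evaluates to
~1, which is non-zero and therefore still reads as true. With the
inverted value it evaluates to 0 as expected.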
Fixes tests/shaders/glsl-fs-frontfacing-not.shader_test and
dEQP-GLES3.functional.shaders.builtin_variable.frontfacing on A530.
The latter also tested on A306 by Rob Clark.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If a cube image has VK_IMAGE_USAGE_STORAGE_BIT set, the type in an image
view's descriptor was set to a 2D array (and a few other fields adjusted
accordingly). This is correct when the image view is actually bound as a
storage image, but not when bound as a sampled image. In that case the
type should be set as a cube.
Fix by generating 2 sets of descriptors at view creation time for both
storage and non-storage usage, and then choose between them based on
descriptor type when writing descriptor sets.
v2: Generate storage descriptors for images with TRANSFER_DST, since
those may be used as storage images internally.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This free was left in after dynamic descriptors were changed to not be
allocated separately from the descriptor set, and can cause a crash.
Fixes: 39644fa40a ("radv: Don't allocate dynamic descriptors separately")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the size of a client memory copy is too large, don't copy. The draw will
access user-buffer directly and then block. This is faster and more
efficient than queuing many large client draws.
Applications that still use large client arrays benefit from this. VMD
is an example.
The threshold for this path defaults to 32KB. This value can be
overridden by setting environment variable SWR_CLIENT_COPY_LIMIT.
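A hedged sketch of reading such an override follows. Only the variable
name SWR_CLIENT_COPY_LIMIT and the 32KB default come from the text
above; the helper names, the parsing rules and the `<=` comparison are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* 32KB default, as stated in the message. The macro name is invented. */
#define EXAMPLE_DEFAULT_COPY_LIMIT 32768ul

/* Parse an override string, falling back to the default on NULL,
 * empty or malformed input. */
static unsigned long example_parse_copy_limit(const char *value)
{
    char *end = NULL;
    unsigned long parsed;

    if (value == NULL || *value == '\0')
        return EXAMPLE_DEFAULT_COPY_LIMIT;
    parsed = strtoul(value, &end, 0);
    if (end == value || *end != '\0')
        return EXAMPLE_DEFAULT_COPY_LIMIT;
    return parsed;
}

/* Read the limit from the environment. */
static unsigned long example_copy_limit(void)
{
    return example_parse_copy_limit(getenv("SWR_CLIENT_COPY_LIMIT"));
}

/* Copy small client arrays, access large ones directly. */
static int example_should_copy(unsigned long size, unsigned long limit)
{
    return size <= limit;
}
```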
v2: Use #define for default value, rather than hard-coded constant.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Moved reading of environment config options out of
swr_create_screen_internal, into a separate swr_validate_env_options.
This is to keep from cluttering create_screen.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Removed the hard-coded constant in favor of a #define. Also removed
TODO comment. The constant value doesn't need an environment
configurable option.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Since commit 7f80a9ff13 ("vc4: Introduce XML-based packet header
generation like Intel's."), the vc4 build on Android is broken:
out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'v3d_packet_helpers.h' file not found
external/mesa3d/src/gallium/drivers/vc4/vc4_cl_dump.c:28:10: fatal error: 'vc4_packet.h' file not found
The path of the generated header needs to be fixed since we build out of
tree.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
Commit a5e733c6b5 fixes the dangling
framebuffer object by unreferencing the window system draw/read buffers
when context is released. However this can prematurely destroy the
resources associated with these window system buffers. The problem is
reproducible with Turbine Demo running with VMware driver. In this case,
the depth buffer content was lost when the context is rebound to a
drawable.
To prevent premature destruction of the resources associated with
window system buffers, this patch maintains a list of these buffers in
the context, making sure the reference counts of these buffers will not
reach zero until the associated framebuffer interface objects no
longer exist. This also helps to avoid unnecessary destruction and
re-construction of the resources associated with the framebuffer.
Fixes VMware bug 1909807.
Reviewed-by: Brian Paul <brianp@vmware.com>
The locking was supposed to go away in commit 314647c4c2
(i965: Drop global bufmgr lock from brw_bo_map_* functions.), but
this lone unlock remains.
I'm guessing I messed this up when splitting up Chris's patch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear. This swaps SHARED &&
!LINEAR buffers over to being tiled.
This is an expected win for all GL compositors during rendering (a full
copy of each shared texture per draw call), allows X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well. It also halves the memory usage of
shared buffers that get textured from. The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.
This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.
v2: Added modifier support to resource creation/import, and
advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
comment touchups (by anholt).
Reviewed-by: Daniel Stone <daniels@collabora.com>
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.
v2: Rebase on VC4_DEBUG=surf change
Reviewed-by: Daniel Stone <daniels@collabora.com>
Needing to get our uapi header from libdrm has only complicated things.
Follow intel's lead and drop our requirement for it.
Generated from the same commit mentioned in the README.
v2: Update Android.mk as well, move vc4_drm.h reference for distcheck.
Reviewed-by: Daniel Stone <daniels@collabora.com>
I want to remove vc4's dependency on headers from libdrm as well, but
storing multiple copies of drm_fourcc.h in our tree would be silly.
v2: Update Android.mk as well, move distcheck drm*.h references to
top-level noinst_HEADERS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Rob Herring <robh@kernel.org>
This fixes 32-bit builds of the driver. Commit 08413a81b9
changed things so that we now put struct anv_states in the u_vector for
binding tables. On 64-bit builds, sizeof(struct anv_state) is a power
of two but it isn't on 32-bit builds.
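As a sketch of the constraint (not the actual patch), a power-of-two
check and a round-up helper could look like this. The function names
are hypothetical, and rounding the element size up is only one possible
way to satisfy a container that needs power-of-two elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when n is a non-zero power of two. */
static bool example_is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two that is >= n (returns 1 for n == 0). */
static size_t example_next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

A 12 byte element would be rounded up to 16, while a 16 byte element is
left alone.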
Fixes: 08413a81b9
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
With an earlier commit we relaxed the requirement from C++14 to C++11.
Update the build script so that it
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Fixes: 0b80b02502 ("swr: relax c++ requirement from c++14 to c++11")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
DRI_IMAGE's createImageFromTexture is used to implement the extension,
so we should check for it prior to advertising.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Drop the (duplicate) top-level check in dri2_create_image_khr() and add
the respective checks in dri2_create_image_khr_{texture,renderbuffer}
v2: use unreachable instead of assert in dri2_create_image_khr_texture
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
If the respective extension is not supported, one should return
EGL_BAD_PARAMETER as mentioned in earlier commits.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Although not listed amongst the initial EGL_LINUX_DRM_FOURCC_EXT and
friends list, the spec reads
... Required attributes and their values are as
follows:
* EGL_WIDTH & EGL_HEIGHT: The logical dimensions of the buffer in pixels
* EGL_LINUX_DRM_FOURCC_EXT: The pixel format of the buffer, as specified
by drm_fourcc.h and used as the pixel_format parameter of the
drm_mode_fb_cmd2 ioctl.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Will allow us to simplify existing code and make further improvements
short and simple.
No functional change intended.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
As per EGL_KHR_image_base:
If an attribute specified in <attrib_list> is not one of the
attributes listed in Table bbb, the error EGL_BAD_PARAMETER is
generated.
We should set the error as opposed to simply log it.
Currently we have a partial solution, whereby only some of the callers
call _eglError().
Since that has proven to be less robust, simply set the error by the
function itself and change the return type to EGLBoolean, updating the
callers.
So now the code is slightly simpler. Plus the follow-up fixes will be
easier to manage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Don't bother allocating any memory until we're finished parsing and
sanitising all the attributes.
As a nice side effect we now consistently set eglError when any of
the attrib/values are not correct.
Strangely enough the spec does not mention _anything_ about what error
should be set where, even if the implementation already sets the odd
one.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Commit bfe1e7737a changed how texture swizzles are set up.
This exposed a latent bug in the VMware driver: we were ignoring
the texture instruction's writemask when applying the 0 and 1
swizzle terms.
This wasn't caught by the Piglit texture swizzle test because it
only exercises fixed function (no write masking).
Fixes issues seen with ETQW apitrace.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Valgrind doesn't actually implement VALGRIND_FREELIKE_BLOCK as the
exact inverse of VALGRIND_MALLOCLIKE_BLOCK. It makes the block
inaccessible, but still leaves it defined in its allocation tracker i.e.
it will report the mmap as lost despite the call to FREELIKE!
Instead of treating the mmap as an allocation, treat it as changing the
access bits upon the memory, i.e. that it becomes defined (because
buffer objects always contain valid content from the user's
perspective) upon mmap and inaccessible upon munmap. This makes memcheck
happy without leaving it thinking there is a very large leak.
Finally for consistency, we treat all the mmap/munmap paths the same
even though valgrind can intercept the regular mmap used for GTT. We
could move this in the drm_mmap/drm_munmap macros, but that quickly
looks ugly given the desire for those to support different OSes, but I
didn't try that hard!
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using a read-only CPU mapping, we may encounter stale buffer
contents. For example, the Piglit primitive-restart test offers the
following scenario:
1. Read data via a CPU map.
2. Destroy that buffer.
3. Create a new buffer - obtaining the same one via the BO cache.
4. Call BufferSubData, which does a GTT map with MAP_WRITE | MAP_ASYNC.
(We avoid set_domain for async mappings, so no flushing occurs.)
5. Read data via a CPU map.
(Without explicit clflushing, this will contain data from step 1!)
Otherwise, everything ought to work, keeping in mind that we never use
CPU maps for writing - just read-only CPU maps.
This restores the performance gains after Matt's revert in commit
71651b3139.
v2: Do the invalidate later, and even when asking for a brand new map.
v3: Add more comments from Chris.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Just map the buffer and memcpy. This will do a CPU mmap, which should
be reasonably efficient, and doing this gives us full control over the
domains and caching instead of leaving it to the kernel.
This prevents regressions on Braswell in the next commit. Specifically
GL45-CTS.shader_atomic_counters.basic-buffer-operations. Because async
maps start skipping set-domain, the pread thought everything was nicely
still in the CPU domain, and returned stale data.
v2: Use _mesa_error_no_memory() if the map fails instead of crashing.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
swr used to build and link the rasterizer to the driver, and to support
multiple architectures we needed to have multiple versions of the
driver/rasterizer combination, which needed to link in much of mesa.
Changing to having one instance of the driver and just building
architecture specific versions of the rasterizer gives a large reduction
in disk space.
libGL.so        6464 Kb ->  7000 Kb
libswrAVX.so   10068 Kb ->  5432 Kb
libswrAVX2.so   9828 Kb ->  5200 Kb
Total          26360 Kb -> 17632 Kb
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use the SWR rasterizer API through the table returned from
SwrGetInterface rather than referencing the functions directly.
This will allow us to move to a model of having the driver dynamically
load the appropriate swr architecture library.
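The pattern can be sketched as a table of function pointers filled in by
one entry point. Everything below is hypothetical: the real table
returned by SwrGetInterface has different members, and these names are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of rasterizer entry points. */
struct example_raster_api {
    int (*create_context)(void);
    int (*draw)(int context, unsigned vertex_count);
};

/* Stand-ins for one architecture-specific implementation. */
static int example_avx_create_context(void)
{
    return 1;
}

static int example_avx_draw(int context, unsigned vertex_count)
{
    return context > 0 ? (int)vertex_count : -1;
}

/* Plays the role of the single "get interface" entry point that an
 * architecture-specific library would export. */
static void example_get_interface(struct example_raster_api *api)
{
    api->create_context = example_avx_create_context;
    api->draw = example_avx_draw;
}
```

Because callers only go through the table, the driver needs to resolve
just one symbol per library, which is what makes loading the library at
run time practical.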
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We could have used a single integer to store that value, but
Cannonlake has different number of subslices per slice depending on
the GT.
v2: Add CFL subslice numbers (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reduces IOCTL calls by 1, and provides a centralized place to override
such configurations if we have a need to do so.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From KHR_fence_sync:
When the condition of the sync object is satisfied by the fence
command, the sync is signaled by the associated client API context,
causing any eglClientWaitSyncKHR commands (see below) blocking on
<sync> to unblock. The only condition currently supported is
EGL_SYNC_PRIOR_COMMANDS_COMPLETE_KHR, which is satisfied by
completion of the fence command corresponding to the sync object,
and all preceding commands in the associated client API context's
command stream. The sync object will not be signaled until all
effects from these commands on the client API's internal and
framebuffer state are fully realized. No other state is affected by
execution of the fence command.
If clients are passing the fence fd (from EGL_ANDROID_native_fence_sync)
to a compositor, that fence must only be signaled once the framebuffer
is resolved and not before as is currently the case.
v2: fixup assert to use GL_SYNC_GPU_COMMANDS_COMPLETE (Chad)
Reported-by: Sergi Granell <xerpi.g.12@gmail.com>
Fixes: c636284ee8 ("i965/sync: Implement DRI2_Fence extension")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sergi Granell <xerpi.g.12@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Chad Versace <chadversary@chromium.org>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Using CPU maps of non-coherent buffers can get us in a lot of trouble,
and WC maps are a reasonable alternative anyway. Guard against shooting
ourselves in the foot by adding an assert, and comment.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the user triggers an implicit batch flush while holding access to a
CPU mapped buffer, that mmapping will be invalidated by the kernel for
non-LLC devices. (The kernel when executing a batch will change the
cache domain of the buffers in that batch, which for non-LLC CPU access
will cause that buffer to be clflushed and any further CPU access to be
discarded.) To prevent this, simply disallow any CPU async mmap access.
The cases that need async CPU access to a non-LLC buffer should continue
to be allowed via their preferred snooping path.
v2 (Ken): Reword the comment slightly.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the buffer is being shared with an external client, our own state
tracking may be stale and in some cases we may wish to double check with
the kernel/hw state. At the moment, this is synonymous with not being
reusable, but the semantics between reusable and external are quite
different and we will have more examples of non-reusable buffers in the
near future.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I want to use these in the OpenGL driver as well.
v2: Add to COMMON_FILES in Makefile.sources (caught by Emil)
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were hitting the
unreachable("Invalid image opcode")
near the end of vtn_handle_image when parsing the
SpvOpAtomicCompareExchange opcode.
v2: Add stable CC.
v3: Ignore SpvOpAtomicCompareExchangeWeak. It requires the Kernel
capability which is not exposed in Vulkan, and spirv_to_nir is not used
for OpenCL which does support it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>
Currently, we use set_domain() to cause a stall on rendering. But the
set-domain ioctl has the side-effect of changing the kernel's cache
domain underneath the struct_mutex, which may perturb state if there was
no rendering to wait upon and in general is much heavier than the
lockless wait-ioctl. Historically libdrm used set-domain as we did not
have an explicit wait-ioctl (and the patches to teach it to use wait if
available were lost in the mists). Since mesa already depends upon a
kernel that supports the wait-ioctl, we do not need to supply a fallback.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This query is supposed to return the max texture buffer size/width in
texels, not size in bytes. Divide by 16 (the largest format size) to
return texels.
Fixes Piglit arb_texture_buffer_object-max-size test.
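The unit conversion described above can be sketched as follows. This is only an illustration: the function and macro names are hypothetical, and the 16-byte figure is the "largest format size" quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Largest texel size assumed for texture buffer formats, in bytes
 * (the value quoted in the commit message). */
#define MAX_TEXEL_BYTES 16u

/* Convert a limit expressed in bytes into a limit expressed in texels.
 * Dividing by the largest format size keeps the result valid for every
 * format, at the cost of under-reporting for smaller formats. */
uint32_t
max_texture_buffer_texels(uint32_t max_buffer_bytes)
{
   assert(max_buffer_bytes >= MAX_TEXEL_BYTES);
   return max_buffer_bytes / MAX_TEXEL_BYTES;
}
```

For a 128 MiB byte limit this reports 8 Mi texels.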
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This fixes a regression in some piglit tests since commit 5e5d5f1a2e.
I think I mis-resolved the merge conflict when cherry-picking that
commit to master.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The reason we were doing this was to ensure that the kernel did the
appropriate cross-ring synchronization and flushing. However, the
kernel only looks at EXEC_OBJECT_WRITE to determine whether or not to
insert a fence. It only cares about the domain for determining whether
or not it needs to clflush the BO before using it for scanout but the
domain automatically gets set to RENDER internally by the kernel if
EXEC_OBJECT_WRITE is set.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The register values depend on the currently set program, so make sure to
revalidate when the program changes.
Fixes glsl-1.10-fragdepth as well as
dEQP-GLES3.functional.shaders.fragdepth.compare.*
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Since radv uses compute rings and we can't know when we are setting
up the shaders what ring they are to be used on, we should just use
the default xnack setting. This may be suboptimal in some places,
but if we hit a problem, we likely should try and address this
between llvm and mesa.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rather than using 64k, use what addrlib returns as the base
alignment for vulkan allocations.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bunch of gl_BackColor interpolation tests that had explicit
interpolation specified on the fragment shader gl_Color.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Better to just point at the bcolor_entry struct which has our current
understanding encoded into it. Also add an assert to ensure that the
struct remains the expected size.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Figured out the clear value when we have a combined depth stencil
surface.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It could only handle indices 0/1, otherwise what happened was bad (accessing
array out of bounds, no crash but kind of random). This is enough for the gl
state tracker (primary/secondary color) but not enough for some other state
trackers (d3d9 has no limits on the number of color interpolants).
The complexity with color semantics is all due to the front/back mapping (2
outputs in the vs map to one input in the fs) so this isn't extended to
indices > 1 - d3d9 has no use for back colors, therefore this isn't needed and
still only 2 back colors can be handled correctly.
Reviewed-by: Brian Paul <brianp@vmware.com>
By design pixel shaders can have up to 3 variants:
* The standard one.
* glDrawPixels variant.
* glBitmap variant.
However "shader_has_one_variant" ignores this fact, and therefore
st_update_fp would select the wrong variant if glDrawPixels or glBitmap
was ever called.
This patch fixes the problem. If the standard variant has been created,
calling glDrawPixels or glBitmap will append the variant to the second
entry of the linked list, so that st_update_fp still selects the right
one if shader_has_one_variant is set.
If the standard variant hasn't been created yet and glDrawPixels/glBitmap
has been called, st_update_fp will see this and take the slow path
instead. The standard variant will then be added at the front of the
linked list, so that the fast path is taken the next time.
Blender in particular is hit by this bug.
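The list ordering described above can be sketched like this. It is a simplified model, not the state tracker's code: the struct layout, enum and function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical variant kinds, mirroring the three cases listed above. */
enum variant_kind { VARIANT_STANDARD, VARIANT_DRAWPIXELS, VARIANT_BITMAP };

struct variant {
   enum variant_kind kind;
   struct variant *next;
};

struct shader {
   struct variant *variants; /* head of a singly linked list */
};

/* Insert a variant so that the standard one, if present, stays at the head. */
void
shader_add_variant(struct shader *sh, struct variant *v)
{
   assert(sh != NULL && v != NULL);
   struct variant *head = sh->variants;

   if (v->kind != VARIANT_STANDARD && head != NULL &&
       head->kind == VARIANT_STANDARD) {
      /* Special variant: link it in as the second entry. */
      v->next = head->next;
      head->next = v;
   } else {
      /* Standard variant, or no standard variant yet: push to the front. */
      v->next = head;
      sh->variants = v;
   }
}

/* The fast path is only valid when the head is the standard variant. */
bool
shader_can_use_fast_path(const struct shader *sh)
{
   return sh->variants != NULL && sh->variants->kind == VARIANT_STANDARD;
}
```

With this ordering, a "take the first entry" fast path always sees the standard variant once it exists, regardless of the order in which the variants were created.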
v2: Marek - cosmetic changes
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=101596
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit 8aaa13467d, which was
based on an incorrect assumption. Unlike the restriction placed on image
views in the Vulkan API, OpenGL allows you to render to texture views
whose formats differ from the originals.
Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=101677
If we try to build a display list with just a glPrimitiveRestartNV()
call, we'd crash because of a null GLvertexformat::PrimitiveRestartNV
pointer. This change fixes that case.
The previous patch fixed the case of calling glPrimitiveRestartNV()
inside a glBegin/End pair.
v2: minor clean-up in save_PrimitiveRestartNV(), per Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
glPrimitiveRestartNV crashes when it is called during the compilation
of a display list.
There are two reasons:
- ctx->Driver.CurrentSavePrimitive is not set to the current primitive
- save_PrimitiveRestartNV() calls _save_Begin() which only sets an
OpenGL error, instead of calling vbo_save_NotifyBegin().
This patch correctly calls vbo_save_NotifyBegin() but it detects
the current primitive mode by looking at the latest saved primitive.
Additional work by Brian Paul
Signed-off-by: Olivier Lauffenburger <o.lauffenburger@topsolid.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101464
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This function always returned GL_TRUE. Just make it a void function.
Remove unreachable code following the call to vbo_save_NotifyBegin()
in save_Begin() in dlist.c
There were some stale comments that no longer applied since an earlier
code refactoring.
No Piglit regressions.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This fixes two regressions on HWv8:
Piglit gl-1.0-ortho-pos
Piglit/glean fbo
This was caused by commit c2b92dada0 "svga: clamp device line width
to at least 1 to fix HWv8 line stippling"
This also fixes two conform tests: Vertex Order and Polygon Face
No Piglit/conform changes with HWv9 or later.
VMware bug 1905053
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Despite being a member of the etna_screen struct, 'refcnt' is used by
the winsys-specific logic to track the reference count of the object
managed in a hash table. When the count reaches zero, the pipe screen
is removed from the table and destroyed.
Fix the logic by initializing the refcnt to 1 when screen created.
This initialization is done in etna_screen_create(), to follow the
same logic as in freedreno and virgl.
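A minimal sketch of the reference counting convention described above, assuming a hypothetical `struct screen` rather than the real etna_screen and omitting the hash table:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pipe screen tracked by the winsys. */
struct screen {
   int refcnt;
};

struct screen *
screen_create(void)
{
   struct screen *s = calloc(1, sizeof(*s));
   if (s == NULL)
      return NULL;
   /* The creator holds the first reference, so the count starts at 1
    * rather than the 0 left behind by calloc(). */
   s->refcnt = 1;
   return s;
}

void
screen_ref(struct screen *s)
{
   assert(s->refcnt > 0);
   s->refcnt++;
}

/* Returns 1 when the last reference was dropped and the screen was freed. */
int
screen_unref(struct screen *s)
{
   assert(s->refcnt > 0);
   if (--s->refcnt > 0)
      return 0;
   free(s);
   return 1;
}
```

Starting at 1 means every create or ref is balanced by exactly one unref, and the object is destroyed on the unref that matches the create.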
Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The VA is stored at [4:5], not [0:1]. This invalidated all
texture buffer descriptors when they were made resident in
the current context.
This removes a few partial flushes and cache invalidations which
are needed when updating a bindless descriptor on the fly with
a WRITE_DATA packet.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
wglUseFontBitmaps is currently a noop.
This patch implements this function for Windows.
Misc code clean-ups by Brian.
Signed-off-by: Olivier Lauffenburger <o.lauffenburger@topsolid.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Current selection of pixel format does not enforce the request of
stencil or depth buffer if the color depth is not the same as
requested.
For instance, GLUT requests a 32-bit color buffer with an 8-bit
stencil buffer, but because color buffers are only 24-bit, no
priority is given to creating a stencil buffer.
This patch gives more priority to the creation of requested buffers
and less priority to the difference in bit depth.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101703
Signed-off-by: Olivier Lauffenburger <o.lauffenburger@topsolid.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The NIR parameters are ordered "compare, data", matching GLSL, but both
the image and buffer LLVM intrinsics take them the other way around.
This is already handled correctly for SSBO atomics.
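The operand swap can be illustrated with a small single-threaded model. Both functions are hypothetical: `backend_cmpswap` only mimics the "data, compare" order described above and is not an LLVM API, and nothing here is actually atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend primitive taking the new value first and the
 * comparison value second. Returns the value previously in memory. */
uint32_t
backend_cmpswap(uint32_t *mem, uint32_t data, uint32_t compare)
{
   uint32_t old = *mem;
   if (old == compare)
      *mem = data;
   return old;
}

/* Front-end helper using the "compare, data" order: the operands must be
 * swapped when forwarding to the backend. */
uint32_t
frontend_comp_swap(uint32_t *mem, uint32_t compare, uint32_t data)
{
   assert(mem != NULL);
   return backend_cmpswap(mem, data, compare);
}
```

Forwarding the operands unswapped would compare against the new value and store the comparison value, which fails silently whenever the two differ.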
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
The Piglit arb_clear_texture-error test creates a texture with only
a 1x1 image at level=1, then tries to clear level 0 (nonexistent)
and level 1 (exists). The test only checks that the former generates
an error but the latter doesn't. The test passes, but when we try
to clear the level=1 image we're passing an invalid level to
pipe_context::clear_texture(). level=1, but since there's only one
mipmap level in the texture, it should be zero.
This fixes the code to search the gallium texture resource for the
correct mipmap level. Also, add an assertion to make sure we're not
passing an invalid level to pipe_context::clear_texture().
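One way to picture the level search is to match the image's dimensions against the minified sizes of the resource. This is a sketch under that assumption; the struct and helper names are hypothetical and the real lookup may use different criteria.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down resource description. */
struct resource {
   unsigned width0, height0; /* size of resource level 0 */
   unsigned last_level;      /* index of the last level stored */
};

/* Size of a dimension at a given mip level, never smaller than 1. */
unsigned
minify(unsigned size, unsigned level)
{
   unsigned s = size >> level;
   return s ? s : 1;
}

/* Find the resource level whose dimensions match the image, or -1. */
int
find_resource_level(const struct resource *res, unsigned img_w, unsigned img_h)
{
   assert(res != NULL);
   for (unsigned level = 0; level <= res->last_level; level++) {
      if (minify(res->width0, level) == img_w &&
          minify(res->height0, level) == img_h)
         return (int)level;
   }
   return -1;
}
```

In the Piglit case above, the resource holds a single 1x1 level, so the GL level 1 image maps to resource level 0.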
Fixes device errors with VMware driver. No Piglit regressions.
v2: don't do the level search when using immutable textures.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
For depth/stencil formats the surface layer allocates the
stencil separately, so we don't need to include it in the
bpe.
This reduces the size of d32s8 allocations to something closer to pro.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just modifies the API to make it easier to add other flags
to target machine creation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There was no reason for this script to live outside the scripts
directory.
Suggested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cacheline alignment of SWR_STATS to prevent sharing of cachelines
between threads (performance).
Gets rid of gcc-7.1 warning about using c++17's over-aligned new
feature.
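The idea of per-thread, cacheline-sized statistics can be sketched in C11 as below. The struct, its fields and the 64-byte line size are assumptions for illustration; the actual SWR_STATS definition is C++ and is not reproduced here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHELINE_BYTES 64 /* assumed cache line size */

/* Hypothetical per-thread counters. Aligning the first member raises the
 * alignment of the whole struct and pads its size to a full cache line. */
struct thread_stats {
   alignas(CACHELINE_BYTES) uint64_t draws;
   uint64_t primitives;
};

static_assert(sizeof(struct thread_stats) % CACHELINE_BYTES == 0,
              "each element must occupy whole cache lines");

/* True if two addresses fall into the same cache line. */
int
same_cacheline(const void *a, const void *b)
{
   return (uintptr_t)a / CACHELINE_BYTES == (uintptr_t)b / CACHELINE_BYTES;
}
```

Neighbouring elements of a `struct thread_stats` array then never share a line, so one thread's counter updates do not invalidate another thread's cached counters.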
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This doesn't get used yet, it just adds support to various PKT3
emissions to enable it later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
_mesa_glsl_has_builtin_function is used to determine whether any variant
of a builtin is available, for the purpose of enforcing the GLSL ES
3.00+ rule that overloads or overrides of builtins are disallowed.
However the builtin_builder contains information on all builtins,
irrespective of parse state, or versions, or extension enablement. As a
result we would say that a builtin existed even if it was not actually
available.
To resolve this, first check if at least one signature is available for
a builtin before returning true.
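The availability check can be sketched as follows. The types are hypothetical and heavily reduced: here a signature is gated only on a minimum language version, whereas the real predicates also consider extensions and shader stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced parse state: only the language version matters. */
struct parse_state {
   unsigned glsl_version;
};

struct signature {
   unsigned min_version; /* first version in which this overload exists */
};

struct builtin {
   const char *name;
   const struct signature *sigs;
   size_t num_sigs;
};

bool
signature_is_available(const struct signature *sig,
                       const struct parse_state *st)
{
   return st->glsl_version >= sig->min_version;
}

/* True only if the name is known AND at least one overload is usable
 * in the current parse state. */
bool
has_builtin_function(const struct builtin *b, const struct parse_state *st)
{
   assert(st != NULL);
   if (b == NULL)
      return false;
   for (size_t i = 0; i < b->num_sigs; i++) {
      if (signature_is_available(&b->sigs[i], st))
         return true;
   }
   return false;
}
```

A name that exists in the table but has no usable overload is reported as absent, so a shader may still define its own function with that name.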
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101666
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have proper pointer types, we can be more sensible about the
way we set up function arguments and deal with the two cases of pointer
vs. SSA parameters distinctly.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We're going to want the full vtn_type available to us anyway at which
point glsl_type isn't really buying us anything.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This adds a vtn concept of base_type as well as a couple of other
fields. This lets us be a tiny bit more efficient in some cases but,
more importantly, it will eventually let us express things the GLSL type
system can't.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Now that we have a pointer wrapper class, we can create offsets for UBOs
and SSBOs up-front instead of waiting until we have the full access
chain. For push constants, we still use the old mechanism because it
provides us with some nice range information.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Everyone now calls it with stop_at_matrix = false. Since we're now
always walking all the way to the end of the access chain, the type
returned is just the same as ptr->type.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead of handling all of the complexity at the end, we choose to
decorate types a bit more cleverly. When we have a row-major matrix
type, we give it the stride of a single vector and give its array
element type (which represents a column) the actual matrix stride.
Previously, we were using stop_at_matrix and handling everything from
matrix on down as special cases but now we walk the access chain all the
way to the end and then load. Even though this looks like it may lead
to a significant functional change, it doesn't. The reason why we
needed to do stop_at_matrix before was to handle row-major properly
since the offsets and strides would be all out-of-order. Now that row
major matrix types have the small stride on the matrix and the large
stride on the vector, offsetting to a single column of a row-major
matrix works fine. The load/store code simply picks up on the fact that
the stride isn't the type size and does multiple loads. The generated
code from these methods should be the same.
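The stride swap can be illustrated with a small offset calculation. This is a sketch, not the spirv_to_nir code: the struct and function names are hypothetical, and the "small" stride is assumed to be the size of one scalar element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stride decoration: one stride is applied when indexing a
 * column out of the matrix, the other when indexing an element out of
 * that column. */
struct mat_strides {
   size_t per_column; /* stride stored on the matrix type */
   size_t per_row;    /* stride stored on the column (vector) type */
};

/* Column-major: columns are matrix_stride apart, elements are packed. */
struct mat_strides
strides_column_major(size_t matrix_stride, size_t elem_size)
{
   assert(elem_size > 0);
   struct mat_strides s = { matrix_stride, elem_size };
   return s;
}

/* Row-major: swap the two, so the small stride sits on the matrix and
 * the large stride on the column. */
struct mat_strides
strides_row_major(size_t matrix_stride, size_t elem_size)
{
   assert(elem_size > 0);
   struct mat_strides s = { elem_size, matrix_stride };
   return s;
}

/* Walk the access chain matrix[col][row] all the way to the element. */
size_t
element_offset(struct mat_strides s, unsigned col, unsigned row)
{
   return col * s.per_column + row * s.per_row;
}

/* A column whose elements are not tightly packed needs one load each. */
bool
column_needs_split_load(struct mat_strides s, size_t elem_size)
{
   return s.per_row != elem_size;
}
```

For a float matrix with a 16-byte matrix stride, element [1][2] sits at offset 24 in column-major layout and at offset 36 in row-major layout, using the same walking code in both cases.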
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The vtn_pointer structure provides a bit better abstraction than passing
access chains around directly. For one thing, if the pointer just
points to a variable, we don't need the access chain at all. Also,
pointers know what their dereferenced type is so we can avoid passing
the type in a bunch of places. Finally, pointers can, in theory, be
extended to the case where you don't actually know what variable is
being referenced.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We're about to add a vtn_pointer data structure and this will prevent
some rename churn in the next commit.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We were originally handling them together because I was rather unclear
on the distinction. However, keeping them combined keeps the confusion.
Split them up so that it's more clear from the code how we expect the
two storage classes to be used.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is effectively a revert of 388f02729b
though much code has been added since. Kristian initially moved it to
try and avoid locking problems with meta-based resolves. Now that meta
is gone from the resolve path (for good this time, we hope), we can move
it back. The problem with having it in intel_update_state was that the
UpdateState hook gets called by core mesa directly and all sorts of
things will cause an UpdateState to get called which may trigger resolves
at inopportune times. In particular, it gets called by _mesa_Clear and,
if we have a HiZ buffer in the INVALID_AUX state, causes a HiZ resolve
right before the clear which is pointless. By moving it back to
try_draw_prims time, we know it will only get called right before a draw
which is where we want it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
According to Nicolai the SX can already start work when all
the position exports are done, so do those first.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We have some cases where changing between depth and stencil only aspect
was causing hangs.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
If dri2_setup_extensions() fails, the "err" variable would not be assigned
causing the error path to access an uninitialized variable. Fix it by
assigning an error message.
Fixes: 2c341f2bda ("egl: refactor dri2_create_screen() into three separate functions")
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Some reshuffle in the Makefiles under src/intel resulted in Android
libraries being no longer linked with code using
src/intel/common/gen_debug.h that contains references to functions
exported by those libraries (namely ALOGW macro, which is currently
resolved into a call to __android_log_print() from cutils).
Fix the build by taking into account ANDROID_CFLAGS and ANDROID_LIBS for
affected module on Android NDK builds.
Fixes: d5b355ce5f ("i965: Move intel_debug.h to intel/common/gen_debug.h")
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The current post-install command relies on the GALLIUM_TARGET_DRIVERS
variable; however, the variable needs to be initialized in
src/gallium/Android.mk so that symlinks for all enabled gallium drivers
are correctly generated.
At the moment, due to the sorting of INC_DIRS and the variable being set
with svga (vmwgfx), only the vmwgfx_dri.so and virtio_gpu_dri.so symlinks
are generated.
Fixes: a3d98ca62f ("Android: use symlinks for driver loading")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We have to mark the additional shader input as used, otherwise it will
be eliminated, and we have to setup its index correctly.
This is a bit of a hack, but so is everything surrounding edgeflag
passthrough.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So... the pipe_ prefix doesn't really fit into a TGSI header; on the
other hand, the return type has the pipe_ prefix.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
NIR shaders are not captured properly in pipelined mode currently. This
would require shader cloning, which requires linking all the Gallium
drivers against NIR. We can always do that later.
v2: avoid immediate crashes in pipelined mode
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
This is convenient for backends that support both Vulkan and OpenGL while
lowering samplers to derefs with nir_lower_samplers_as_deref.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Undefined data will eventually trigger a valgrind error while computing
its CRC32 while writing it into the disk cache, but at that point, it is
basically impossible to track down where the undefined data came from.
With this change, finding the origin of undefined data becomes easy.
v2: remove duplicate VALGRIND_CFLAGS (Emil)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise, the padding bits remain undefined, which leads to valgrind
errors when storing the gl_shader_variable in the disk cache.
v2: use rzalloc instead of an explicit padding member variable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Each field gets a distinct name, so we should never hit the case where
the name already exists in the parameter list.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Save some passes over the IR.
v2: redesign to make the users of find_assignments more readable
v3:
- fix missing !
- add some comments and make the num_found check more explicit (Timothy)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The former 0x60 hardcoded in is equivalent to ROP_COPY with the shift.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
We need to figure out how to implement it properly. Right now it doesn't
work at all.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
At least the first level works now. Eventually the later levels stop
working; there appears to be some alignment issue. But this improves the
situation immensely.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
It doesn't appear to do what we want. Removing this bit makes
lodclamp-between as well as a number of dEQP tests pass, with no visible
ill effect.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
This enables S3TC, BPTC, ETC2, and ASTC texture decoding. Additionally
this enables RGB32 texture buffer objects, as well as 11_11_10_FLOAT and
10_10_10_2 vertex formats (and related extensions).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
The smapi->get_egl_image() call in st_egl_image_get_surface() stores a
reference to the EGLImage's texture in stimg.texture. That reference is
released via pipe_resource_reference(&stimg.texture, NULL) before stimg
goes out of scope at the end of the function, but not in the error path
if !is_format_supported().
Fixes: 83e9de25f3 ("st/mesa: EGLImageTarget* error handling")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ensure vc4_cl_dump.h and $(BROADCOM_FILES) are distributed in the
dist-file.
This fixes `make distcheck`
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
I'm not 100% sure this is all wired up but it looks like it is.
v2: actually enable extension.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
NIR always makes the shift amount 32 bits, but LLVM asserts if the two
sources aren't the same type. Zero-extend the shift amount to make LLVM
happy.
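In C terms, the fix amounts to widening the 32-bit shift amount to the width of the shifted value before the operation. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Shift a 64-bit value by a 32-bit amount. The amount is zero-extended
 * first so that both operands of the shift have the same width. */
uint64_t
shl_u64(uint64_t value, uint32_t amount)
{
   uint64_t wide_amount = (uint64_t)amount; /* zero-extension */
   assert(wide_amount < 64);
   return value << wide_amount;
}
```

C would perform this conversion implicitly; an IR with strict typing has to spell it out.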
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We implement the split opcodes, and tell NIR to lower the original ones.
The lowering to LLVM is a little more complicated, but NIR can optimize
the split ones a little better, and some NIR lowering passes that we
might want to use (particularly for doubles) emit the split ones.
This should fix pack/unpackDouble2x32, which seems to have been broken
since we enabled the Float64 capability. It will also fix pack/unpackInt2x32
when we enable the Int64 capability.
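The semantics of the split opcodes can be modelled in plain C. The function names follow the NIR opcode names, but the bodies are only a reference model written for this illustration, not the LLVM lowering.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit value from its low and high 32-bit halves. */
uint64_t
pack_64_2x32_split(uint32_t lo, uint32_t hi)
{
   return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Extract the low half (the "x" component). */
uint32_t
unpack_64_2x32_split_x(uint64_t v)
{
   return (uint32_t)v;
}

/* Extract the high half (the "y" component). */
uint32_t
unpack_64_2x32_split_y(uint64_t v)
{
   return (uint32_t)(v >> 32);
}

/* Unpacking and repacking must give back the original value. */
void
check_round_trip(uint64_t v)
{
   assert(pack_64_2x32_split(unpack_64_2x32_split_x(v),
                             unpack_64_2x32_split_y(v)) == v);
}
```

Because each split opcode works on scalars, an unpack followed by a pack of the same halves is easy for an optimizer to recognise and remove.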
Fixes: 798ae37c ("radv: Enable Float64 support.")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Before, we were just implementing it with a move, which is incorrect
when the source and destination have different bitsizes. To implement
it properly, we need to use the 64-bit pack/unpack opcodes. Since
glslang uses OpBitcast to implement packInt2x32 and unpackInt2x32, this
should fix them on anv (and radv once we enable the int64 capability).
v2: make supporting non-32/64 bit easier (Jason)
v3: add another assert (Jason)
Fixes: b3135c3c ("anv: Advertise shaderInt64 on Broadwell and above")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
svga_texture_device_format_has_alpha() is only intended to work for
texture resources, not buffer resources. This fixes a failed assertion
in the svga_texture() cast function when running texture buffer tests.
Also, add an assertion in svga_texture_device_format_has_alpha() to
catch the issue sooner.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
With change 8aba778fa2 we stopped binding
sampler objects for texture buffers. That broke our texture sample /
sampler view setup code.
Now, we loop over the max(num samplers, num sampler views) and handle
the sampler and view information separately. For texture buffers,
the sampler will be NULL but the sampler view non-null.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The buffer binding flags aren't ensured until after the
svga_buffer_handle() call, so move the assertion after it.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If a buffer is created/initialized with glNamedBufferData we will
have no target (GL_ARRAY_BUFFER, GL_UNIFORM_BUFFER, etc) so the
svga_buffer::bind_flags will be zero until we try to get the buffer
handle.
This patch initializes the svga_buffer::bind_flags field when it's
zero.
This fixes the Piglit arb_uniform_buffer_object-rendering-dsa test.
Note that there are still issues in this area that'll have to be
addressed in the future. For example, creating a buffer object
as GL_UNIFORM_BUFFER and later using it as a vertex buffer will
fail.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We always have stage == first and stage == last when first == last, so
drop the special case. Also rephrase the comment to make the logic
clearer.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From Vulkan spec, 4.2.1. "Device Creation":
"vkCreateDevice verifies that extensions and features requested in
the ppEnabledExtensionNames and pEnabledFeatures members of
pCreateInfo, respectively, are supported by the implementation."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@gmail.com>
SPIR-V tessellation shaders that were created from HLSL will have
the primitive generation domain set in tessellation control shader
(hull shader in HLSL) instead of the tessellation evaluation shader.
v2:
- Add assert (Kenneth)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch limits the number of items on the fence work queue (the
deferred deletion list) by submitting a sync fence when the queue size
exceeds a threshold. This initiates deferred deletion of all resources
on the list and decreases the total amount of memory held waiting for
"deferred deletion".
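The thresholding idea can be sketched with a counter-only model. The names and the threshold value are hypothetical, and the "flush" here merely stands in for submitting a sync fence.

```c
#include <assert.h>
#include <stddef.h>

#define DEFER_QUEUE_THRESHOLD 4 /* hypothetical limit, for illustration */

struct defer_queue {
   size_t pending; /* resources waiting for deferred deletion */
   size_t flushes; /* how many times a sync fence was submitted */
};

/* Stand-in for submitting a sync fence: everything queued is released. */
void
defer_queue_flush(struct defer_queue *q)
{
   q->pending = 0;
   q->flushes++;
}

/* Queue one resource and flush as soon as the queue grows past the limit. */
void
defer_queue_add(struct defer_queue *q)
{
   assert(q != NULL);
   q->pending++;
   if (q->pending > DEFER_QUEUE_THRESHOLD)
      defer_queue_flush(q);
}
```

This bounds the amount of memory held for deferred deletion even when the application never triggers a pipeline flush on its own.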
This resolves bug 101467 filed against swr for the piglit
streaming-texture-leak test. For those running on smaller memory
(16GB?) systems, this will prevent oom-killer.
Thus far, we have not seen any real world applications that exhibit
behavior like the streaming-texture-leak test; as any form of pipeline
flush will trigger the defer queue and properly free any retained
allocations. But, this addresses those as well.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We lost some precision on a previous change due to switching to
integers. Since we report a float in timestampPeriod, we want the
division to happen in floats.
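The precision loss is easy to reproduce. In this sketch the function names are hypothetical and the 12 MHz frequency used in the note below is just an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds per timestamp tick, computed in floating point. */
float
timestamp_period_ns(uint64_t timestamp_frequency_hz)
{
   assert(timestamp_frequency_hz != 0);
   return 1000000000.0f / (float)timestamp_frequency_hz;
}

/* The same quantity computed with integer division: the fractional part
 * is discarded before the value is ever converted to float. */
uint64_t
timestamp_period_ns_truncated(uint64_t timestamp_frequency_hz)
{
   assert(timestamp_frequency_hz != 0);
   return UINT64_C(1000000000) / timestamp_frequency_hz;
}
```

At 12 MHz the integer version yields 83 ns per tick while the float version yields roughly 83.33 ns, an error of about 0.4% that accumulates over long intervals.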
CID: 1413021
Fixes: c77d98ef32 ("intel: common: express timestamps units in frequency")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Being able to see the access mode of various mappings is incredibly
useful for debugging. With this patch, INTEL_DEBUG=buf now shows
data such as:
    bo_create: buf 7 (bufferobj) 640b
    bo_map_gtt: 7 (bufferobj) -> 0x7fca1fae5000, WRITE ASYNC
    brw_bo_map_cpu: 7 (bufferobj) -> 0x7fca1fae4000, READ
    bo_map_gtt: 5 (bufferobj) -> 0x7fca1fad4000, WRITE ASYNC
    brw_bo_map_cpu: 7 (bufferobj) -> 0x7fca1fae4000, READ
which makes it easy to see that there are async GTT writes with
intervening CPU reads.
Reviewed-by: Matt Turner <mattst88@gmail.com>
With the conversion to storing the result of drm_mmap to a local and not
directly to bo->map_gtt itself, we no longer should clear bo->map_gtt.
In the best case the operation is redundant as we know bo->map_gtt to already
be NULL, but in the worst case we overwrite a concurrent thread that
successfully mmaped the GTT.
Fixes: 314647c4c2 ("i965: Drop global bufmgr lock from brw_bo_map_* functions.")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After removing the unusable debugging code in the previous commit, we
can also entirely remove the global mutex around mapping the buffer for
the first time and replace it with a single atomic operation to update
the cache once we retrieve the mmap.
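The lock-free caching pattern can be sketched with C11 atomics. The struct and function are hypothetical; in particular, how the losing thread releases its redundant mapping is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical buffer object with a lazily created, cached mapping. */
struct bo {
   _Atomic(void *) map; /* NULL until the first successful mapping */
};

/* Publish new_map as the cached mapping unless another thread got there
 * first. Returns the mapping everyone should use; *lost is set to 1 when
 * new_map was not installed and the caller must release it. */
void *
bo_cache_map(struct bo *bo, void *new_map, int *lost)
{
   void *expected = NULL;
   assert(new_map != NULL);
   if (atomic_compare_exchange_strong(&bo->map, &expected, new_map)) {
      *lost = 0;
      return new_map;
   }
   /* On failure, expected now holds the mapping installed by the winner. */
   *lost = 1;
   return expected;
}
```

Note that the losing path never writes to the cached pointer, which is what avoids overwriting a concurrent thread's successful mapping.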
v2 (Ken): Split out from Chris's original commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
With the broken debugging code gone, it doesn't do anything anymore.
We could technically eliminate it, but I'd like to keep it around in
case we want to add something there again someday. Otherwise we'd
have to go all over the codebase adding unmap calls back again.
Based on a patch by Chris Wilson.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Supposedly we were keeping a reference count for the number of users of
a mapping so that we could use valgrind to detect access to the map
outside of the valid section. However, we were incrementing the counter
only when first creating the cached mapping but decrementing on every
unmap. The bo->map_count tracking was wrong and so the debugging code
was completely useless.
v2 (Ken): Separate out atomic compare and swap optimization.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
At first glance this seems missing, since we handle it manually for CPU
and WC maps. Although a bit inconsistent, it's actually not necessary.
Thanks to Chris Wilson for explaining this to me.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We apparently still used v16i8 ....
As radeonsi doesn't use it with LLVM version checks I don't think
we need them either.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The state tracker should never ask us to create a texture with invalid
dimensions / mipmap levels. Do some assertions to check that.
No Piglit regressions.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If we're rendering to an incomplete/inconsistent (cube) texture, the
different faces/levels of the texture may be stored in different
resources. Before, we always used the texture object resource. Now,
we use the texture image resource. In normal circumstances, that's
the same resource. But in some cases, such as the Piglit
fbo-incomplete-texture-03 test, the cube faces are in different
resources and we need to render to the texture image resource.
Fixes fbo-incomplete-texture-03 with VMware driver.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Return early from st_finalize_texture() if we have an incomplete
texture. This avoids trying to create a texture resource with invalid
parameters (too many mipmap levels given the base dimension).
Specifically, the Piglit fbo-incomplete-texture-03 test winds up
calling pipe_screen::resource_create() with width0=32, height0=32 and
last_level=6 because the first five cube faces are 32x32 but the sixth
face is 64x64. Some drivers handle this, but others (like VMware svga)
do not (generates device errors).
Note that this code is on the path that's usually not taken (we normally
build consistent textures).
No Piglit regressions.
v2: only need to check for base-level completeness since that's what has to
be consistent in order to specify the dimensions for a new gallium texture.
Per Roland.
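To see why width0=32, height0=32 with last_level=6 is invalid, here is a small sketch of the usual mipmap limit (each level halves the largest dimension until it reaches 1). The helper names are made up for illustration and are not state tracker functions.

```python
def max_last_level(width, height, depth=1):
    """Largest valid last_level: floor(log2(largest base dimension))."""
    return max(width, height, depth).bit_length() - 1


def is_valid_mip_chain(width, height, last_level, depth=1):
    """True if a texture of this base size can have that many levels."""
    return 0 <= last_level <= max_last_level(width, height, depth)


print(max_last_level(32, 32))         # limit for a 32x32 base level
print(is_valid_mip_chain(32, 32, 6))  # the case from the Piglit test
```

A 32x32 base level allows last_level up to 5, so last_level=6 only makes sense for the 64x64 face.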
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 8aba778fa2 "st/mesa: don't set
sampler states for TBOs" changed how texture buffer objects are handled.
Document the new convention.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Take the CL pointer in, which will be useful for enabling relocs.
However, our code expands a bit more:
before:
   text    data     bss     dec     hex filename
   4449       0       0    4449    1161 src/gallium/drivers/vc4/.libs/vc4_draw.o
    988       0       0     988     3dc src/gallium/drivers/vc4/.libs/vc4_emit.o
after:
   text    data     bss     dec     hex filename
   4481       0       0    4481    1181 src/gallium/drivers/vc4/.libs/vc4_draw.o
   1020       0       0    1020     3fc src/gallium/drivers/vc4/.libs/vc4_emit.o
I really liked this idea, as it should help with management of packet
parsing tools like the CL dump. The python script is forked off of theirs
because our packets are byte-based instead of dwords, and the changes to
do so while avoiding performance regressions due to unaligned accesses
were quite invasive.
v2: Fix Android.mk paths, drop shebang for python script, fix overlap
detection.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Rob Herring <robh@kernel.org>
In swr_update_derived, for consistency, index buffer validation should
be using the p_draw_info copy "info" rather than referencing
p_draw_info.
No functional change.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Tag pStat field in swr_draw_context structure so gen_llvm_types.py
can deal with the actual structure type instead of using void.
Code cleanup, no functional change.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Switch from a macro-based SIMD intrinsics layer to a more idiomatic C++
implementation, which also adds AVX512 optimizations to 128-bit
and 256-bit SIMD.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Each shader stage state (VS, TS, GS, SO, BE/CLIP) now has a
vertexAttribOffset to specify the offset to the start of the
general attribute section of the incoming verts for that stage.
It is up to the driver to set this up correctly based on the
active stages. All the shader stages use this value instead of
VERTEX_ATTRIB_START_SLOT to offset to the incoming attributes.
Only the vertex shader stage supports dynamic layout output
currently. The other stages continue to expect the output to be
the fixed layout slots as before. Will be enabling GS next.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
There is a typo in the mkdir command path;
the correct one is $(TARGET_OUT)/$(l)/$(MESA_DRI_MODULE_REL_PATH).
The other issue affects 32-bit builds: lib64 does not exist there, so
we can use TARGET_IS_64_BIT to refine the post-install command.
Fixes: a3d98ca62f ("Android: use symlinks for driver loading")
Signed-off-by: Rob Herring <robh@kernel.org>
Add mksstats for surface view emulation and also tighten the stat
CreateBackedView for the actual creation of backed view.
Reviewed-by: Brian Paul <brianp@vmware.com>
In general, the functions which emit commands to the command buffer check
for failure and return a PIPE_ERROR_x code. It's up to the caller to
flush the buffer and retry the command.
But svga_set_stream_output() did its own flushing and the callers never
checked the return value (though in practice it would always be PIPE_OK).
This patch changes svga_set_stream_output() so that it does not call
svga_context_flush() when the buffer is full. And we update the callers
to check the return value as we do for other functions, like
svga_set_shader().
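The convention described above (emit functions report failure, callers flush and retry) can be sketched as follows. This is a simplified model, not the svga code: the CommandBuffer class, emit_with_retry() and the numeric error values are placeholders chosen for the example.

```python
PIPE_OK = 0
PIPE_ERROR_OUT_OF_MEMORY = -1   # placeholder value for a PIPE_ERROR_x code


class CommandBuffer:
    """Hypothetical fixed-capacity command buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.commands = []
        self.flushes = 0

    def emit(self, cmd):
        """Append cmd, or report failure if the buffer is full. No flushing here."""
        if len(self.commands) >= self.capacity:
            return PIPE_ERROR_OUT_OF_MEMORY
        self.commands.append(cmd)
        return PIPE_OK

    def flush(self):
        self.commands.clear()
        self.flushes += 1


def emit_with_retry(buf, cmd):
    """Caller-side policy: on failure, flush once and retry the command."""
    ret = buf.emit(cmd)
    if ret != PIPE_OK:
        buf.flush()
        ret = buf.emit(cmd)
    return ret


buf = CommandBuffer(capacity=1)
first = emit_with_retry(buf, "set_stream_output A")
second = emit_with_retry(buf, "set_stream_output B")
```

Keeping the flush in the caller means every emit function follows the same contract, and the return value is always meaningful.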
No Piglit regressions. Also tested w/ Nature demo.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch fixes the total surface size in surface cache
to include array size as well.
Tested with MTT glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
piglit test ext_texture_array-gen-mipmap is fixed with this patch.
Tested with mtt piglit, glretrace, viewperf and conform. No regression.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch validates those sampler views with a backing copy
of a texture whose original copy has been updated since the
view was last validated.
This is done here at draw time because the texture binding might not
have been modified, hence validation is not triggered at state update time,
and yet the texture might have been updated in another context, so
we need to re-validate the sampler view in order to update the backing
copy of the updated texture.
This fixes a flickering issue with Photoshop running in a
Linux VM with HWversion 11. The problem is that Photoshop renders to texture A
in context X, and then binds texture A to context Y. The first time
texture A is bound to context Y, cso calls pipe->set_sampler_views().
Validation of sampler views is done, and rendering is fine.
But when texture A is rendered to again in context X, and rebound in
context Y, cso skips pipe->set_sampler_views() because texture A is already
bound in context Y. The SVGA driver is not given a chance to re-validate
the texture binding, the backing copy of the texture is not updated,
and the result is a black image.
Tested with Photoshop, MTT glretrace, piglit.
Fixes VMware bug 1769103.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
UYVY differs from YUYV in byte order.
YUYV is already declared in dri_interface.h;
this CL adds the definitions for UYVY.
Drivers can now add UYVY as a supported format.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Deferred deletion (via "fence_work") has obsoleted the need to allocate
all client vertex buffer scratch space in a single chunk. Scratch
allocations are now valid until the referenced fence is complete.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Vertex buffer state doesn't need to be validated on every call,
only on dirty _NEW_VERTEX or indexed draws.
Unconditional validation was introduced as part of patch 330d0607ed,
"remove pipe_index_buffer and set_index_buffer", with the expectation
we'd optimize later.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Windows doesn't allow you to move a file that's opened, and Popen()
doesn't wait on its subprocess' completion before returning, which leads
to a broken Windows build.
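A minimal sketch of the safe ordering, assuming a helper named write_command_output() (hypothetical, not the name used in the build scripts): wait for the child to exit, close the temporary file, and only then move it into place.

```python
import os
import subprocess
import sys
import tempfile


def write_command_output(cmd, dest):
    """Run cmd to completion, then replace dest with its stdout."""
    # check_output() blocks until the child exits, unlike a bare Popen().
    output = subprocess.check_output(cmd)
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(output)
    # The temporary file is closed at this point, so moving it is safe
    # on Windows as well.
    os.replace(tmp_path, dest)


# Demo with a stand-in command; the output text is a placeholder.
demo_dir = tempfile.mkdtemp()
demo_dest = os.path.join(demo_dir, "generated.h")
write_command_output([sys.executable, "-c", "print('placeholder')"], demo_dest)
```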
Fixes: 3fd425aed7 "build systems: uniformize git_sha1.h generation"
Suggested-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Instead of having special driver loading logic for Android, create
symlinks to gallium_dri.so so we can use the standard loading logic.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Commit 7dd20bc3ee ("anv/i965: drop libdrm_intel dependency completely")
removed the libdrm_intel dependency for automake, but Android builds still
depended on it. Now the build requires a newer version of i915_drm.h and
fails on Android builds:
src/mesa/drivers/dri/i965/brw_performance_query.c:616:9: error: use of undeclared identifier 'I915_OA_FORMAT_A32u40_A4u32_B8_C8'
case I915_OA_FORMAT_A32u40_A4u32_B8_C8:
^
src/mesa/drivers/dri/i965/brw_performance_query.c:1887:18: error: use of undeclared identifier 'I915_PARAM_SLICE_MASK'
gp.param = I915_PARAM_SLICE_MASK;
^
src/mesa/drivers/dri/i965/brw_performance_query.c:1893:18: error: use of undeclared identifier 'I915_PARAM_SUBSLICE_MASK'
gp.param = I915_PARAM_SUBSLICE_MASK;
^
Remove the libdrm_intel dependency for Android builds and add the necessary
include paths for the local copy of i915_drm.h.
Fixes: 7dd20bc ("anv/i965: drop libdrm_intel dependency completely")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In addition to Rob Herring's "Android: i965: remove libdrm_intel dependency",
we can drop the libdrm_intel dependency in anv for Android.
Please check if libdrm has to stay as shared dependency and drop this comment line.
Fixes: 7dd20bc ("anv/i965: drop libdrm_intel dependency completely")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The layer stride information is used in various parts of the driver,
so it needs to be present regardless if the driver allocated the
buffer itself or merely imported it from an external source.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
This patch makes glCopyImageSubData require mipmap completeness when the
texture object's built-in sampler object has a mipmapping MinFilter.
This is apparently the de facto behavior and mandated by Android's CTS.
One exception is that we ignore format based completeness rules
(specifically integer formats with linear filtering), as this is
also the de facto behavior that until recently was mandated by the
OpenGL 4.5 CTS.
This was discussed with both the OpenGL and OpenGL ES working groups,
and while everyone agrees this behavior is unfortunate and complicated,
it is what it is at this point. There was little appetite for relaxing
restrictions given that all conformant Android drivers followed the
mipmapping rule, and all conformant GL 4.5 implementations ignored the
integer/linear rule.
Fixes (on i965):
dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16224
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Add a big spec quotation justifying the error generated, which has
changed over the GL versions.
v2: Compact the spec quote based on a Khronos bug and discussion with Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This shouldn't ever happen - GL requires it to be aligned:
"Clients must align data elements consistent with the requirements
of the client platform, with an additional base-level requirement
that an offset within a buffer to a datum comprising N basic
machine units be a multiple of N."
Mesa should reject unaligned index buffers for us - we shouldn't have
to handle them in the driver.
Note that Gallium already makes this assumption.
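The quoted rule reduces to a simple divisibility check. The sketch below uses a made-up helper name and assumes index sizes of 1, 2 or 4 bytes.

```python
def is_index_offset_aligned(offset, index_size):
    """True if offset is a multiple of the index size in bytes."""
    if index_size not in (1, 2, 4):
        raise ValueError("unsupported index size")
    return offset % index_size == 0


print(is_index_offset_aligned(6, 2))   # 16-bit indices at byte offset 6
print(is_index_offset_aligned(6, 4))   # 32-bit indices at byte offset 6
```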
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On NV20 (and probably also on earlier NV GPUs that lack
GL_ARB_texture_border_clamp) fixes the following piglit tests:
gl-1.0-beginend-coverage gltexparameter[if]{v,}
push-pop-texture-state
texwrap 1d
texwrap 1d proj
texwrap 2d proj
texwrap formats
All told, 49 more tests pass on NV20 (10de:0201).
No changes on Intel CI run or RV250 (1002:4c66).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Use textwrap.dedent to make the source line a lot shorter.
Shortening (?) the line was requested by Jason.
v3: Simplify the textwrap.dedent usage. Suggested by Dylan.
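For reference, a small sketch of the textwrap.dedent pattern in a generator script. The emit_header() function and the template text are invented for the example.

```python
import textwrap


def emit_header(name):
    """Return a tiny include-guard skeleton for the given name."""
    # The template is indented to match the surrounding Python code;
    # dedent() strips the common leading whitespace so the generated
    # text starts at column 0.
    template = textwrap.dedent("""\
        /* Generated file, do not edit. */
        #ifndef {guard}
        #define {guard}
        #endif
        """)
    return template.format(guard=name.upper() + "_H")


print(emit_header("foo"))
```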
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The format_fallback.py script wants two arguments: 'csv-file' and
'out-file'.
Fixes: 20c99eaece "mesa: Add _mesa_format_fallback_rgbx_to_rgba() [v2]"
Reported-by: Rob Herring <robh@kernel.org>
The labels array may change its virtual address on a reallocation, so
it is invalid to cache pointers into the array. Rather than using the
pointer directly, remember the array index.
Fixes miscompilation of shaders in glmark2 ideas, leading to GPU hangs.
Fixes: c9e8b49b (etnaviv: gallium driver for Vivante GPUs)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The Android framework requires support for EGLConfigs with
HAL_PIXEL_FORMAT_RGBX_8888 and HAL_PIXEL_FORMAT_RGBA_8888.
Even though all RGBX formats are disabled on gen9 by
brw_surface_formats.c, the new configs work correctly on Broxton thanks
to _mesa_format_fallback_rgbx_to_rgba().
On GLX, this creates no new configs, and therefore breaks no existing
apps. See in-patch comments for explanation. I tested with glxinfo and
glxgears on Skylake.
On Wayland, this also creates no new configs, and therefore breaks no
existing apps. (I tested with mesa-demos' eglinfo and es2gears_wayland
on Skylake). The reason differs from GLX, though. In
dri2_wl_add_configs_for_visual(), the format table contains only
B8G8R8X8, B8G8R8A8, and B5G6B5; and dri2_add_config() correctly matches
EGLConfig to format by inspecting channel masks.
On Android, in Chrome OS, I tested this on a Broxton device. I confirmed
that the Google Play Store's EGLSurface used HAL_PIXEL_FORMAT_RGBA_8888,
and that an Asteroid game's EGLSurface used HAL_PIXEL_FORMAT_RGBX_8888.
Both apps worked well. (Disclaimer: I didn't test this patch on Android
with Mesa master. I backported this patch series to an older Android
branch).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This fixes a couple of errors when building in Android:
external/mesa3d/src/mesa/main/shaderapi.c:293:49: error: format string
is not a string literal (potentially insecure)
[-Werror,-Wformat-security]
_mesa_error(ctx, GL_INVALID_OPERATION, caller);
^~~~~~
external/mesa3d/src/mesa/main/shaderapi.c:293:49: note: treat the string
as an argument to avoid this
_mesa_error(ctx, GL_INVALID_OPERATION, caller);
^
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Implement assembly language API acceleration for PPC64LE,
analogous to long-standing implementations for X86 and X86-64.
See also similar implementation in libglvnd.
Tested with Piglit.
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
This enables support for importing RGBX8888 EGLImage textures on
Skylake.
Chrome OS needs support for RGBX8888 EGLImage textures because
the Android framework produces HAL_PIXEL_FORMAT_RGBX8888 winsys
surfaces, which the Chrome OS compositor consumes as dma_bufs. On
hardware for which RGBX is unsupported or disabled, normally core Mesa
provides the RGBX->RGBA fallback during glTexStorage. But the DRIimage
code bypasses core Mesa, so we must do the fallback in i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new function takes a mesa_format and, if the format is a non-alpha
(RGBX) format with an alpha (RGBA) variant, returns the alpha format.
Otherwise, it returns the original format.
Example:
    input                      -> output
    // Fallback exists
    MESA_FORMAT_R8G8B8X8_UNORM -> MESA_FORMAT_R8G8B8A8_UNORM
    MESA_FORMAT_RGBX_UNORM16   -> MESA_FORMAT_RGBA_UNORM16
    // No fallback
    MESA_FORMAT_R8G8B8A8_UNORM -> MESA_FORMAT_R8G8B8A8_UNORM
    MESA_FORMAT_Z_FLOAT32      -> MESA_FORMAT_Z_FLOAT32
i965 will use this for EGLImages and DRIimages.
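The behaviour in the example can be modelled as a lookup with an identity default. This is only a sketch: the table holds just the pairs listed in the example, and the Python function name is a stand-in for the generated C function.

```python
# Only the pairs from the example; the real table is generated from the CSV.
RGBX_TO_RGBA = {
    "MESA_FORMAT_R8G8B8X8_UNORM": "MESA_FORMAT_R8G8B8A8_UNORM",
    "MESA_FORMAT_RGBX_UNORM16": "MESA_FORMAT_RGBA_UNORM16",
}


def format_fallback_rgbx_to_rgba(fmt):
    """Return the RGBA fallback for fmt, or fmt itself if there is none."""
    return RGBX_TO_RGBA.get(fmt, fmt)


print(format_fallback_rgbx_to_rgba("MESA_FORMAT_R8G8B8X8_UNORM"))
print(format_fallback_rgbx_to_rgba("MESA_FORMAT_Z_FLOAT32"))
```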
v2 (Jason Ekstrand):
- Use mako
- Rework to be easier to read
- Write directly to the output file
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
sizeof(struct si_shader_key):
Before reverting the 2 commits: 120 bytes
After reverting the 2 commits: 128 bytes
With #pragma pack: 107 bytes
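As a rough illustration of why packing shrinks a struct, the sketch below compares a naturally aligned layout with a packed one using Python's struct module. The two-field layout (one uint8 followed by one uint32) is hypothetical and unrelated to the real si_shader_key fields.

```python
import struct

# "@" uses native alignment, so padding is inserted after the uint8.
natural = struct.calcsize("@BI")
# "=" uses standard sizes with no alignment padding, like a packed struct.
packed = struct.calcsize("=BI")

print(natural, packed)
```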
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Broken by:
commit 00173d91b7
Author: Marek Olšák <marek.olsak@amd.com>
Date: Sat Jun 10 12:09:43 2017 +0200
mesa: don't flag _NEW_TRANSFORM for st/mesa if possible
It also optimizes the case slightly for GL core.
It doesn't try to fix that glEnable might be a bad place to do the
clip plane transformation.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Per Jose's suggestion, this patch cleans up format_cap_table to remove
the unnecessary default cap value for vgpu10 formats since those devcap values
can be retrieved from the device.
Tested with MTT conform, glretrace, piglit in HWv13 and HWv8.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The default devcap for format SVGA3D_Z_D24S8_INT in HWv8 when its devcap is
not explicitly advertised should be set to zero to match the default value
in the device.
Tested with MTT piglit in HW version 8.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
In cases where certain bind flags cannot be enabled together
(for example, CONSTANT_BUFFER cannot be combined with any other flag),
a separate host surface will be created.
For example, if a stream output buffer is reused as a constant buffer,
two host surfaces will be created, one for stream output,
and another one for constant buffer. Data will be copied from the
stream output surface to the constant buffer surface.
Fixes piglit test ext_transform_feedback-immediate-reuse-index-buffer,
ext_transform_feedback-immediate-reuse-uniform-buffer
Tested with MTT piglit, MTT glretrace, Nature, NobelClinician Viewer, Tropics.
v2: Fix bind flags compatibility check as suggested by Brian.
v3: Use the list utility to maintain the buffer surface list.
v4: Use the SAFE rev of LIST_FOR_EACH_ENTRY
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Currently we unconditionally enable streamout bind flag at
buffer resource creation time. This is not necessary if the buffer
is never used as a streamout buffer. With this patch, we enable
streamout bind flag as indicated by the state tracker. If the buffer
is later bound to streamout and does not already have the streamout bind
flag enabled, we will recreate the buffer with
the new set of bind flags. Buffer content will be copied
from the old buffer to the new one.
Tested with MTT piglit, Nature, Tropics, Lightsmark.
v2: Fix bind flags check as suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is to prepare for more bind_flags optimization
in subsequent patches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is to prepare for other bind_flags optimization
in subsequent patches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
src/mesa/drivers/x11/xm_dd.c:688:7: warning: implicit declaration of function ‘_mesa_update_draw_buffer_bounds’; did you mean ‘_mesa_has_ARB_draw_buffers_blend’? [-Wimplicit-function-declaration]
_mesa_update_draw_buffer_bounds(ctx, ctx->DrawBuffer);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cc: Marek Olšák <marek.olsak@amd.com>
Fixes: 585c5cf8a5 ("mesa: don't update draw buffer bounds in
_mesa_update_state")
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
From experimentation in IGT, we found that the OA unit might label
some report as "idle" (using an invalid context ID), right after a
report for a given context. Deltas generated by those reports actually
belong to the previous context, even though they're not labelled as
such.
This change ensures that, while reading OA reports, we only
consider the GPU actually idle after 2 reports with an invalid context
ID.
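A simplified model of the rule, assuming reports are (context ID, delta) pairs and None marks an invalid context ID. The data model and function name are invented for illustration; the driver's accumulation logic is more involved.

```python
INVALID_CTX = None


def attribute_deltas(reports):
    """Sum deltas per context from a sequence of (ctx_id, delta) pairs.

    A single report with an invalid ID is still charged to the previous
    context; only a second consecutive one marks the GPU as idle.
    """
    totals = {}
    current = INVALID_CTX   # context currently charged for deltas
    invalid_run = 0         # consecutive reports with an invalid ID
    for ctx_id, delta in reports:
        if ctx_id is INVALID_CTX:
            invalid_run += 1
            if invalid_run >= 2:
                current = INVALID_CTX   # GPU is considered idle now
        else:
            invalid_run = 0
            current = ctx_id
        if current is not INVALID_CTX:
            totals[current] = totals.get(current, 0) + delta
    return totals


totals = attribute_deltas([(7, 10), (None, 3), (None, 5), (8, 2)])
print(totals)
```

In this run the first invalid report (delta 3) is still counted for context 7, while the second one (delta 5) is treated as idle time.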
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to an underlying hardware race condition, we have no guarantee
that all the reports coming from the OA buffer related to the workload
we're trying to measure have landed in memory by the time all the work
submitted has completed. That means we need to keep on reading the OA
stream until we read a report with a timestamp more recent than the
timestamp recorded by the MI_REPORT_PERF_COUNT at the end of the
performance query.
v2: fix uninitialized offset variable to 0 (Lionel)
v3: rework the reading to avoid blocking the user of the API unless
requested (Rob)
v4: fix a bug that makes the i965 driver read the perf stream when
not necessary, leading to very long counter accumulation times
(Lionel)
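The termination condition can be sketched like this, assuming reports arrive as (timestamp, payload) pairs. The function name and return convention are hypothetical.

```python
def drain_until(reports, end_timestamp):
    """Consume reports until one is newer than end_timestamp.

    Returns (consumed, done). done is False when the stream ran dry
    before a newer report showed up, so the caller has to read again.
    """
    consumed = []
    for timestamp, payload in reports:
        if timestamp > end_timestamp:
            return consumed, True
        consumed.append((timestamp, payload))
    return consumed, False


consumed, done = drain_until([(1, "a"), (2, "b"), (5, "c")], end_timestamp=3)
print(consumed, done)
```

Returning a flag instead of blocking matches the idea in v3 of not stalling the API user unless they ask for the result.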
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enables access to OA unit metrics on Gen8+ via INTEL_performance_query.
v2: make use of new parameters coming from gen_device_info (Lionel)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In preparation for adding XML OA metric set descriptions for Gen 8 and 9
which will result in auto generated code that depends on a number of new
system variables ($EuSubslicesTotalCount, $EuThreadsCount and
$SliceMask) this adds corresponding members to brw->perf.sys_vars.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With Ken's work to drop the library dependency on libdrm_intel, we now
only depend on libdrm for the kernel uapi headers it provides. It
seems like we're better off just embedding those headers ourselves,
making the lives of people developing new features tightly
integrated with the kernel a tiny bit easier.
This change also makes it a bit more obvious what cflags/libs are
required by the i915 drivers vs i965, by renaming INTEL_CFLAGS/LIBS
into I915_CFLAGS/LIBS.
Headers were generated from drm-tip on the following commit :
commit 6d61e70ccc21606ffb8a0a03bd3aba24f659502b
Merge: 338ffbf7cb5e c0bc126f97fb
Author: Dave Airlie <airlied@redhat.com>
Date: Tue Jun 27 07:24:49 2017 +1000
Backmerge tag 'v4.12-rc7' into drm-next
v2: Use installed files from the kernel (Daniel Vetter)
v3: Use headers from drm-next rather than drm-tip (Dave/Daniel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Counters related to timings will be sensitive to any delay introduced
by the software. In particular, if the begin and end of a performance
query end up in different batches, time-related counters will
exhibit bigger values caused by the time it takes for the kernel
driver to load new requests into the hardware.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
According to GLSL.std.450 spec, SmoothStep expects input to be a
floating-point type, but it does not restrict the bitsize.
Current implementation relies on inputs to be 32-bit.
This commit extends the support to 64-bit size inputs.
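For reference, the SmoothStep formula is the same at any bit size; only the float width changes. A sketch in Python, whose floats are 64-bit:

```python
def clamp(x, lo, hi):
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(x, hi))


def smoothstep(edge0, edge1, x):
    """Hermite interpolation between 0 and 1 for edge0 < x < edge1."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


print(smoothstep(0.0, 1.0, 0.25))
```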
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
According to GLSL.std.450 spec, the operand for step() function must be
a floating-point. It does not restrict the value to 32-bit floats.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
LLVM has required an i1 here for a long time. llvm.ctlz.* was fixed in
commit edd23e0606 ("ac/llvm: fix various findMSB bugs").
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously the logic would decide that the record is kept, which
translates into keep = false in the caller, which meant that these
passes did not run.
While it's right that keep = false which means that a new record does
not need to be added, we do still have to perform the usual list
maintenance. It's easiest to do this pre-merge rather than post.
The lowering that clip/cull distance passes produce triggers this bug in
TCS (since reading outputs is done differently in other stages), but it
should be possible to achieve it with the right sequence of regular
reads/writes.
Fixes: KHR-GL45.cull_distance.functional
Fixes: generated_tests/spec/arb_tessellation_shader/execution/tes-input/tes-input-gl_ClipDistance.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
If the fileIndex is different, that means they are in logically
different spaces. However if there's also a relative offset, then they
could end up pointing at the same spot again.
Also add a note about potential for multiple buffers to overlap even if
they're at different file indexes. However that's potentially lowered
away by the point that this logic hits.
Not known to fix any specific application or test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This has no effect since in practice this will only come into play for
memory-backed files, for which VFETCH will never happen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The idxbuf could linger, and when a clear happened, which also uses the
3d bufctx, we could get an error trying to access it.
This fixes spurious crashes/errors in CTS tests.
Fixes: 61d8f3387d ("nv50,nvc0: clear index buffer bufctx bin unconditionally")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
All the BuildUtil helpers just insert the operation into the current BB.
So we have to take care that any fetchSrc() operations happen before the
operation whose setIndirect() it goes into.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
vc4 now depends on renderonly functions, but these weren't added to the
Android build resulting in the following errors:
src/gallium/drivers/vc4/vc4_resource.c:380: error: undefined reference to 'renderonly_scanout_destroy'
src/gallium/drivers/vc4/vc4_resource.c:681: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
src/gallium/drivers/vc4/vc4_screen.c:625: error: undefined reference to 'renderonly_dup'
src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
src/gallium/winsys/pl111/drm/pl111_drm_winsys.c:37: error: undefined reference to 'renderonly_create_gpu_import_for_resource'
Fixes: 7029ec05e2 ("gallium: Add renderonly-based support for pl111+vc4.")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Meta always sets the API to API_OPENGL_COMPAT, so the current API
setting is irrelevant.
   text    data     bss     dec     hex filename
7154994  256860   37332 7449186  71aa62 32-bit i965_dri.so before
7154978  256860   37332 7449170  71aa52 32-bit i965_dri.so after
6788451  328056   50704 7167211  6d5ceb 64-bit i965_dri.so before
6788419  328056   50704 7167179  6d5ccb 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
This is used to inline KHR_no_error logic without inlining
the function into all its callers.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Apparently, the sampler has some sort of precision issues for
non-normalized texture coordinates with linear filtering. This caused
some small precision issues in scaled blits. Work around this by using
normalized coordinates. There is some extra work necessary because Gen6
uses TEX (instead of TXF) for some multisample resolve blits.
Fixes piglit.spec.arb_framebuffer_object.fbo-blit-stretch on SNB.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68365
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
A GPU memcpy function could alternatively be implemented using MI_*
commands. Provide more detail into how this one operates in case another
memcpy function is created.
v2:
- Update the commit message.
v3:
- Use 'memcpy' instead of 'cpy' (Jason Ekstrand)
- Shorten 'streamout' to 'so'
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the future, we plan on using this method to resolve images whose
surface state fast-clear value is dynamically updated during command
buffer execution. Start using it now for testing and to reduce churn
later on.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used in the next patch.
v2:
- Omit BLORP_BATCH_NO_EMIT_DEPTH_STENCIL (Jason Ekstrand)
- Update commit message.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Splitting out these fields will make the color buffer transitioning
function simpler when it gains more features.
v2: Remove unintended blank line (Iago Toral)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For 3D image subresources undergoing a layout transition via
PipelineBarrier, we increase the number of fast-cleared layers to match
the intended behaviour of KHR_maintenance1. When such subresources
undergo layout transitions between subpasses, we don't do this to avoid
failing incorrect CTS tests. Instead, unify the behaviour in both
scenarios, and wait for the CTS tests to catch up. See CL 1111 for the
test fix and Vulkan issue #849 for more information.
On SKL+, this causes 3 test failures under:
dEQP-VK.pipeline.render_to_image.3d.*
v2: Add a reference to the Vulkan issue (Iago Toral).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Make the function take in an image instead of an image view. This
enables us to record relocations for surfaces states created outside of
the anv_CreateImageView path.
v2 (Jason Ekstrand):
- Use image->offset instead of surf_offset in aux_offset calculation.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reflect the fact that an image view or subresource range with the color
aspect cannot have any other aspect.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Check for aux levels in layer helper (Jason Ekstrand)
- Don't assert aux is present, return 0 if it isn't.
- Use the helpers.
v3:
- Make the helpers aspect-agnostic (Jason Ekstrand)
- Drop anv_image_has_color_aux()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Jason Ekstrand):
- Remove Vulkan-specific terminology from the commit title.
- Replace '== 7' with '<= 7' to hint that this is a new feature on BDW+.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently a resource flush may trigger a self resolve, even if a scanout buffer
exists, but is up to date. If a scanout buffer exists we only ever want to
flush the resource to the scanout buffer. This fixes a performance regression.
Fixes: dda956340c (etnaviv: resolve tile status when flushing resource)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Based on a patch from Wladimir J. van der Laan and untested due
to lack of hardware. Binary blob emits those formats if GPU supports
HALTI1 (faked with ibvivhook).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Passes texwrap GL_ARB_texture_rg piglit (with faked full texture rg support).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Fix regression of "no rendering" on simple apps like glxgears by
setting an explicit full surface clear_rect when scissor is not
enabled.
This regressed with commit 00173d91 "st/mesa: don't set 16
scissors and 16 viewports if they're unused" due to an assumption
that a default scissor rect is always set, which was the case prior
to this optimization.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Some combinations of c++ compilers and standard libraries had problems
with the string::replace code we were using previously.
This should fix the travis-ci system.
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The hardware doesn't support it, so we just interpolate all array elements
and then use indirect indexing on the resulting vector.
Clearly, this is not very efficient. There is an argument to be had for
adding if/else, or perhaps even pulling the data out of LDS directly.
Both don't really seem worth the effort, considering that it seems nobody
actually uses this feature.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In swrastGetDrawableInfo, set *x and *y, not just *w and *h;
this fixes a crash later in drisw_update_tex_buffer when the
(formerly) uninitialized x and y values are used to construct
an address in a call to llvmpipe_transfer_map.
Fixes crash in Piglit test
"spec@egl 1.4@eglcreatepbuffersurface and then glclear"
(<piglit dir>/bin/egl-create-pbuffer-surface -auto)
that occurred intermittently, e.g. when the uninitialized x and y in
drisw_update_tex_buffer just happened to contain absurd non-zero values.
v2: Initialize whether the function succeeds or fails, just like *w/*h.
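A minimal sketch of the pattern, not the actual drisw code: every output
parameter is written before any early return, so a caller never reads an
uninitialized value. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a drawable; not the real drisw structure. */
struct sketch_drawable {
    bool valid;
    int width, height;
};

/* Write all four outputs up front, so they are defined whether or not
 * the query succeeds. */
static void
sketch_get_drawable_info(const struct sketch_drawable *draw,
                         int *x, int *y, int *w, int *h)
{
    assert(x && y && w && h);

    *x = *y = *w = *h = 0;

    if (draw == NULL || !draw->valid)
        return;   /* failure path: outputs are already zeroed */

    *w = draw->width;
    *h = draw->height;
}
```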
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The function _eglError() already returns EGL_FALSE, precisely to
simplify the callers. Make use of it.
While EGL_FALSE is numerically identical to false, NULL and EGL_NO_FOO,
their storage is not the same, so we cannot use it for "everything".
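The caller-side simplification can be sketched as follows. All names and
the error code are hypothetical stand-ins, not the real EGL types or
values; the point is only that an error helper which always returns
"false" lets a caller report and bail out in one statement.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int sketch_bool;   /* stand-in for EGLBoolean */
#define SKETCH_FALSE 0u
#define SKETCH_TRUE  1u
#define SKETCH_BAD_PARAMETER 1      /* hypothetical error code */

static int sketch_last_error;

/* Records the error and always returns SKETCH_FALSE. */
static sketch_bool
sketch_error(int code, const char *where)
{
    assert(where != NULL);
    sketch_last_error = code;
    fprintf(stderr, "error %d in %s\n", code, where);
    return SKETCH_FALSE;
}

static sketch_bool
sketch_set_interval(int interval)
{
    /* One statement instead of "sketch_error(...); return SKETCH_FALSE;" */
    if (interval < 0)
        return sketch_error(SKETCH_BAD_PARAMETER, "sketch_set_interval");

    return SKETCH_TRUE;
}
```

This only works for functions whose return type matches the helper's;
functions returning pointers or handles still need their own return value.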
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Function cannot fail and always returns true.
v2: Inline the one line function in the header
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
xfb only applies to the latest stage before the fragment shader, so
there is no need to invoke it in the fragment shader.
Fixes:
KHR-GL45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL45.enhanced_layouts.xfb_stride_of_empty_list_and_api
v2: do reset only if shaders provide an explicit stride
v3: do not call link_xfb_stride_layout_qualifiers() for fragment shaders
(Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
…and print an error in that case. This is probably not a rare event,
because fopen doesn't expand ~ to $HOME.
Also get rid of unused "bool ret" variable.
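A rough sketch of the check, assuming a hypothetical helper that writes a
text dump to a user-supplied path. fopen() treats "~" as a literal
directory name, so such a path typically fails and the error is reported
rather than ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` for writing and report failure. */
static bool
sketch_dump_to_file(const char *path, const char *text)
{
    assert(path && text);

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        fprintf(stderr, "failed to open %s: %s\n", path, strerror(errno));
        return false;
    }

    fputs(text, f);
    fclose(f);
    return true;
}
```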
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=100785
v2: I went back and forth on whether to initialize nsys_inputs at the
beginning of shader initialization or when allocating system values; by
the time I settled on the former, I forgot to change it back.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On evergreen we can route vertex fetches via the texture cache,
and this is required for some images support. So add support
to the asm builder for it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was found during writing the images code, we need to
make sure we route the correct index register.
Signed-off-by: Dave Airlie <airlied@redhat.com>
for HUD integration in following commits. This valuable profiling data
will allow us to see on the HUD how well glthread is able to utilize
parallelism. This is better than benchmarking, because you can see
exactly what's happening and you don't have to be CPU-bound.
u_threaded_context has the same counters.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This mirrors exactly how u_threaded_context works.
If you understand this, you also understand u_threaded_context.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These entry points are used by Alien Isolation and caused
synchronization with glthread. The async marshalling implementation
is similar to glBuffer(Sub)Data. However unlike Buffer(Sub)Data
we don't need to worry about EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD,
as this isn't applicable to these DSA variants.
Results in an approximately 6x drop in glthread synchronizations and a
~30% FPS jump in Alien Isolation (Medium preset, Athlon 860K, RX 480).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This trivially adds support for the image offset query, which is needed
for the zwp_linux_dmabuf based EGL platform wayland implementation.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This prevents glViewport() and friends from always flushing and
triggering _NEW_VIEWPORT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If lp_setup_bind_framebuffer() is never called, then setup fb x1/y1 was not
correctly initialized. This can happen if there's never a fb set - both
cso and llvmpipe would consider setting this with no cbufs and no zsbuf a
redundant change and therefore it would never get set.
We rely on this setup fb rect being initialized correctly for the tri intersect
tests, throwing away tris which don't intersect. Not initializing it meant
we'd then say it intersected, and we'd try to bin that despite that we have
no actual tiles to bin it to, leading to assertion failures (pretty harmless
since tile 0/0 always exists nevertheless as tiles are statically allocated,
albeit that should change at some point).
(Note probably not an issue with gl state tracker)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This commit replaces the complex and confusing set of disable flags with
two fairly straightforward fields which describe the intended auxiliary
surface usage and whether or not the miptree supports fast clears.
Right now, supports_fast_clear can be entirely derived from aux_usage
but that will not always be the case.
This commit makes functional changes. One of these changes is that it
re-enables multisampled fast-clears which were accidentally disabled in
cec30a6669 around a year ago. Fixing this
improves the SynMark v7 DeferredAA test by around 3% on some gen9
hardware. This commit also gets us closer to enabling CCS_E for
window-system buffers which are Y-tiled.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Starting with Sky Lake, we can clear to arbitrary floats or integers.
Unfortunately, the hardware isn't particularly smart when it comes to
sampling from that clear color. If the clear color is out of range for
the surface format, it will happily return whatever we put in the
surface state packet unmodified. In order to avoid returning bogus
values for surfaces with a limited range, we need to do some clamping.
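The clamping can be sketched roughly as below. The three range classes
and their limits are assumptions made for illustration; the actual patch
derives the range from the surface format.

```c
#include <assert.h>

/* Hypothetical range classes; the driver works on real surface formats. */
enum sketch_range { SKETCH_UNORM, SKETCH_SNORM, SKETCH_FLOAT };

static float
sketch_clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp each channel to what the format can represent, so sampling
 * returns the same value that a real write to the surface would store. */
static void
sketch_clamp_clear_color(enum sketch_range range, float color[4])
{
    assert(color);

    for (int i = 0; i < 4; i++) {
        switch (range) {
        case SKETCH_UNORM:
            color[i] = sketch_clampf(color[i], 0.0f, 1.0f);
            break;
        case SKETCH_SNORM:
            color[i] = sketch_clampf(color[i], -1.0f, 1.0f);
            break;
        case SKETCH_FLOAT:
            break;   /* full range, nothing to do */
        }
    }
}
```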
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
While we're here, we also make the two support checks static since there
are no users outside intel_mipmap_tree.c.
Reviewed-by: Chad Versace <chadversary@chromium.org>
We never fast-clear more than the base slice (LOD 0, layer 0) anyway, so
layered rendering without a resolve is always perfectly safe. Should
this ever change in the future, we'll have to put some sort of resolve
back in but we can cross that bridge when we come to it.
Reviewed-by: Chad Versace <chadversary@chromium.org>
For the PartialResolveDisableInVC field, the recommendation is to
always set it to 0, which is the default value of the bit.
So, we have nothing left to write to CACHE_MODE_1.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the optimizations below gone in gen10+, we have nothing left to
write to CACHE_MODE_1:
Float Blend Optimization Enable: This bit has been removed in gen10+.
Partial Resolve Disable in VC: The recommendation is to always set this
field to 0 in gen10+, which is the default value of the bit.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This needs to be passed to gallium drivers.
No game fix is planned at this time.
The addition of glsl_correct_derivatives_after_discard is
generally a good thing for mesa compatibility with the broader GL
driver ecosystem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100070
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We use the bounding box (triangle extents) to figure out if 32bit rasterization
could potentially overflow. However, we used the bounding box which already got
rounded up to 0 for negative coords for this, which is incorrect, leading to
overflows and hence bogus rendering in some of our private use.
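To illustrate why the rounded box hides the problem, here is a small
sketch. The extent limit and all names are hypothetical, not the values
llvmpipe uses: a triangle reaching far into negative coordinates passes
the check on the clamped box but fails on the true extents.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_box { int x0, y0, x1, y1; };

/* Hypothetical limit on the extents that 32-bit math can handle. */
#define SKETCH_MAX_EXTENT 4096

static bool
sketch_fits_32bit(const struct sketch_box *b)
{
    assert(b);
    return (b->x1 - b->x0) < SKETCH_MAX_EXTENT &&
           (b->y1 - b->y0) < SKETCH_MAX_EXTENT;
}

/* Rounds negative minimum coordinates up to 0, as the binning box does. */
static struct sketch_box
sketch_clamp_to_origin(struct sketch_box b)
{
    if (b.x0 < 0) b.x0 = 0;
    if (b.y0 < 0) b.y0 = 0;
    return b;
}
```

With true extents of x in [-5000, 100], the clamped box is only 100 wide
and wrongly looks safe, while the unclamped box is 5100 wide and does not.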
It might be possible to simplify this somehow (we're now using 3 different
boxes for binning) but I don't quite see how.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is pretty useful for debugging rasterization issues, so turn it on
based on DEBUG (the actual existence of the fields is also conditionalized
on DEBUG, lines fill it out the same too).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I don't think this is actually required, if the viewport
values are different from the ones stored in the context, we
already flush and trigger _NEW_VIEWPORT in
set_viewport_no_notify().
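The "only flush on change" behaviour being relied on can be sketched as
follows, with a hypothetical context structure and dirty bit rather than
the real gl_context:

```c
#include <assert.h>

struct sketch_viewport { float x, y, width, height; };

#define SKETCH_NEW_VIEWPORT 0x1u   /* hypothetical dirty bit */

struct sketch_context {
    struct sketch_viewport viewport;
    unsigned flush_count;          /* counts simulated flushes */
    unsigned new_state;            /* accumulated dirty bits */
};

static void
sketch_set_viewport(struct sketch_context *ctx,
                    float x, float y, float width, float height)
{
    assert(ctx);

    /* Redundant call: neither flush nor dirty any state. */
    if (ctx->viewport.x == x && ctx->viewport.y == y &&
        ctx->viewport.width == width && ctx->viewport.height == height)
        return;

    ctx->flush_count++;
    ctx->new_state |= SKETCH_NEW_VIEWPORT;
    ctx->viewport = (struct sketch_viewport){ x, y, width, height };
}
```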
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I don't think this is actually required, if the depth range
values are different from the ones stored in the context, we
already flush and trigger _NEW_VIEWPORT in
set_depth_range_no_notify().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This looks useless because gl_context::Texture::CurrentUnit
is not used by _mesa_update_texture_state() and friends.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function was moved to genX_state_upload.c but was still not using genxml.
By converting it to genxml, we make some things simpler, like setting
haswell's border color state, but others are more complex, since the structs
used by each gen are different.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The sampler state code was all moved to genxml, so we can get rid of these
functions and delete the file.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Since they just use the code that is already available in genX_state_upload.c,
convert them in one batch.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Also convert some auxiliary functions used by it, and copy
upload_default_color to genX_state_upload.c.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Based on the current code, gen5 and gen6 have the same sampler border color
state struct. So fix the gen5 one to match gen6.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
anv_layout_to_aux_usage() lacked a case for
VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR. Add an unreachable case, because we
don't support the extension.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This patch just enables building Vulkan libs for gen10. We
still don't have gen 10 support enabled on Vulkan.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A few of the fields in this register have changed compared
to gen9.xml.
V2: Remove some fields which are not valid anymore.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This is required because we already have a macro defined with
the name StartInstanceLocation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The param is currently unused. It will later be used to support
R8G8B8X8 EGLConfigs on Skylake.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to query the driver's supported formats in i965's DRI code,
where a DRIscreen is often available but a GL context is not.
To reduce diff noise, this patch does not completely remove
brw_context's format arrays. It just redeclares them as pointers which
point to the arrays in intel_screen.
Specifically, move these two arrays from brw_context to intel_screen:
mesa_to_isl_render_format[]
mesa_format_supports_render[]
And add a new array to intel_screen,
mesa_format_supports_texture[]
which brw_init_surface_formats() copies to ctx->TextureFormatSupported.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm swimming in a vortex of formats. Mesa formats, isl formats, DRI
formats, GL formats, etc.
It's easy to misinterpret the following brw_context members unless
you've recently read their definition. In upcoming patches, I change
them from embedded arrays to simple pointers; after that, even their
definition doesn't help, because the MESA_FORMAT_COUNT hint will no
longer be present.
Rename them to prevent further confusion. While we're renaming, choose
shorter names too.
-format_supported_as_render_target
+mesa_format_supports_render
-render_target_format
+mesa_to_isl_render_format
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rename 'count' to 'config_count'. I didn't understand what the variable
did until I untangled the for-loops. Now the next person won't have that
problem.
v2: Rebase. Fix typo. Apply to all platforms (for emil).
Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1)
No behavioral change. Just a readability cleanup.
Instead of modifying this small array on each loop iteration, we now
initialize it in-place with the values it needs.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
No behavioral change. Just a readability cleanup.
Instead of modifying this small array on each loop iteration, we now
initialize it in-place with the values it needs.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
No behavioral change. Just a readability cleanup.
Instead of modifying this small array on each loop iteration, we now
initialize it in-place with the values it needs.
v2: Rebase.
Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1)
That is, consistently do this:
for (int i = 0; ...)
No behavioral change.
This patch touches only egl_dri2.c.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
If the call fails we need to flush the command buffer and retry. In this
case, we were failing to unbind the GS which led to subsequent errors.
This fixes a bug replaying a Cinebench R15 apitrace in a Linux guest.
VMware bug 1894451
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When surface_invalidate is called to invalidate a newly created surface
in svga_validate_surface_view(), it is possible that the command
buffer is already full, and in this case, currently, the associated wddm
winsys function will flush the command buffer and resend the invalidate
surface command. However, this can prematurely flush the command buffer
if there are still pending image updates to be patched.
To fix the problem, this patch will add a return status to the
surface_invalidate interface and if it returns FALSE, the caller will
call svga_context_flush() to do the proper context flush.
Note, we don't call svga_context_flush() if surface_invalidate()
fails when flushing the screen surface cache though, because it is
already in the process of context flush, all the image updates are already
patched, calling svga_context_flush() can trigger a deadlock.
So in this case, we call the winsys context flush interface directly
to flush the command buffer.
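The division of labour described above can be sketched like this. The
command buffer model and all names are hypothetical: the low-level emit
reports failure instead of flushing by itself, and the caller chooses the
appropriate flush before retrying.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command buffer model. */
struct sketch_cmdbuf {
    unsigned used;
    unsigned capacity;
    unsigned flushes;
};

/* Winsys-level emit: returns false when full instead of flushing. */
static bool
sketch_winsys_emit(struct sketch_cmdbuf *cb, unsigned size)
{
    assert(cb && cb->used <= cb->capacity);

    if (size > cb->capacity - cb->used)
        return false;

    cb->used += size;
    return true;
}

/* Stands in for the full context flush that patches pending updates. */
static void
sketch_context_flush(struct sketch_cmdbuf *cb)
{
    cb->used = 0;
    cb->flushes++;
}

/* Caller-side policy: flush properly, then retry once. */
static bool
sketch_emit_with_retry(struct sketch_cmdbuf *cb, unsigned size)
{
    if (sketch_winsys_emit(cb, size))
        return true;

    sketch_context_flush(cb);
    return sketch_winsys_emit(cb, size);
}
```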
Fixes driver errors and graphics corruption running Tropics. VMware bug 1891975.
Also tested with MTT glretrace, piglit and various OpenGL apps such as
Heaven, CinebenchR15, NobelClinicianViewer, Lightsmark, GoogleEarth.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Consider the following RT attachment order:
1. Attach surfaces attachments 0 & 1, and render with them
2. Detach 0 & 1
3. Re-attach 0 & 1 to different surfaces
4. Render with the new attachment
The definition of a tile being resolved is that local changes have been
flushed out to the surface, hence there is no need to reload the tile before
it's written to. For an invalid tile, the tile has to be reloaded from
the surface before rendering.
Stage (2) was marking hot tiles for attachments 0 & 1 as RESOLVED,
which means that the hot tiles can be written out to memory with no
need to read them back in (they are "clean"). They need to be marked as
resolved here, because a surface may be destroyed after a detach, and we
don't want to have un-resolved tiles that may force a readback from a
NULL (destroyed) surface. (Part of a destroy is to detach all attachments
first.)
In stage (3), during the no att -> att transition, we need to realize that the
"new" surface tiles need to be fetched fresh from the new surface, instead
of using the resolved tiles, that belong to a stale attachment.
This is done by marking the hot tiles as invalid in stage (3), when we realize
that a new attachment is being made, so that they are re-fetched during
rendering in stage (4).
Also note that hot tiles are indexed by attachment.
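The state transitions above can be sketched with a simplified hot tile.
This is an illustration under assumed names and a three-state model, not
the swr implementation: detaching leaves the tile RESOLVED, and attaching
a surface where none was attached marks it INVALID so that it is reloaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_tile_state {
    SKETCH_TILE_INVALID,    /* must be loaded from the surface before use */
    SKETCH_TILE_DIRTY,      /* holds local changes not yet stored */
    SKETCH_TILE_RESOLVED    /* local changes already stored; tile is clean */
};

struct sketch_hot_tile {
    enum sketch_tile_state state;
    const void *surface;    /* NULL while nothing is attached */
    unsigned stores;        /* counts simulated stores to the surface */
};

static void
sketch_set_attachment(struct sketch_hot_tile *tile, const void *surface)
{
    assert(tile);

    if (surface == NULL) {
        /* Detach: store pending changes while the old surface exists. */
        if (tile->state == SKETCH_TILE_DIRTY)
            tile->stores++;
        tile->state = SKETCH_TILE_RESOLVED;
    } else if (tile->surface == NULL) {
        /* No attachment -> attachment: cached contents are stale. */
        tile->state = SKETCH_TILE_INVALID;
    }
    tile->surface = surface;
}

/* Returns true if the tile had to be loaded before rendering. */
static bool
sketch_render(struct sketch_hot_tile *tile)
{
    assert(tile && tile->surface);

    bool loaded = (tile->state == SKETCH_TILE_INVALID);
    tile->state = SKETCH_TILE_DIRTY;
    return loaded;
}
```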
- Fixes VTK dual depth-peeling tests.
- No piglit changes
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
From OpenGL 4.5 spec PDF, section '8.11. Texture Queries', page 236:
"An INVALID_VALUE error is generated if texture is not the name of
an existing texture object."
Same wording applies to the compressed version.
But it turns out this is a spec bug, and Khronos is fixing it for the next
revisions.
The proposal is to return INVALID_OPERATION in these cases.
This reverts commit 633c959fae.
v2:
- Use _mesa_lookup_texture_err (Samuel Pitoiset)
v3:
- _mesa_lookup_texture_err() already handles texture > 0 (Samuel
Pitoiset)
- Just revert 633c959fae (Juan A. Suarez)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Many Android apps (such as Google's official NDK GLES2 example app), and
even portions of the core framework code (such as SystemServiceManager in
Nougat), incorrectly choose their EGLConfig. They neglect to match the
EGLConfig's EGL_NATIVE_VISUAL_ID against the window's native format, and
instead choose the first EGLConfig whose channel sizes match those of
the native window format while ignoring the channel *ordering*.
We can detect such buggy clients in logcat when they call
eglCreateSurface, by detecting the mismatch between the EGLConfig's
format and the window's format.
As a workaround, this patch changes the order of EGLConfig generation
such that all EGLConfigs for HAL pixel format i precede those for HAL
pixel format i+1. In my (chadversary) testing on Android Nougat, this
was good enough to pacify the buggy clients.
v2: Rebase to make patch cherry-pickable to stable.
Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
gl_FragCoord contains the window coordinates so it seems to me that
we should not use perspective correct interpolation for it. At least
now I get similar output as i965/swrast/llvmpipe produce.
This fixes dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_w.
dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_xyz was already
passing, though I'm not quite sure how it managed to do that.
v2: Add definitions for the S3 "wrap shortest" bits as well (Ian)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
In the same spirit as 858f2f2ae6 (egl/dri2: ease srgb __DRIconfig
conditionals), let's merge dri_single_config and dri_double_config into
a single dri_config[2].
This moves the `if (double) dri_double_config else dri_single_config`
logic to `dri_config[double]`, reducing code duplication and making it
easier to read.
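For illustration, the lookup after the merge reduces to an array index.
The struct below is a hypothetical simplification of the EGL config, with
opaque pointers in place of __DRIconfig.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplification: index 0 is single-buffered,
 * index 1 is double-buffered. */
struct sketch_egl_config {
    const void *dri_config[2];
};

static const void *
sketch_get_dri_config(const struct sketch_egl_config *conf,
                      bool double_buffered)
{
    assert(conf != NULL);
    /* A bool converts to 0 or 1, so it can index the array directly. */
    return conf->dri_config[double_buffered];
}
```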
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
I reproduced this bug on Polaris11 and Raven.
I can't get this bug on Fiji. The reason might be that Fiji doesn't use
2D tiling for the test due to higher 2D tiling alignment requirements.
Fixes piglit: spec@ext_framebuffer_object@fbo-fast-clear
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
VK_FORMAT_FEATURE_TRANSFER_[SRC|DST]_BIT_KHR is a flag value of the
VkFormatFeatureFlagBits enum that can only be held and checked against
the linearTilingFeatures or optimalTilingFeatures members of the
VkFormatProperties struct but not the bufferFeatures member.
From the Vulkan® 1.0.51, with the VK_KHR_maintenance1 extension,
section 32.3.2 docs for VkFormatProperties:
"* linearTilingFeatures is a bitmask of VkFormatFeatureFlagBits
specifying features supported by images created with a tiling
parameter of VK_IMAGE_TILING_LINEAR.
* optimalTilingFeatures is a bitmask of VkFormatFeatureFlagBits
specifying features supported by images created with a tiling
parameter of VK_IMAGE_TILING_OPTIMAL.
* bufferFeatures is a bitmask of VkFormatFeatureFlagBits
specifying features supported by buffers."
...
Bits which can be set in the VkFormatProperties features
linearTilingFeatures, optimalTilingFeatures, and bufferFeatures
are:
typedef enum VkFormatFeatureFlagBits {
...
VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR = 0x00004000,
VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR = 0x00008000,
...
} VkFormatFeatureFlagBits;
...
The following bits may be set in linearTilingFeatures and
optimalTilingFeatures, specifying that the features are supported
by images or image views created with the queried
vkGetPhysicalDeviceFormatProperties::format:
...
* VK_FORMAT_FEATURE_TRANSFER_SRC_BIT_KHR specifies that an image
can be used as a source image for copy commands.
* VK_FORMAT_FEATURE_TRANSFER_DST_BIT_KHR specifies that an image
can be used as a destination image for copy commands and clear
commands."
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
As encode support is added along with decode, increase max_entrypoints to two.
vaMaxNumEntrypoints was returning an incorrect value and causing
memory corruption before this commit.
v2: assert when max_entrypoints needs to be bigger
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
This fixes an assertion in debug build, and probably a crash
in release build.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
si_build_shader_variant can actually be called directly from one of
normal-priority compiler threads. In that case, the thread_index is
only valid for the normal tm array.
v2:
- use the correct sel/shader->compiler_ctx_state
Fixes: 86cc809726 ("radeonsi: use a compiler queue with a low priority for optimized shaders")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The main flush before texturing is done after the FMASK decompress pass.
CB after MSAA rendering is not flushed in set_framebuffer_state and also
not in memory_barrier if the current color buffer is MSAA. We fully rely
on the FMASK decompress pass for the flushing.
Some CB decompress and resolve passes need an explicit flush before and
after.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Use the mechanism of si_decompress_textures, but instead of doing
the actual decompression, just flag the DB cache flush there.
This removes a lot of unnecessary DB cache flushes.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Only do so if there is a shader writing gl_ViewportIndex.
This removes a lot of CPU overhead for the most common case.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mainly don't (indirectly) call util_format_description here.
If the driver supports texture swizzling, this will always do the right
thing. If the driver doesn't support it, it doesn't matter.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Remove handling of buffers from all texture paths.
This simplifies things for both buffers and textures.
get_sampler_view_format is also cleaned up not to call
util_format_is_depth_and_stencil.
v2: also update st_NewTextureHandle
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
This removes 2 loops from hot codepaths and adds 1 loop to a rare codepath
(restore_sampler_states), and makes sanitize_hash() slightly worse.
Sampler states, when bound, are not unbound for draw calls that don't need
them. That's OK, because bound sampler states don't add any overhead.
This results in lower CPU overhead in most cases.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This has the benefit that we get to set up constants for exactly
the shader stage that needs it.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now AlphaFunc avoids the blend state update in st/mesa and avoids
_mesa_update_state_locked.
The GL_ALPHA_TEST enable won't trigger blend state updates in st/mesa
after st/mesa stops relying on _NEW_COLOR.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
st/mesa doesn't need the draw bounds for draw calls. I've added the call
where it's necessary in core Mesa and drivers, but I suspect that most
drivers can just move the call to the right places.
The core Mesa places aren't hot paths, so the call overhead doesn't matter
there.
For now, only st/mesa is made such that this function is invoked very
rarely.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For the default framebuffer, _mesa_resize_framebuffer updates it.
For FBOs, _mesa_test_framebuffer_completeness updates it.
This code is redundant.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea is to remove the dependency on _mesa_update_state_locked,
so that st/mesa can skip it for stencil state updates, and then stop
setting _NEW_STENCIL in mesa/main if the driver is st/mesa.
The main motivation is to stop invoking _mesa_update_state_locked for
certain state groups.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
_mesa_update_state will no longer recompute Width/Height if the framebuffer
is complete. We now rely on the FBO completeness check to do it.
The only code that needs to be fixed seems to be this one.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
These locks were added in 2f28a0dc, but I don't see anything in the
intel_miptree_blit path that should make this necessary.
When asked, Kristian says:
I doubt it's needed now with the new blorp. If I remember correctly,
I had to drop the lock there since intel_miptree_blit() could hit
the XY blit path that requires a fast clear resolve. The fast
resolve being meta, would then try to lock the texture again.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Same as the previous commit, but this one was split out because it's
a bit more complicated: this field is given as a pointer to a function,
so the function had to be changed as well, and the function was used in
a bunch of places, which needed updating as well.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In newer gens, this field has a prefix and the non-IEEE-754 mode is called
"Alternate", instead of simply "Alt".
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen6+, this is called "Dispatch GRF Start Register For Constant/Setup Data
0", while on gen5 and lower it's called only "Dispatch GRF Start Register For
URB Data", but it's essentially the same thing (URB data), so rename it to
match newer gens and simplify the C code that handles it.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"Pixel Shader Kill Pixel" -> "Pixel Shader Kills Pixel", which is how it's
called on newer gens.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen4, WM_STATE only has one Kernel Start Pointer and one GRF Register
Count, but we can make the code that handles this on multiple gens simpler if
we add an index 0 to it too.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just because it's not set doesn't mean that it doesn't exist. And since the
field is there on newer gens, having it on gen5 simplifies the code when
porting gen5 and lower.
Also add missing value to API Mode on CLIP_STATE on gen4.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a bitmask, so it can't be a boolean. Also rename it so it matches
gen6+.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are set by brw_clip_unit, so we need them when converting to
genxml.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike the older gen2 hardware, gen3 performs perspective
correct interpolation even for the primary/secondary colors.
To do that it naturally needs us to emit W for the vertices.
Currently we emit W only when at least one texture coordinate
set gets emitted. This means the interpolation of color will
change depending on whether texcoords/varyings are used or not.
That's probably not what anyone would expect, so let's just
always emit W to get consistent behaviour. Trying to avoid
emitting W seems like more hassle than it's worth, especially
as bspec seems to suggest that the hardware will perform the
perspective division anyway.
This used to be broken until it was accidentally fixed in
commit c349031c27 ("i915: Fix texcoord vs. varying collision
in fragment programs") by introducing a bug that made the driver
always emit W. After fixing that bug in commit c1eedb43f3
("i915: Fix wpos_tex vs. -1 comparison") we went back to the
old behaviour and caused an apparent regression.
Fixes: c1eedb43f3 ("i915: Fix wpos_tex vs. -1 comparison")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101451
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Same as with the colormasks, the blend color needs to be swizzled according
to the rendertarget format.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Doom shipped with a broken version of GLSLang which handles samplers as
function arguments in a way that isn't spec-compliant. In particular,
it creates a temporary local sampler variable and copies the sampler
into it. While Dave has had a hack patch out for a while that gets it
working, we've never landed it because we've been hoping that a game
update would come out with fixed shaders. Unfortunately, no game update
appears to be on the horizon and I've found this issue in yet another
application so I think we're stuck working around it. Hopefully, we can
delete this code one day.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99467
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Previously, texture formats were being used unconditionally without
checking. However nv30 supports neither RGBX8 nor R4A4/A4R4 formats. Add
sufficient fallbacks so that the nv30 driver can have working OSD.
Tested on a NV44A/PCI.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
By treating the rectangles as 1cpp, we can run up against some internal
copy engine limits and trigger a MEM2MEM_RECT_OUT_OF_BOUNDS error check
at launch time.
This commit enables the REMAP hardware, which allows us to specify both
the component size and number of components for a transfer. We're then
able to pass in the real width/nblocksx values and not hit the limits.
There's a couple of "supported" CPPs in the list that we can't actually
hit, but are there simply because they're possible.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Aside from reducing pushbuf usage in some situations, this commit should
have no other effect, and is just to make it somewhat obvious that those
methods have zero effect on linear surfaces.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
I just noticed a warning with a non-debug build, but really
this could all be one line, and I'm not even 100% sure the assert
makes sense here.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In 5f2fe9302c is_geminilake was introduced to differentiate
Broxton from Geminilake. Unfortunately I failed to verify that
is_broxton is used throughout the code base to mean Gen9lp.
Fixes: 5f2fe9302c ("intel: common: add flag to identify platforms by name")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In blit_framebuffer we're already doing a NULL
pointer check for readFb and drawFb so it makes
sense to do it before we actually use the pointers.
CID: 1412569
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Inline function SWR_MULTISAMPLE_POS::PrecalcSampleData() was missing
definition. Include definition in core/state_funcs.h.
Fixes windows build.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
V2 (Anuj):
Squash the changes into one patch rebased on master.
Address the review comments made by Francisco Jerez.
Do the URB allocation per slice (not per bank).
V3 (Anuj):
Update the comment.
Format the table as other l3 config tables.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
---
V1 was sent out with the heading:
"i965/cnl: Properly handle l3 configuration"
Adding this variable better explains the computation of L3 way
size in the function.
V2: Use const variable for way_size_per_bank.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When intel_miptree_alloc_non_msrt_mcs fails, fall back to normal blorp
color clear instead of falling back to meta. With this change,
brw_blorp_clear_color can never fail.
v2: Combine two if-statements to remove a level of indentation.
Suggested by Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We call convert_to_single_slice so they may end up with a non-trivial
offset that needs to be taken into account.
v2 (idr): Also set needs_src_offset. Suggested by Jason.
Fixes ES2-CTS.functional.texture.specification.basic_copyteximage2d.cube_rgba
and ES2-CTS.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
on G45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101284
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There is no intel_miptree_slice_has_hiz function, but there is an
intel_miptree_level_has_hiz function. I assume that's the correct one
to use.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
_mesa_lookup_vao() already returns NULL if id is zero.
v2: - change the conditional (Ian)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
There's no reason we can't -- the mappings we expose are basically
equivalent to persistent/coherent, already.
Improves mesa-demos drawoverhead (no state change) performance by
5.21362% +/- 1.25078% (n=11).
A common user error is to call glDrawRangeElements() with the 'end'
argument being one too large. If we use the vbuf module to translate
some vertex attributes this error can cause us to read past the end of
the mapped hardware buffer, resulting in a crash.
This patch adjusts the vertex count to avoid that issue. Typically,
the vertex_count gets decremented by one.
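The adjustment can be pictured with the following minimal sketch. It is not
the code from this patch: clamp_vertex_count() and all of its parameters are
hypothetical names, and it only assumes that the last readable vertex is the
one whose attribute still fits entirely inside the mapped buffer.

```c
#include <stddef.h>

/* Hypothetical helper: limit 'count' so that vertices start..start+count-1
 * can all be read from a buffer of 'buffer_size' bytes, where each vertex
 * is 'stride' bytes apart and reads 'attrib_size' bytes. */
static unsigned
clamp_vertex_count(unsigned start, unsigned count, size_t buffer_size,
                   unsigned stride, unsigned attrib_size)
{
   if (buffer_size < attrib_size)
      return 0;

   /* With a zero stride every vertex reads the same bytes. */
   if (stride == 0)
      return count;

   /* Number of whole vertices that fit in the buffer. */
   size_t max_verts = (buffer_size - attrib_size) / stride + 1;
   size_t available = max_verts > start ? max_verts - start : 0;

   return count < available ? count : (unsigned)available;
}
```

With a 160-byte buffer of 16-byte vertices, a request for 11 vertices starting
at 0 is reduced to 10, which matches the "decremented by one" case.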
This fixes crashes with the Unigine Tropics and Sanctuary demos with older
VMware hardware versions. The issue isn't hit with VGPU10 because we
don't hit this fallback.
No piglit changes.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
First this happens:
1) amdgpu_cs_flush (lock bo_fence_lock)
-> amdgpu_add_fence_dependency
-> os_wait_until_zero (wait for submission_in_progress) - WAITING
2) amdgpu_bo_create
-> pb_cache_reclaim_buffer (lock pb_cache::mutex)
-> pb_cache_is_buffer_compat
-> amdgpu_bo_wait (lock bo_fence_lock) - WAITING
So both bo_fence_lock and pb_cache::mutex are held. amdgpu_bo_create can't
continue. amdgpu_cs_flush is waiting for the CS ioctl to finish the job,
but the CS ioctl is trying to release a buffer:
3) amdgpu_cs_submit_ib (CS thread - job entrypoint)
-> amdgpu_cs_context_cleanup
-> pb_reference
-> pb_destroy
-> amdgpu_bo_destroy_or_cache
-> pb_cache_add_buffer (lock pb_cache::mutex) - DEADLOCK
The simple solution is not to wait for submission_in_progress, which we
need in order to create the list of dependencies for the CS ioctl. Instead
of building the list of dependencies as a direct input to the CS ioctl,
build the list of dependencies as a list of fences, and make the final list
of dependencies in the CS thread itself.
Therefore, amdgpu_cs_flush doesn't have to wait and can continue.
Then, amdgpu_bo_create can continue and return. And then amdgpu_cs_submit_ib
can continue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101294
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To avoid useless DCC fetches when DCC is disabled, descriptors
have to be updated in order to reflect this change. This is
quite similar to how we update descriptors of bound textures.
As a side effect, this should also prevent VM faults when
bindless textures are invalidated, because the VA in the
descriptor has to be updated accordingly as well.
I don't see any performance improvements with DOW3.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Needed for updating all resident texture descriptors when
dirty_tex_counter changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We did support single value operand equations, but not single variable
operand ones. In particular we were failing on "$Sampler0Bottleneck".
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The perf infrastructure needs to identify specific platforms, not just
generations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
and pass the ccs isl surface to blorp instead of creating a
copy.
v2 (Jason): Explain ccs change and use better assert checking
isl_surf_get_mcs_surf()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2 (Jason): Drop unused argument in intel_alloc_aux_buffer() and
move assignment of "buf->surf" in intel_alloc_aux_buffer()
into this patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2 (Nanley): Minify depth in case of 3D surface. Also moved to
.c file to get minify() without additional
header inclusions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
On gen < 6 one doesn't have level or layer specifiers available
for render and depth targets. In order to support rendering to
specific level/layer, driver needs to manually offset the surface
to the desired slice.
There are, however, alignment restrictions to respect as well and
in some cases the only option is to use temporary single slice
surface which driver copies after rendering to the full miptree.
Current alignment workaround introduces new texture images which
are added to the parent texture object. Texture validation later
on copies the additional levels back to the surface that contains
the full mipmap.
This only works for non-arrayed surfaces and driver currently
creates new arrayed images in vain - individual layers within the
newly created images are still unaligned the same as before.
This patch drops this mechanism and instead attaches single
temporary slice into the render buffer. This gets immediately
copied back to the mipmapped and/or arrayed surface just after
the render is done.
Sitting on top of earlier series cleaning up the depth buffer
state, this patch additionally fixes the following piglit tests:
arb_framebuffer_object.fbo-generatemipmap-cubemap.g965m64
arb_texture_cube_map.copyteximage cube.g965m64
arb_texture_cube_map.copyteximage cube.ilkm64
arb_pixel_buffer_object.texsubimage array pbo.g965m64
ext_framebuffer_object.fbo-cubemap.g965m64
ext_texture_array.copyteximage 1d_array.g45m64
ext_texture_array.copyteximage 1d_array.g965m64
ext_texture_array.copyteximage 1d_array.ilkm64
ext_texture_array.copyteximage 2d_array.g45m64
ext_texture_array.copyteximage 2d_array.g965m64
ext_texture_array.copyteximage 2d_array.ilkm64
ext_texture_array.fbo-array.g965m64
ext_texture_array.fbo-generatemipmap-array.g965m64
ext_texture_array.gen-mipmap.g965m64
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If you want to keep it for your driver, please raise your hand.
The prefix will probably have to be added into the driver instead of here.
I cringe when I look at my long renderer string:
Gallium 0.4 on AMD Radeon R9 Fury Series (DRM 3.17.0 / 4.11.0-staging-01277-gab25a9e, LLVM 5.0.0)
I'm sincerely sorry for all apps that detect Mesa by expecting "Gallium"
in the string.
Reviewed-by: Eric Anholt <eric@anholt.net>
The current implementation assumed that these were replaced in GLSL >= 4.10
by gl_Max{Vertex,Fragment}UniformVectors, however this is not true: both
built-ins should be produced from GLSL 4.10 onwards.
This was raised by new CTS tests that are in development.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
st/mesa creates a surface that reinterprets the compressed blocks as
RGBA16UI or RGBA32UI. We have to adjust width0 & height0 accordingly to
avoid out-of-bounds memory accesses by CB.
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The very last entry in the sid_strings_offsets table ended up missing,
leading to out-of-bounds reads and potential crashes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We were exposing 4096, but we can do up to 8192 in Gen4-6 and up to
16384 in gen7+. OpenGL 4.1+ requires at least 16384.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Similar to _mesa_uniform() except that we have to call
validate_uniform_parameters() instead of validate_uniform().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
It would be nice to have a no_error path for
_mesa_test_texobj_completeness() because this function doesn't
only test if the texture is complete.
Anyway, that seems enough for now and a bunch of checks are
skipped with this patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is done by introducing a separate list.
si_decompress_textures() is now 5x faster.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Useful for debugging performance issues when ARB_bindless_texture
is enabled. This query doesn't make a distinction between texture
and image handles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Effectively there is the same code twice, once for depth and
again for stencil.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In brw_workaround_depthstencil_alignment() corresponding
renderbuffers are always set to refer to the same temp miptrees.
There is no need to carry them in context.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In case of gen < 6 stencil (if present) is always combined with
depth. Both stencil and depth attachments point to the same
physical surface.
Alignment workaround starts by considering depth and updates
stencil accordingly. Current logic continues with stencil and
in vain considers the case where depth would refer to different
surface than stencil.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Headers are required only when building with OpenCL. As we're building
w/o it libelf may be missing, hence we'll error out as below:
src/gallium/drivers/r600/evergreen_compute.c:27:10:
fatal error: 'gelf.h' file not found
^
1 error generated.
Fixes: d96a210842 ("r600g,compute: provide local copy of functions from
ac_binary.c")
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Current version fails to set depthstencil.depth_offset when there
is only stencil attachment (it does set the intra tile offsets
though). Fixes piglits:
g45,g965,ilk: depthstencil-render-miplevels 1024 s=z24_s8
g45,ilk: depthstencil-render-miplevels 273 s=z24_s8
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In intel_hiz_miptree_buf_create() the miptree is unconditionally
created with MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This patch finishes the work done by Ken of converting SF_STATE to genxml, and
merges it with gen6+ code for emitting that state.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rename "Use Point Width State" to "Point Width Source". It accepts the same
values and has the same meaning as gen6+, so let's keep them with the same name
to simplify the code.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Binner/clipper read viewport array index from the vertex header as needed.
Move viewport state to BACKEND_STATE.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The last FE stage can emit render target array index. Currently we only
check to see if GS is emitting it. Moved the state to BACKEND_STATE and
plumbed the driver to set it.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
For certain cases, we perform early z for optimization. The GL_SAMPLES_PASSED
query was providing erroneous results because we were counting the number
of samples passed before the fragment shader, which did not work if the
fragment shader contained a discard.
Account properly for discard and early z, by anding the zpass mask with
the post fragment shader active mask, after the fragment shader.
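As a rough illustration of the masking described above (a sketch only, with
hypothetical names; the real swr backend operates on SIMD masks):

```c
#include <stdint.h>

/* Hypothetical helper: count the samples that passed the depth test and
 * were still active (not discarded) after the fragment shader ran. */
static unsigned
samples_passed(uint32_t zpass_mask, uint32_t post_fs_active_mask)
{
   uint32_t mask = zpass_mask & post_fs_active_mask;
   unsigned count = 0;

   /* Clear the lowest set bit on each iteration. */
   while (mask) {
      mask &= mask - 1;
      count++;
   }
   return count;
}
```

Counting zpass_mask alone, before the shader, would include samples that the
shader later discards.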
Fixes the following piglit tests:
- occlusion-query-discard
- occlusion_query_meta_fragments
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Removes large simdvertex stack allocation.
Vertex shader must ensure reads happen before writes.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add support for dynamic vertex size for the vertex shader output.
Add new state in SWR_FRONTEND_STATE to specify the size.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Move fixed attributes to the top and pack single component SGVs.
WIP to support dynamically allocated vertex size.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
- Remove any special casing in the PS stage when primitive ID is input.
Treat as a normal attribute that must be set up properly in the FE linkage.
- Remove primitive id from the PS_CONTEXT and TRI_FLAGS
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of position values, swizzling, etc. We basically
follow the DX10.1 conventions.
For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.
For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.
v2: use 'undef' for unused vector components. Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.
v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To fix the build when VMX86_STATS is defined.
Also, some minor whitespace changes to match upstream code.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The swr driver uses vertex_buffer->stride to determine the number
of elements in a VBO. A recent change to the state-tracker made it
possible for VBOs to have stride=0. This resulted in a divide by zero
crash in the driver. The solution is to use the pre-calculated vertex
element stream_pitch in this case.
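A minimal sketch of that fallback, assuming the element count is derived by
dividing the buffer size by a pitch. vbo_element_count() and its parameters
are hypothetical names, not the driver's actual code:

```c
/* Hypothetical helper: number of elements in a VBO. When the bound
 * stride is zero, fall back to the pre-calculated stream pitch so the
 * division never uses a zero divisor. */
static unsigned
vbo_element_count(unsigned buffer_size, unsigned stride,
                  unsigned stream_pitch)
{
   unsigned pitch = stride ? stride : stream_pitch;

   /* Guard the case where neither value is usable. */
   return pitch ? buffer_size / pitch : 0;
}
```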
This patch fixes the crash in a number of piglit and VTK tests introduced
by 17f776c27b.
There are several VTK tests that still crash and need proper handling of
vertex_buffer_index. This will come in a follow-on patch.
v2: Correctly update all parameters for VBO constants (stride = 0).
Also fixes the remaining crashes/regressions that v1 did
not address, without touching vertex_buffer_index.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
It could be useful to get the number of emitted resolve operations when
doing driver optimizations.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Now that we support RB swapped targets by using a shader variant, we
must derive the color mask from both the blend state and the bound
framebuffer.
Fixes piglit: fbo-colormask-formats
Fixes: 7f62ffb68a ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This replaces the open coded etnaviv version of the color pack with the
common util_pack_color.
Fixes piglits:
arb_color_buffer_float-clear
fcc-front-buffer-distraction
fbo-clearmipmap
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
etna_resource_copy_region handles resources with multiple samples
by falling back to the software path. There is no need to kill the
application there.
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When copying a resource fully we can just blit the whole level. This allows
using the RS even for level sizes not aligned to the RS min alignment. This
is especially useful, as etna_copy_resource is part of the software fallback
paths (used in etna_transfer), that are used for doing unaligned copies.
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If the blit region is not aligned to the RS min alignment don't try
to execute the blit, but fall back to the software path.
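The check amounts to something like the sketch below. The function name, the
box structure and the alignment values passed in are all hypothetical; the
real RS alignment requirements depend on the hardware and are not given in
this message.

```c
#include <stdbool.h>

/* Hypothetical description of a blit region in pixels. */
struct blit_box {
   unsigned x, y, width, height;
};

/* Hypothetical helper: return true only if the region is aligned to the
 * given minimum alignment (assumed to be powers of two), so the caller
 * can fall back to the software path otherwise. */
static bool
rs_blit_is_aligned(const struct blit_box *box,
                   unsigned align_w, unsigned align_h)
{
   return (box->x & (align_w - 1)) == 0 &&
          (box->y & (align_h - 1)) == 0 &&
          (box->width & (align_w - 1)) == 0 &&
          (box->height & (align_h - 1)) == 0;
}
```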
Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This is a verbatim copy of the code. The functions can be cleaned up since
r600 does not use all the stuff that gcn does.
The symbol names have been changed since we still use ac_binary.h header
(for struct definition)
v2: Add ifdef guard around r600_binary_clean call (Aaron)
Remove stray comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-By: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Imagine there are 2 threads that both call _eglGetNativePlatform()
simultaneously:
- thread 1 completes the first "if (native_platform ==
_EGL_INVALID_PLATFORM)" check and is preempted to do something else
- thread 2 executes the whole function, does "native_platform =
_EGL_NATIVE_PLATFORM" and just before returning it's preempted
- thread 1 wakes up and calls _eglGetNativePlatformFromEnv() which
returns _EGL_INVALID_PLATFORM because no env vars are set, updates
native_platform and then gets preempted again
- thread 2 wakes up and returns wrong _EGL_INVALID_PLATFORM
Solve this by doing the detection in a local var and only overwriting
the global one at the end, if no other thread has updated it since.
This means the platform detected in the thread might not be the platform
returned by the function, but this is a different issue that will need
to be discussed when this becomes possible.
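One way to express "detect into a local, publish only if nobody else has" is
a compare-and-exchange, sketched here with C11 atomics. This is an
illustration under assumptions, not the code from this patch: the enum
values, detect_platform() and get_native_platform() are hypothetical
stand-ins.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the EGL platform values. */
enum platform { PLATFORM_INVALID = 0, PLATFORM_X11, PLATFORM_WAYLAND };

static _Atomic int native_platform = PLATFORM_INVALID;

/* Hypothetical detection routine; the real one consults the environment
 * and the build-time default. */
static int
detect_platform(void)
{
   return PLATFORM_X11;
}

int
get_native_platform(void)
{
   int current = atomic_load(&native_platform);
   if (current != PLATFORM_INVALID)
      return current;

   /* Detect into a local so the global never holds a partial result. */
   int detected = detect_platform();
   int expected = PLATFORM_INVALID;

   /* Publish only if no other thread has done so in the meantime. On
    * failure, 'expected' receives the value the other thread stored. */
   if (atomic_compare_exchange_strong(&native_platform, &expected, detected))
      return detected;

   return expected;
}
```

Either way the caller gets whatever value ended up in the global, which is
the caveat the paragraph above mentions.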
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
My refactor missed the fact that `native_platform` is static.
Add the proper guard around the detection code, as it might not be
necessary, and only print the debug message when a detection was
actually performed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Fixes: 7adb9b0948 ("egl/display: remove unnecessary code and
make it easier to read")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
The new generic checks were actually more restrictive than the previous svga-
specific tests and not vice versa. So bypass the common format checks for
copy_region_vgpu10.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
The blit.dst.resource member that was used as destination was
modified earlier in the function, effectively making us try to blit
the content onto itself. Fix this and also add a debug printout when the
format conversion blits fail.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
This fixes a tf2 srgb copy_region regression from
"svga: Rework the blit and resource_copy_region functionality v3"
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This reduces the number of cpu copy_region fallbacks on a Nvidia system
running the piglit command
./publish/bin/piglit run -1 -t copy -t blit tests/quick
from 64789 to 780
Previously this has caused a regression in piglit test
spec@!opengl 1.0@gl-1.0-scissor-copypixels, but I'm currently not able to
reproduce that regression.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The blitter has functions to save and restore the conditional rendering state,
but we currently don't save the needed info.
Since also the copy_region_vgpu10 path supports conditional blitting,
we instead use the same function as the clearing routines and move
that function to svga_pipe_query.c
Note that we still haven't implemented conditional blitting with
the software fallbacks.
Fixes piglit nv_conditional_render::copyteximage
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It seems like the SVGA tests are in general more stringent than the utility
tests, but they also miss some blitter features like filters and window
rectangles, and if new blitter features are added in the future, it might
be possible that we forget adding tests for those.
So in addition to the SVGA tests, use the utility tests to restrict the
situations where we can use copy_region.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This work was initially triggered by the fact that imported surfaces may
be backed by other SVGA3D formats than the default. Therefore some fixes were
needed to avoid using the copy_region_vgpu10() functionality for incompatible
SVGA3D formats where the pipe formats were OK. This situation happens when
using dri3.
Also in some situations, for example where a R8G8_UNORM surface is backed by
an SVGA3D_NV12 format, we can't use the copy_region functionality at all and
thus need to fall back to the quad blitter also for the resource_copy_region
function. This situation doesn't happen currently, but will if we start using
video textures.
The patch makes the blit- and copy_region paths similar and the decision whether
to use a certain gpu command should now be easy to locate. Probably the
resource_copy_region path will suffer from a minor additional cpu overhead,
but on the other hand there are more cases now that we accelerate, since
we try harder before falling back to cpu copies / blits.
v2: Addressed review comments and fixed up piglit failures by sometimes
preferring cpu_copy_region() over blit().
v3: Removed a stray test statement. Updated commit message.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We need to fall back in a couple of cases:
- Sandybridge (it just doesn't do this in hardware)
- Occlusion queries on Gen7-7.5 with command parser version < 2
- Transform feedback overflow queries on Gen7, or on Gen7.5 with
command parser version < 7
In these cases, we printed a perf_debug message and fell back to
_mesa_check_conditional_render(), which stalls until the full
query result is available. Additionally, the code to handle this
was a bit of a mess.
We can do better by using our normal conditional rendering code,
and setting a new state, BRW_PREDICATE_STATE_STALL_FOR_QUERY, when
we would have set BRW_PREDICATE_STATE_USE_BIT. Only if that state
is set do we perf_debug and potentially stall. This means we avoid
stalls when we have a partial query result (i.e. we know it's > 0,
but don't have the full value). The perf_debug should trigger less
often as well.
Still, this is primarily intended as a cleanup.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Unneeded, since the PKG_CHECK_MODULES macro already does the
substitution of the package Cflags/Libs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
As described inline - follow what's written in the manual and what works
for all platforms that Mesa supports.
We want to untangle things leaving only -pthread, yet that has a
potential of causing regressions. Thus we'll do it as a follow-up patch.
As a nice side-effect this resolves issues, where the system lacks
libpthread.so, yet the linker does not warn about it and we end up with
unresolved symbols.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101071
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The storage was once used by get_sampler_uniform_value() but that
was fixed long ago to use the uniform storage assigned by the
linker.
By not assigning storage for images/samplers the constant buffer
for gallium drivers will be reduced which could result in small
perf improvements.
V2: rebase on ARB_bindless_texture
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We shouldn't use the wide line stage if the line width is 1.
This check isn't strictly needed because all drivers are (now)
specifying a wide line threshold of at least 1.0 pixels, but
let's play it safe.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The line stipple fallback code for virtual HW version 8 didn't work.
With HW version 8, we were getting zero when querying the max line
widths (AA and non-AA). This means we were setting the draw module's
wide line threshold to zero. This caused the wide line stage to always
get enabled. That caused the line stipple module to fail because the
wide line stage was clobbering the rasterization state with a state
object setting the line stipple pattern to 0xffff.
Now the wide_lines variable in draw's validate_pipeline() will not
be incorrectly set.
Also improve debug output.
BTW, this also fixes several other piglit tests: polygon-mode,
primitive-restart-draw-mode, and line-flat-clip-color since they
all use the draw module fallback.
See VMware bug 1895811.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The SCons build has been using 10 digits of the git hash id for the
MESA_GIT_SHA1 string in git_sha1.h for about a year now. I bumped it
up after running into a case where a 7-digit hash ID was ambiguous.
This patch makes the same change for the autotools build.
The command "git log | grep "^commit" | cut -b 8-14 | sort | uniq -d"
shows there are currently 17 cases where 7 digits of hash id are
ambiguous on master (probably quite a few more if we'd consider other
branches).
Instead of using "git log -n 1 --oneline" use
"git rev-parse --short=10 HEAD" to get the HEAD hash id.
v2: use printf instead of sed, per Eric's suggestion.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This follows the model of imx (display) and etnaviv (render): pl111 is a
display-only device, so when asked to do GL for it, we see if we have a
vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do
scanout allocations.
The difference from etnaviv is that we share the same BO between vc4 and
pl111, rather than having a vc4 BO and a pl111 BO and copies between the
two. The only mismatch between their requirements is that vc4 requires
4-pixel (at 32bpp) stride alignment, while pl111 requires that stride
match width. The kernel will reject any modesets to an incorrect stride,
so the 3D driver doesn't need to worry about that.
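The stride mismatch can be illustrated with a small sketch. The function names are hypothetical and the only inputs taken from the text are the 4-pixel alignment rule and the "stride matches width" rule; the real drivers may compute these differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round v up to a multiple of a, where a is a power of two. */
uint32_t
align_pot(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Stride in bytes for a renderer that aligns rows to 4 pixels. */
uint32_t
render_stride(uint32_t width, uint32_t cpp)
{
   return align_pot(width, 4) * cpp;
}

/* Stride in bytes for a display that needs stride to match width. */
uint32_t
scanout_stride(uint32_t width, uint32_t cpp)
{
   return width * cpp;
}

/* The two requirements only agree when width is a multiple of 4. */
bool
strides_agree(uint32_t width, uint32_t cpp)
{
   return render_stride(width, cpp) == scanout_stride(width, cpp);
}
```

At 32bpp (`cpp` of 4), a width of 1920 gives the same stride under both rules, while a width of 1366 does not (5472 versus 5464 bytes).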
v2: Rebase on Android rework, drop unused include.
v3: Fix another Android bug, from Rob Herring's build-testing.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Note that for requests for Prime FDs or flink names, we return handles to
the etnaviv BO, not the scanout BO. This is at least better than previous
behavior of returning GEM handles for a request for an FD or flink name.
And add an assert that renderonly_get_handle is only used for getting the
GEM handle.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The rules to generate egd_tables.h are added to the Android makefile.
Fixes: f42fb00 "r600/eg: add support for tracing IBs after a hang."
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Adds libmesa_git_sha1 static (dummy) library to generate git_sha1.h
with some polishing to header dependency on .git/HEAD and scripted rules.
The now redundant generation rules are removed from Android.gen.mk
libmesa_git_sha1 whole static dependency is added to libmesa_pipe_svga,
libmesa_dricore and libmesa_st_mesa modules
Fixes the following building error:
external/mesa/src/gallium/drivers/svga/svga_screen.c:26:10:
fatal error: 'git_sha1.h' file not found
^
1 error generated.
Fixes: 1ce3a27 ("svga: Add the ability to log messages to
vmware.log on the host.")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We were not considering as multiple fixes lines with:
Fixes: $sha_1, Fixes: $sha_2
Now, we split the lines so we will consider them individually, as in:
Fixes: $sha_1,
Fixes: $sha_2
Additionally, we try to get the SHA from split lines so:
Fixes:
$sha_1
Will be considered as:
Fixes: $sha_1
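The splitting rules above can be sketched in C as follows. This is only an illustration of the idea, not the script being patched; `collect_fixes`, `SHA_MIN` and `SHA_MAX` are hypothetical names, and the 7-digit minimum is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SHA_MIN 7    /* assumed shortest abbreviated hash we accept */
#define SHA_MAX 40   /* full SHA-1 length in hex digits */

/* Collect every hash that follows a "Fixes:" tag in msg. Several tags on
 * one line and a hash placed on the line after the tag are both handled.
 * Returns the number of hashes written to out. */
size_t
collect_fixes(const char *msg, char out[][SHA_MAX + 1], size_t max_out)
{
   static const char tag[] = "Fixes:";
   const char *p = msg;
   size_t n = 0;

   while (n < max_out && (p = strstr(p, tag)) != NULL) {
      p += sizeof(tag) - 1;

      /* Skip blanks and line breaks so a hash on the next line counts. */
      while (*p && isspace((unsigned char)*p))
         p++;

      size_t len = 0;
      while (len < SHA_MAX && isxdigit((unsigned char)p[len]))
         len++;

      /* Only consume the text if it looks like a hash; otherwise leave p
       * in place so a following "Fixes:" tag is still found. */
      if (len >= SHA_MIN) {
         memcpy(out[n], p, len);
         out[n][len] = '\0';
         n++;
         p += len;
      }
   }
   return n;
}
```

A line such as `Fixes: 0123abcd, Fixes: 89ef4567` yields two entries, and a tag followed by free text rather than a hash yields none.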
v2:
- Treat empty spaces earlier in fix lines (Emil)
- Fold 2 lines into one to gather fix commit ids (Emil)
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We were parsing the whole diff, although the candidates were
identified only by the commit message.
Now, we only use the commit message for parsing.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This implements a way to import FDs with modifiers on plain GBM devices,
without the need to go through EGL. This is mostly to the benefit of
gbm_gralloc, which can keep its dependencies low.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
We tend to use the sources, as opposed to EXTRA_DIST to include the
headers.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Because it called itself recursively, it could not be inlined, resulting
in a copy being generated in every compilation unit referencing it. This
bloated the text segment of the Gallium mega-driver *_dri.so by ~4%,
and might also have impacted performance.
Fixes: ecd6fce261 ("mesa/st: support lowering multi-planar YUV")
v2:
* Add comment above pipe_resource_next_reference [Samuel Pitoiset]
v3:
* Use loop to unreference the full chain of resources referenced via
the next members [Timothy Arceri]
v4:
* Stop chasing ->next chain at the first sub-resource which isn't
destroyed [Nicolai Hähnle]
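The loop-based approach from v3 and v4 can be sketched as below. The struct and function names are hypothetical stand-ins, and a `destroyed` flag replaces the real free so the behaviour is easy to check; the actual Gallium helper differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted resource with a ->next chain. */
struct resource {
   int refcount;
   bool destroyed;            /* set instead of freeing, for the sketch */
   struct resource *next;
};

/* Drop one reference; returns true if the resource was destroyed. */
bool
resource_unref(struct resource *res)
{
   assert(res->refcount > 0);
   if (--res->refcount > 0)
      return false;
   res->destroyed = true;
   return true;
}

/* Release a chain without recursion: each destroyed resource gives up its
 * reference on ->next, and the walk stops at the first sub-resource that
 * is still referenced by someone else. */
void
resource_release_chain(struct resource *res)
{
   while (res) {
      struct resource *next = res->next;

      if (!resource_unref(res))
         break;
      res = next;
   }
}
```

Because the walk is a plain loop, the single-reference fast path stays small enough for a compiler to inline.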
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ISL already has all of the complexity required to figure out the correct
surface pitch and size taking tile alignment into account. When we get
a surface out of ISL, the pitch and size are already correct and using
brw_bo_alloc_tiled_2d doesn't actually gain us anything other than extra
asserts we have to do in order to ensure that the bufmgr code and ISL
agree. This new helper doesn't try to be smart but just allocates the
BO you ask for and sets up the tiling.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we weren't setting step rate so we got whatever old value
happened to be lying around. This can lead to some interesting
rendering errors. In particular, if you run the OpenGL ES CTS with
dEQP-GLES3.functional.instanced.types.mat2x4 immediately followed by one
of the dEQP-GLES3.functional.transform_feedback.* tests, the transform
feedback test gets stale instancing data from the other test and fails.
The only thing that is causing this to not be a problem today is that we
use meta for clears and meta is setting up vertex buffers via the VBO or
non-interleaved path and setting step_rate to 0 for us. When blorp
depth/stencil clears are enabled, meta is no longer sitting between the
two tests and the stale data starts causing noticeable problems.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Instance divisor is a property of the vertex buffer and not the vertex
element so if we ever see anything other than 0, bail.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
On gen5-6, SeparateStencilBufferEnable and HierarchicalDepthBufferEnable
come hand in hand and we have to set either both or neither.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This ensures that we get the correct layout for all stencil buffers, not
just those which are created as separate stencil for a depth buffer.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The CL CTS queries the max allocation size, and then attempts to
allocate buffers of that size. If not enough contiguous RAM/VRAM is
available, this causes errors in the radeon kernel module due to
inability to allocate the required memory.
It's a bit of a hack, but experimentally on my system, I can use ~3/4
of the card's VRAM for a single global/constant buffer allocation given
current GUI/compositor use.
For a 1GB Pitcairn (HD7850) this gets me from the reported clinfo values of:
Global memory size 2143076352 (1.996GiB)
Max memory allocation 1500153446 (1.397GiB)
Max constant buffer size 1500153446 (1.397GiB)
To:
Global memory size 2143076352 (1.996GiB)
Max memory allocation 751619276 (716MiB)
Max constant buffer size 751619276 (716MiB)
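As a cross-check of the figures above: the old limit equals 7/10 of the reported global memory size, and the new limit equals 7/10 of 1 GiB. The sketch below only verifies that arithmetic; the 7/10 factor is inferred from the quoted numbers, not taken from the patch, and `scale_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a byte count by num/den using truncating integer math. */
uint64_t
scale_bytes(uint64_t bytes, uint64_t num, uint64_t den)
{
   assert(den != 0);
   return bytes * num / den;
}
```

`scale_bytes(2143076352, 7, 10)` gives 1500153446 and `scale_bytes(1073741824, 7, 10)` gives 751619276, matching the two "Max memory allocation" values.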
Fixes: OpenCL CTS test/conformance/api/min_max_mem_alloc_size,
OpenCL CTS test/conformance/api/min_max_constant_buffer_size
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This updates the Gen4-5 code to use a line end cap width of 0.5
for non-smooth lines, and 1.0 for smooth lines - which is what we
do on Gen6+.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This unifies the Gen4-5 and Gen6+ line width calculations.
I believe it also fixes a bug - we weren't rounding the line width
to the nearest integer. The GL 4.5 (and GL 2.1) specs "Wide Lines"
section says:
"The actual width of non-antialiased lines is determined by rounding
the supplied width to the nearest integer, then clamping it to the
implementation-dependent maximum non-antialiased line width."
We don't need to care about _NEW_MULTISAMPLE here because multisampling
doesn't exist on Gen4-5, so the state shouldn't change.
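The quoted rule (round, then clamp) can be written as a short sketch. The function name is hypothetical and this is not the i965 code; it assumes a non-negative requested width and avoids libm by rounding with an integer cast.

```c
#include <assert.h>

/* Width actually used for a non-antialiased line: round the requested
 * width to the nearest integer, then clamp to the implementation maximum. */
float
non_aa_line_width(float requested, float max_width)
{
   assert(requested >= 0.0f);

   /* Round half up; valid because requested is non-negative. */
   float width = (float)(int)(requested + 0.5f);

   if (width > max_width)
      width = max_width;
   return width;
}
```

For a maximum of 7, requested widths of 2.4, 2.6 and 9.7 become 2, 3 and 7 respectively.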
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This effectively reverts Robert Ellison's 2009 commit
cc8afbd386.
I'm not seeing any GL spec text indicating that UPPER won't work.
On Gen6+, this bit moved to 3DSTATE_WM as a single bit, controlling
UPPER_LEFT vs. UPPER_RIGHT. There is no way to request LOWER_RIGHT,
so UPPER_RIGHT is the best you can do.
In the G45 docs, it's marked as "Reserved" as well, but we just
decided to use it anyway.
This patch unifies the behavior between Gen4-5 and Gen6+.
Note that this is separate from point sprite texcoord behavior.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Modern GL specifications say that the point size should be 1.0 when
gl_PointSize is unwritten and the last enabled stage is a geometry
or tessellation shader. If it's a vertex shader, though, both the
GL specs and ES 3.0 spec say that it's undefined - so since Gen4-5
only support vertex shaders, there's no actual requirement to do this.
Since there is a cost associated (an extra dirty bit, which may cause
SF_STATE to be emitted more often), it may not be a good idea.
The real benefit is that it makes all generations behave identically.
And that seems somewhat nice...
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Apparently, Nanhai made the Gen4-5 point size calculations round to the
nearest integer in commit 8d5231a358,
"according to spec". When Eric first ported the driver to Sandybridge,
he did not implement this rounding.
In the GL 2.1 and 3.0 specs "Basic Point Rasterization" section, it does
say "If antialiasing and point sprites are disabled, the actual width is
determined by rounding the supplied width to the nearest integer, then
clamping it to the implementation-dependent maximum non-antialiased point
width."
In contrast, GL 3.1 and later do not appear to contain this rounding.
It might be reasonable to round, given that we only implement GL 2.1.
Of course, if we were to do that, we should actually implement the AA
vs. non-AA distinction. Brian added an XXX comment reminding us to fix
this 10 years ago, but it never happened.
I think a better plan is to follow the newer, unrounded behavior. This
is what we do on Gen6+ and it passes all the relevant conformance tests.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
According to the docs, a simple CS stall is insufficient to ensure that
the memory from the flush is visible and an end-of-pipe sync is needed.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These two functions contain almost identical logic except for one SNB
workaround required for render target cache flushes. They may as well
call into the same code so we only have to handle the work-arounds in
one place.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I forgot to add this when introducing the new key field. It doesn't
happen often - just with the Unigine workarounds. But we may as well
have it, so we get an accurate picture of why recompiles happen.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
wpos_tex used to be a GLuint so assigning -1 to it and
later comparing with -1 worked correctly, but commit
c349031c27 ("i915: Fix texcoord vs. varying collision in
fragment programs") changed wpos_tex to uint8_t and hence
broke the comparison. To fix this define a more explicit
invalid value for wpos_tex.
gcc warns us:
i915_fragprog.c:1255:57: warning: comparison is always true due to limited range of data type [-Wtype-limits]
if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
^
And clang says:
i915_fragprog.c:1255:57: warning: comparison of constant -1 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
~~~~~~~~~~~ ^ ~~
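The underlying problem is C's integer promotion: a uint8_t operand is promoted to an int in the range 0..255 before the comparison, so it can never equal -1. A minimal sketch follows, using a hypothetical struct and a hypothetical `WPOS_TEX_INVALID` name for the explicit sentinel; the promotion is spelled out through an int variable so the example compiles without the warnings quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical explicit "no slot" marker that fits in a uint8_t. */
#define WPOS_TEX_INVALID 0xff

/* Hypothetical stand-in for the fragment program state. */
struct frag_state {
   uint8_t wpos_tex;
};

/* Old behaviour: after promotion the value lies in 0..255, so comparing
 * against -1 always reports "valid". */
bool
has_wpos_tex_broken(const struct frag_state *p)
{
   int promoted = p->wpos_tex;   /* what the comparison really sees */
   return promoted != -1;
}

/* Fixed behaviour: compare against a sentinel the field can hold. */
bool
has_wpos_tex(const struct frag_state *p)
{
   return p->wpos_tex != WPOS_TEX_INVALID;
}
```

Storing -1 into the field leaves 255 there, which the broken check treats as a valid slot and the fixed check correctly rejects.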
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: c349031c27 ("i915: Fix texcoord vs. varying collision in fragment programs")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Testing with zlib versions 1.2.{3,4,5,6,7,8} showed no difference in
functionality, correctness, or zlib API usage and 1.2.3 is the oldest
version available in still actively deployed production Linux
distributions (RHEL/CentOS 6 and SuSE 11).
Built 17.1.1 against the system supplied zlib-devel packages for 1.2.3
in EL6 and 1.2.7 on EL7. I then swapped out the zlib version at runtime
via LD_LIBRARY_PATH with ones built from the release tarballs from
zlib.net.
Testwise - I ran the piglit shader profile with --quick added to the
tests since I figured that would exercise the shader cache, which would
in turn use zlib.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
[Emil Velikov: add hunk about version/piglit testing]
Acked-by: Emil Velikov <emil.velikov@collabora.com>
When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.
Instead, use the WRITE_DATA packet, which allows updating memory
directly, but graphics/compute have to be idle in case the GPU is
reading the descriptor.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When the currently bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This won't help much except for applications that use a ton
of resident handles. Though, this will reduce the winsys
overhead a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Resident buffers have to be added to every new command stream.
Though, this could be slightly improved when current shaders
don't use any bindless textures/images but usually applications
tend to use bindless for almost every draw call, and the winsys
thread might help when buffers are added early.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This implements the Gallium interface. Decompression of resident
textures/images will follow in the next patches.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For each texture/image handle, we need to allocate a new
buffer for the bindless descriptor. But when the number of
buffers added to the current CS becomes high, the overhead
in the winsys (and in the kernel) is significant.
To reduce this bottleneck, the idea is to suballocate the
bindless descriptors using a slab similar to the one used
in the winsys.
Currently, a buffer can hold 1024 bindless descriptors but
this limit is arbitrary and could be changed in the future.
Once a slab is allocated the "base" buffer
is added to a per-context list.
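The suballocation idea can be sketched as a bump allocator over fixed-size slabs. All names are hypothetical, freeing is left out, and no real buffer is created; the only figure taken from the text is the 1024 descriptors per buffer.

```c
#include <assert.h>
#include <stdint.h>

#define DESCS_PER_SLAB 1024   /* figure from the text; arbitrary */

/* Where a descriptor lives: which base buffer, and which slot in it. */
struct desc_slot {
   uint32_t slab;
   uint32_t index;
};

/* Hypothetical allocator state: a count of base buffers and the fill
 * level of the most recent one. */
struct desc_allocator {
   uint32_t num_slabs;
   uint32_t used_in_last;
};

/* Hand out the next free slot, starting a new slab when the current one
 * is full. A real driver would allocate a buffer at that point and add
 * it to a per-context list. */
struct desc_slot
desc_alloc(struct desc_allocator *a)
{
   if (a->num_slabs == 0 || a->used_in_last == DESCS_PER_SLAB) {
      a->num_slabs++;
      a->used_in_last = 0;
   }

   struct desc_slot slot = { a->num_slabs - 1, a->used_in_last++ };
   return slot;
}
```

Allocating 1025 descriptors this way touches only two base buffers instead of 1025 separate ones, which is the overhead the slab is meant to avoid.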
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used in order to initialize resident descriptors
for bindless textures/images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The ARB_bindless_texture spec says:
"If ARB_seamless_cubemap (or OpenGL 4.0, which includes it) is
supported, the per-context seamless cubemap enable is ignored
and treated as disabled when using texture handles."
"If AMD_seamless_cubemap_per_texture is supported, the seamless
cube map texture parameter of the underlying texture does apply
when texture handles are used."
The per-context seamless cubemap flag should only be enabled for
bound textures/samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When a texture (or an image) instruction uses a bindless sampler
(respectively a bindless image), make sure the DCE pass won't
remove code when the resource is a temporary variable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes a 64-bit vs 32-bit mismatch when setting an array
of bindless samplers. Also, we need to unconditionally set
size_mul to 2 when the underlying uniform is bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When a bindless sampler/image is bound to a texture/image unit,
we have to overwrite the constant value by the resident handle
directly in the constant buffer before the next draw.
One solution is to keep track of a pointer to the data.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is analogous to the existing SamplerUnits and SamplerTargets,
but it loops over bindless samplers bound to texture units.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will also be used for looping over bindless samplers bound
to texture units.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will also be used for looping over bindless samplers bound
to texture units.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Yes, ARB_bindless_texture allows this. In other words, in
a situation like:
layout (bindless_sampler) uniform sampler2D tex;
The 'tex' sampler uniform can be either set with glUniform1()
(old-style bound samplers) or with glUniformHandleui() (resident
handles).
When glUniform1() is used, we have to somehow make the texture
resident "under the hood". This is done by requesting a texture
handle to the driver, making the handle resident in the current
context and overwriting the value directly in the constant buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Old-style samplers (ie. bound samplers) are stored as
PROGRAM_SAMPLER, while bindless ones are PROGRAM_UNIFORM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bindless samplers are considered PROGRAM_UNIFORM but
add_uniform_to_shader::visit_field() is based on glsl_type.
Because only ir_variable knows if the uniform variable is
bindless via ir_variable::bindless, store it instead of
adding a new parameter to visit_field().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_bindless_texture spec says:
"The error INVALID_OPERATION is generated by BufferData if it is
called to modify a buffer object bound to a buffer texture while
that texture object is referenced by one or more texture handles."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_bindless_texture spec says:
"The error INVALID_OPERATION is generated by TexImage*, CopyTexImage*,
CompressedTexImage*, TexBuffer*, TexParameter*, as well as other
functions defined in terms of these, if the texture object to be
modified is referenced by one or more texture or image handles."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_bindless_texture spec says:
"The error INVALID_OPERATION is generated by TexImage*, CopyTexImage*,
CompressedTexImage*, TexBuffer*, TexParameter*, as well as other
functions defined in terms of these, if the texture object to be
modified is referenced by one or more texture or image handles."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_bindless_texture spec says:
"The error INVALID_OPERATION is generated by SamplerParameter* if
<sampler> identifies a sampler object referenced by one or more
texture handles."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bindless sampler/image handles are represented using 64-bit
unsigned integers.
The ARB_bindless_texture spec says:
"The error INVALID_OPERATION is generated by UniformHandleui64{v}ARB
if the sampler or image uniform being updated has the "bound_sampler"
or "bound_image" layout qualifier."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds support in the VBO and array code to handle unsigned
64-bit vertex attributes as specified by ARB_bindless_texture.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Needed for bindless handles which are represented using
64-bit unsigned integers. All hash table implementations should
be unified later on.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This helper function will be used for managing dynamic arrays of
resident texture/image handles.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
clinfo no longer reports my discrete GCN card as unified memory
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The "family" name is often more informative than the "marketing" name. More
importantly, applications, like for example Wine, may recognise GPUs based on
the existing "family" names.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
I've since discovered the fragment shader sample mask system value (which
corresponds to gl_SampleMaskIn).
v2: It's a system value, not a shader input.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Some drivers require that the vertex buffers be unmapped prior to
drawing. This change unmaps the stream_uploader buffer after we've
uploaded the zero-stride attributes (unless the driver supports
rendering with mapped buffers).
This fixes a regression in the VMware driver since 17f776c27b.
Some Mesa demos such as mandelbrot and brick would display black
quads instead of the expected rendering.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Unnamed struct types are now equal across stages based on the fields they
contain, so overriding the type to make sure names match has become
unnecessary.
The check was originally introduced in commit 955c93dc08 ("glsl: Match
unnamed record types across stages.")
v2: clarify the commit message
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, if an unnamed and a named struct contained the same fields,
they were considered the same type during linking of globals.
The discussion around commit e018ea81bf ("glsl: Structures must have
same name to be considered same type.") doesn't seem to have considered
this thoroughly, and I see no evidence that an unnamed struct should
ever be considered to be the same type as a named struct.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
As a result, unnamed structs defined in different places of the program
are considered the same types if they have the same fields in the same
order.
This will simplify matching of global variables whose type is an unnamed
struct.
It also fixes a memory leak when the same shader containing unnamed
structs is compiled over and over again: instead of creating a new type
each time, the existing type is re-used.
Finally, this does have the effect that some previously rejected programs
are now accepted, such as:
struct {
float a;
} s1;
struct {
float a;
} s2;
s2 = s1;
C/C++ do not allow that, but GLSL does seem to want to treat unnamed
structs with the same fields as the same type at least during linking
(and apparently, some applications require it), so it seems odd to treat
them as different types elsewhere.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This changes the logic during the conversion of the declaration list
struct S {
...
} v;
from AST to IR, but should not change the end result.
When assigning the type of v, instead of looking `S' up in the symbol
table, we read the type from the member variable of ast_struct_specifier.
This change is necessary for the subsequent change to how anonymous types
are handled.
v2: remove a type override when redefining a structure; should be
the same type in that case anyway
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
By splitting glsl_type::mutex into two, we can avoid dropping the hash
mutex while creating the new type instance (e.g. struct/record,
interface).
This fixes a time-of-check/time-of-use race where two threads would
simultaneously attempt to create the same type but end up with different
instances of glsl_type.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Based on the same logic in the i965 driver 2f225f6145 and
16060c5adc.
perf reports st_finalize_texture() going from 0.60% -> 0.16% with
this change when running the Xonotic benchmark from PTS.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Just use a temporary 16-bit index.
This fixes a Coverity issue pointed out to me by Ilia.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The max_array_access field applies to the first dimension, which means
we only want to set it for the 1D clip dist arrays.
This fixes an ir_validate assert seen with
KHR-GL44.cull_distance.functional
on nouveau and radeon with debug builds.
Fixes: a08c4ebbe (glsl: rewrite clip/cull distance lowering pass)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The resolve code looks at the current color draw buffers. These are not
valid until intel_prepare_render() is called. You can end up with one
color buffer bound, but where the renderbuffer has zero width/height and
no miptree allocated.
You can get a call chain like: _mesa_Clear -> _mesa_update_state ->
intel_update_state, where no brw driver hooks were called, so there is
no other point at which we could have called this.
Fixes crashes in KWin where Clear was causing intel_disable_rb_aux_buffer
to crash on irb != NULL but irb->mt == NULL.
According to Tapani, this also fixes crashes seen on Android.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Fixes trace dumping crash for SI or when RADV_DEBUG=noibs is set.
Fixes: 97dfff5410 "radv: Dump command buffer on hang."
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Before bcae327469 this was emitting CP DMA packet even on SI, but
apparently hasn't caused too many problems. After that commit the
CP DMA code now always sets the CIK+ only bit for prefetch. Just
follow radeonsi there and don't try to prefetch at all.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101334
Fixes: bcae327469 "radv: realign cp dma code with radeonsi"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The register header (and radeonsi comment) states V_411_SRC_ADDR_TC_L2
is for CIK+ only, so let's assert on earlier ASICs.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This patch adds support for the EGL_KHR_partial_update extension for
android platform. It passes 36/37 tests in dEQP for EGL_KHR_partial_update.
1 test not supported.
v2: add fallback for eglSetDamageRegionKHR (Tapani)
v3: The native_window_set_surface_damage call is available only from
Android version 6.0. Reintroduce the ANDROID_VERSION guard and
advertise extension only if version is >= 6.0. (Emil Velikov)
v4: use newly introduced ANDROID_API_LEVEL guard rather than
ANDROID_VERSION guard to advertise the extension. The extension
is advertised only if ANDROID_API_LEVEL >= 23 (Android 6.0 or
greater). Add fallback function for platforms other than Android.
Fix possible math overflow. (Emil Velikov)
Return immediately when n_rects is 0. Place function's entrypoint
in alphabetical order. (Eric Engestrom)
v5: Replace unnecessary calloc with malloc (Eric)
Check for BAD_ALLOC error (Emil)
Check for error in native_window_set_damage_region. (Emil, Tapani,
Eric).
Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
softpipe throws integer division by zero exceptions on windows
when using % with integers in a geometry shader.
v2: Made error results consistent with existing div/mod zero handling in
tgsi. 64 bit signed integer division by zero returns zero like in
micro_idiv, unsigned returns ~0u like in micro_udiv.
Modulo operations always set all result bits to one (like in
micro_umod).
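The v2 conventions can be summarised in a short sketch. The helper names are hypothetical and this is not the tgsi code itself; it only encodes the results stated above for a zero divisor. The INT64_MIN / -1 overflow case is not handled here.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 64-bit division: a zero divisor yields 0. */
int64_t
safe_idiv64(int64_t a, int64_t b)
{
   return b ? a / b : 0;
}

/* Unsigned 64-bit division: a zero divisor yields all bits set. */
uint64_t
safe_udiv64(uint64_t a, uint64_t b)
{
   return b ? a / b : ~(uint64_t)0;
}

/* Signed modulo: a zero divisor yields all bits set (that is, -1). */
int64_t
safe_imod64(int64_t a, int64_t b)
{
   return b ? a % b : ~(int64_t)0;
}

/* Unsigned modulo: a zero divisor yields all bits set. */
uint64_t
safe_umod64(uint64_t a, uint64_t b)
{
   return b ? a % b : ~(uint64_t)0;
}
```

Checking the divisor before dividing is what avoids the hardware exception; the particular fallback values only need to be consistent across the opcodes.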
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
v2 (Anuj):
Rebased on master and updated pci ids
Remove redundant initialization of max_wm_threads to 64 * 12.
For gen9+ max_wm_threads are initialized in gen_get_device_info().
v3 (Anuj):
Move the patch to end of series.
Remove unused gt1, gt2, gt3 functions.
Remove l3_banks variable. Variable is now available on master.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will prevent the driver from even trying to work on Cannon Lake
until we get actual support added.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v1: By Ben Widawsky <benjamin.widawsky@intel.com>
v2: v1 had an assert only for VS. Add the restriction for GS, HS and
DS as well and make sure the allocated sizes are not multiples of 3.
v3: Move the entry_size checks in to compiler code (Ken)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As sRGB now supports lossless compression, we also need to stop resolving
single sampled color render buffers for sRGB formats in Gen 10.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
V2: Start using gen10 functions isl_gen10*(), gen10_blorp_exec()
gen10_init_atoms() (Jason)
Remove Vulkan changes. Do them later in a separate patch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit adds a gen10 case to the switch statement and
drops some unneeded code for handling gen numbers which
doesn't work on gen10 and above.
V2: Drop "z = float(z)" and the "z *= 10" lines
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These declarations will help the code start compiling
once we wire up the makefiles for gen10. Later patches
will start using these functions for gen10.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
V2(Anuj):
Add default value for length of 3DPRIMITIVE command
Add values for 'Attribute Active Component Format'
Rename few fields to match gen9.xml
V3 (Ander Conselvan de Oliveira)
Add gen10 alias for MOCS
Make 3DSTATE_CONSTANT_BODY on Gen10 use arrays
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
All the "features" of the hardware are similar starting with GEN8, so remove as
much of the GEN9 uniqueness as possible. This makes implementing future gen
platforms a bit easier.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The shader reads the descriptor to decide if it should take the
fmask value, however we weren't initing it always, which meant
random crap, esp with MSAA depth textures.
Fixes random hangs with:
dEQP-VK.glsl.builtin_var.fragdepth.*
v2: check fmask_state is not NULL
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes regressions from commits e0a9b261e5 and a16355d67d by
neutering async mappings on non-LLC to be synchronous, like they were
before those two commits. :(
The failing tests include
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_index_only
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_combined_vertex_and_index
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_separate_vertex_and_index
piglit-test piglit.spec.nv_primitive_restart.primitive-restart-vbo_vertex_only
piglit-test piglit.spec.arb_pixel_buffer_object.texsubimage-unpack pbo
Since we created the file, we should be able to reopen it for appending, but
some weird filesystem error could cause that to be false. So simply check
whether we could reopen it or not.
CID: 1177144
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ask the driver for supported modifiers for a given format.
v2: move to __DRIimageExtension v16.
v3: fail if the supplied format is not supported by driver.
v4: purge PIPE_CAP_QUERY_DMABUF_ATTRIBS.
v5:
- move to __DRIimageExtension v15, pass external_only to the driver.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v4)
Cc: Lucas Stach <l.stach@pengutronix.de>
format modifiers tokens are driver specific, and hence, need to come
in from the driver. this allows drivers to be queried for supported
format modifiers for EGL_EXT_image_dma_buf_import_modifiers.
v2: rebase to master.
v3: drivers must return false on query failure.
v4: use pscreen->is_format_supported instead of adding a separate
format query handle, remove PIPE_CAP_QUERY_DMABUF_ATTRIBS.
(Lucas Stach)
v5: add external_only parameter.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
ask the driver for supported dmabuf formats
v2: rebase to master.
v3: return false on failure.
v4: use pscreen->is_format_supported instead of adding a new query.
(Lucas Stach)
v5: stylefix to conform to formatting rules (Brian Paul). add fourcc list
here instead of using struct image_format from v4.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v4)
Cc: Lucas Stach <l.stach@pengutronix.de>
support importing dmabufs into DRIimage while taking format modifiers
in account, as per DRIimage extension version 15.
v2: initialize winsys modifier to DRM_FORMAT_MOD_INVALID (Daniel Stone)
v3: do not bump DRIimageExtension version. split out winsys changes.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
adds a pscreen->resource_create_with_modifiers() to create textures
with modifier.
v2:
- stylefixes (Emil Velikov)
- don't return selected modifier from resource_create_with_modifiers. we can
use the winsys_handle to get this.
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> (v1)
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
return the modifier selected by the driver when creating this image.
v2: since we can use winsys_handle->modifier to serve these, remove
DRIimage->modifier from v1.
use DRM_API_HANDLE_TYPE_KMS instead of DRM_API_HANDLE_TYPE_FD to avoid
ownership transfer. (Lucas)
Suggested-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The _NEW_SCISSOR mesa flag is set when a scissor test is enabled/disabled
or when a new rectangle is defined. However, it triggers too many
changes in the state tracker.
Actually, ST_NEW_RASTERIZER should only be set when a scissor
test is enabled/disabled, while ST_NEW_SCISSOR should be set
in both situations.
In other words, this avoids updating the rasterizer every
time a new rectangle is defined using glScissor*().
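A minimal sketch of the intended split, assuming made-up bit values (the
real ST_NEW_* flags are defined by the state tracker and are not reproduced
here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for ST_NEW_RASTERIZER / ST_NEW_SCISSOR. */
#define SKETCH_NEW_RASTERIZER (UINT64_C(1) << 0)
#define SKETCH_NEW_SCISSOR    (UINT64_C(1) << 1)

/* Returns the dirty bits for a scissor change. The rasterizer is only
 * flagged when the enable state toggled, not for a new rectangle. */
static uint64_t sketch_scissor_dirty_bits(bool enable_changed)
{
   uint64_t dirty = SKETCH_NEW_SCISSOR;

   if (enable_changed)
      dirty |= SKETCH_NEW_RASTERIZER;

   return dirty;
}
```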
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Specification states that in case of error, value should not be
written, patch changes buffer age queries to return -1 in case of
error so that we can skip changing the value.
In addition, small change to droid_query_buffer_age to return 0
in case buffer does not have a back buffer available.
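The convention can be sketched like this. The struct and function names are
hypothetical and only illustrate "return -1 on error, leave the caller's
value untouched":

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical surface; not the real EGL/dri2 structures. */
struct sketch_surface {
   bool postable;
   int age;
};

/* Returns the buffer age, or -1 on error. */
static int sketch_query_buffer_age(const struct sketch_surface *surf)
{
   if (surf == NULL || !surf->postable)
      return -1;
   return surf->age;
}

/* Writes *value only on success, so an error leaves it unchanged. */
static bool sketch_query_surface(const struct sketch_surface *surf, int *value)
{
   int age = sketch_query_buffer_age(surf);

   if (age < 0)
      return false;

   *value = age;
   return true;
}
```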
Fixes:
dEQP-EGL.functional.negative_partial_update.not_postable_surface
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
We have some features that seem to slow things down or cause other
possible undesirable side effects, but it would be nice to test
games etc with them easily.
I foresee multisample DCC and maybe some shader opt changes using this.
For now use it for batch chaining.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Here we make some assumptions about the AEcontext and set the
recalculate bools directly.
Some formatting fixes are also made while we are here.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The code comment which seems to have been added in cab974cf6c
(from year 2000) says:
"Set ctx->NewState to zero to avoid recursion if
Driver.UpdateState() has to call FLUSH_VERTICES(). (fixed?)"
As far as I can tell nothing in any of the UpdateState() calls
should cause it to be called recursively.
V2: add a wrapper around the osmesa update function so it can still
be used internally.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LLVMAddEarlyCSEMemSSAPass() is defined in LLVM 4.0.
Fixes: 257b538 ("radeonsi: do EarlyCSEMemSSA LLVM pass")
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The workaround causes a massive performance decrease on 1-SE parts.
(Cape Verde, Hainan, Oland)
The performance regression is already part of 17.0 and 17.1.
v2: check tess_uses_prim_id
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This and the previous clip_regs commit decrease IB sizes and the number of
si_update_shaders invocations as follows:
                   IB size   si_update_shaders calls
Borderlands 2       -10%       -27%
Deus Ex: MD          -5%       -11%
Talos Principle      -8%       -30%
v2: always dirty cb_render_state in set_framebuffer_state
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: Rebase and reuse tiling/modifier map. (Daniel Stone)
v3: bump DRIimageExtension to version 15, fill external_only array.
v4: Y-tiling works since gen 6
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add support for createImageFromDmaBufs2, adding a modifier to the
original, and allow importing CCS resources with auxiliary data from
dmabufs.
v2: avoid DRIimageExtension version bump, pass single modifier to
createImageFromDmaBufs2.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Intel hardware requires that all planes of an image come from the same
buffer, which is currently implemented by testing that all FDs are
numerically the same.
However, when going through a winsys (e.g.) or anything which transits
FDs individually, the FDs may be different even if the underlying buffer
is the same.
Instead of checking the FDs for equality, we must check if they actually
point to the same buffer (Jason).
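As a rough sketch of the idea: resolve every FD to a buffer handle first,
then compare the handles rather than the FD numbers. The import callback and
the demo importer below are hypothetical stand-ins for the real buffer
import path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: maps an FD to a handle identifying the underlying buffer. */
typedef uint32_t (*sketch_import_fn)(int fd);

/* True if all planes resolve to the same buffer, even when the FD
 * numbers themselves differ. */
static bool sketch_planes_share_buffer(const int *fds, int num_planes,
                                       sketch_import_fn import_fd)
{
   if (num_planes <= 0)
      return false;

   uint32_t first = import_fd(fds[0]);
   for (int i = 1; i < num_planes; i++) {
      if (import_fd(fds[i]) != first)
         return false;
   }
   return true;
}

/* Demo importer for illustration only: pretends FDs 3 and 7 are two
 * references to the same buffer (handle 1). */
static uint32_t sketch_demo_import(int fd)
{
   return (fd == 3 || fd == 7) ? 1u : 100u + (uint32_t)fd;
}
```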
Reviewed-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch shouldn't actually do anything because the libdrm function
should already do this alignment. However, it preps us for a future
patch where we add in the CCS AUX size, and in the process it serves as
a good place to find bisectable issues if libdrm or kernel does
something incorrectly.
v2: Do proper alignment for X tiling, and make sure non-tiled case is
handled (Jason)
v3: Rebase (Daniel)
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The bufmgr took a mandatory size argument, which would only be used if
the kernel size query failed, i.e. an older kernel. It didn't actually
check that the BO size was sufficient for use.
Pull the check out of the bufmgr, and actually check that the BO is
sufficiently-sized for our import one level up. This also resolves a
chicken/egg we have when importing buffers without explicit modifiers,
namely that we need the tiling mode to calculate the size, but we need
the BO imported to query the tiling mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When allocating images, we record a tiling mode and then work backwards
to infer the modifier. Unfortunately this is the wrong way around, since
it is a one:many mapping (e.g. TILING_Y can be plain Y-tiling, or
Y-tiling with CCS).
Invert the mapping, so we record a modifier first and then map this to a
tiling mode.
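A minimal sketch of the inverted lookup, using made-up enum tokens instead
of the real drm_fourcc.h modifier values. Several modifiers can share one
tiling mode, which is why the mapping only works in this direction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tokens; the real modifiers are 64-bit drm_fourcc.h values. */
enum sketch_modifier {
   SKETCH_MOD_LINEAR,
   SKETCH_MOD_X_TILED,
   SKETCH_MOD_Y_TILED,
   SKETCH_MOD_Y_TILED_CCS,
};

enum sketch_tiling {
   SKETCH_TILING_NONE,
   SKETCH_TILING_X,
   SKETCH_TILING_Y,
};

static const struct {
   enum sketch_modifier modifier;
   enum sketch_tiling tiling;
} sketch_tiling_map[] = {
   { SKETCH_MOD_LINEAR,      SKETCH_TILING_NONE },
   { SKETCH_MOD_X_TILED,     SKETCH_TILING_X },
   { SKETCH_MOD_Y_TILED,     SKETCH_TILING_Y },
   { SKETCH_MOD_Y_TILED_CCS, SKETCH_TILING_Y }, /* same tiling, extra CCS */
};

/* Looks up the tiling mode for a modifier; returns false if unknown. */
static bool sketch_modifier_to_tiling(enum sketch_modifier mod,
                                      enum sketch_tiling *tiling)
{
   size_t count = sizeof(sketch_tiling_map) / sizeof(sketch_tiling_map[0]);

   for (size_t i = 0; i < count; i++) {
      if (sketch_tiling_map[i].modifier == mod) {
         *tiling = sketch_tiling_map[i].tiling;
         return true;
      }
   }
   return false;
}
```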
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since the EGL attributes are signed integers, a straight OR would
also perform sign extension.
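To illustrate the problem, here is a small sketch that builds a 64-bit
modifier from two signed 32-bit halves. The function names are made up;
only the casts matter.

```c
#include <stdint.h>

/* Buggy form: converting a negative 'lo' to 64 bits sign-extends it,
 * so the OR sets all of the upper 32 bits. */
static uint64_t sketch_combine_buggy(int32_t hi, int32_t lo)
{
   return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Fixed form: go through uint32_t first so no sign extension happens. */
static uint64_t sketch_combine(int32_t hi, int32_t lo)
{
   return ((uint64_t)(uint32_t)hi << 32) | (uint64_t)(uint32_t)lo;
}
```

With hi = 1 and lo = INT32_MIN (bit 31 set), the fixed form gives
0x0000000180000000 while the buggy form gives 0xffffffff80000000.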
Fixes: 6f10e7c37a ("egl/dri2: Create EGLImages with dmabuf modifiers")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix build error.
CC i915_surface.lo
i915_surface.c:108:63: error: too few arguments to function call, expected 4, have 3
util_blitter_default_src_texture(&src_templ, src, src_level);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
../../../../src/gallium/auxiliary/util/u_blitter.h:271:1: note: 'util_blitter_default_src_texture' declared here
void util_blitter_default_src_texture(struct blitter_context *blitter,
^
Fixes: a893c91697 ("gallium/u_blitter: use 2D_ARRAY for cubemap blits if possible")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101340
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As TS is also allowed on sampler resources, we need to make sure to resolve
to self when binding the resource as a texture, to avoid stale content
being sampled.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
A resolve to self is only necessary if the resource is fast cleared, so
there is never a need to do so if there is no TS allocated.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Stolen from VC4. As we don't do any fancy reallocation tricks yet, it's
also possible to upgrade coherent mappings and shared resources.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
There is no need to special case compressed resources, as they are already
marked as linear on allocation. With that out of the way, there is room to
cut down on the number of if clauses used.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reduces bandwidth usage of transfers which discard the buffer contents,
as well as skipping unnecessary command stream flushes and CPU/GPU
synchronization.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
This gets rid of quite a bit of CPU/GPU sync on frequent vertex buffer
uploads and I haven't seen any of the issues mentioned in the comment,
so this one seems stale.
Ignore the flag if there exists a temporary resource, as those ones are
never busy.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
cpu_prep() already does all the required waiting, so the only thing that
needs to be done is flushing the commandstream, if a GPU write is pending.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Replace -1 with MESA_SHADER_NONE enum value to fix sign related warning:
external/mesa3d/src/compiler/glsl/link_varyings.cpp:1415:25: warning: comparison of constant -1 with expression of type 'gl_shader_stage' is always true [-Wtautological-constant-out-of-range-compare]
(consumer_stage != -1 && consumer_stage != MESA_SHADER_FRAGMENT))) {
~~~~~~~~~~~~~~ ^ ~~
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Commit 621b3410f5 ("util/vulkan: Move Vulkan utilities to
src/vulkan/util") broke the Android build with the following error:
build/core/binary.mk:1427: error: external/mesa3d/src/vulkan/Android.mk: libmesa_vulkan_util: Unused source files: util/vk_util.h).
Fixes: 621b3410f5 ("util/vulkan: Move Vulkan utilities to src/vulkan/util")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Rob Herring <robh@kernel.org>
This is similar to the previous commit only for HiZ. For HiZ, apart
from everything looking different, there is really only one functional
change: We now track the ISL_AUX_STATE_COMPRESSED_NO_CLEAR state.
Previously, if you rendered to a resolved slice of the miptree and then
did a fast-clear with a different clear color, that slice would get
resolved even though it hadn't been fast-cleared. Now that we can track
COMPRESSED_NO_CLEAR, we know that it doesn't have any blocks in the
"clear" state so we can skip the resolve.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
This commit reworks the resolve tracking for CCS and MCS to use the new
isl_aux_state enum. This should provide much more accurate and easy to
reason about tracking. In order to understand, for instance, the
intel_miptree_prepare_ccs_access function, one only has to go look at
the giant comment for the isl_aux_state enum and follow the arrows.
Unfortunately, there's no good way to split this up without making a
real mess so there are a bunch of changes in here:
1) We now do partial resolves. I really have no idea how this ever
worked before. So far as I can tell, the only time the old code
ever did a partial resolve was when it was using CCS_D where a
partial resolve and a full resolve are the same thing.
2) We are now tracking 4 states instead of 3 for CCS_E. In particular,
we distinguish between compressed with clear and compressed without
clear. The end result is that you will never get two partial
resolves in a row.
3) The texture view rules are now more correct. Previously, we would
only bail if compression was not supported by the destination
format. However, this is not actually correct. Not all format
pairs are supported for texture views with CCS even if both support
CCS individually. Fortunately, ISL has a helper for this.
4) We are no longer using intel_resolve_map for tracking aux state but
are instead using a simple array of enum isl_aux_state indexed by
level and layer. This is because, now that we're tracking 4
different states, it's no longer clear which should be the "default"
and array lookups are faster than linked list searches.
5) The new code is very assert-happy. Incorrect transitions will now
get caught by assertions rather than by rendering corruption.
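Point 4 can be sketched as a flat array indexed by level and layer. The
enum values and struct below are hypothetical stand-ins (the real
isl_aux_state enum has its own names), and the sketch assumes every level
has the same number of layers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names for four tracked states. */
enum sketch_aux_state {
   SKETCH_AUX_CLEAR,
   SKETCH_AUX_COMPRESSED_CLEAR,
   SKETCH_AUX_COMPRESSED_NO_CLEAR,
   SKETCH_AUX_RESOLVED,
};

struct sketch_aux_map {
   uint32_t levels;
   uint32_t layers;
   enum sketch_aux_state *state; /* levels * layers entries */
};

static struct sketch_aux_map *
sketch_aux_map_create(uint32_t levels, uint32_t layers,
                      enum sketch_aux_state initial)
{
   struct sketch_aux_map *map = malloc(sizeof(*map));
   if (!map)
      return NULL;

   size_t count = (size_t)levels * layers;
   map->levels = levels;
   map->layers = layers;
   map->state = malloc(count * sizeof(*map->state));
   if (!map->state) {
      free(map);
      return NULL;
   }
   for (size_t i = 0; i < count; i++)
      map->state[i] = initial;
   return map;
}

/* O(1) lookup; out-of-range indices trip an assertion. */
static enum sketch_aux_state
sketch_aux_get(const struct sketch_aux_map *map, uint32_t level, uint32_t layer)
{
   assert(level < map->levels && layer < map->layers);
   return map->state[(size_t)level * map->layers + layer];
}

static void
sketch_aux_set(struct sketch_aux_map *map, uint32_t level, uint32_t layer,
               enum sketch_aux_state state)
{
   assert(level < map->levels && layer < map->layers);
   map->state[(size_t)level * map->layers + layer] = state;
}

static void sketch_aux_map_destroy(struct sketch_aux_map *map)
{
   free(map->state);
   free(map);
}
```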
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
This is only needed to fix rendering corruptions caused by not flushing
after doing a resolve operation. The resolve now does all the needed
flushing so this is unnecessary.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This also removes an unneeded brw_render_cache_set_check_flush() call.
We were calling it in the case where the surface got resolved to satisfy
the flushing requirements around resolves. However, blorp now does this
itself, so the extra call is just redundant.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This commit adds a new unified interface for doing resolves. The basic
format is that, prior to any surface access such as texturing or
rendering, you call intel_miptree_prepare_access. If the surface was
written, you call intel_miptree_finish_write. These two functions take
parameters which tell them whether or not auxiliary compression and fast
clears are supported on the surface. Later commits will add wrappers
around these two functions for texturing, rendering, etc.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This enum describes all of the states that an auxiliary compressed
surface can have. All of the states as well as normative language for
referring to each of the compression operations is provided in the
truly colossal comment for the new isl_aux_state enum. There is also
a diagram showing how surfaces move between the different states.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We have two different bits of resolve code for render targets: one in
brw_draw where it's always been and one in brw_context to deal with sRGB
on gen9. Let's pull them together.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
There are several places where we were resolving the entire miptree
when we really only needed to resolve a single slice. Let's avoid the
unneeded resolving.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Previously, we had two checks for can_fast_clear and a tiny bit of
shared code in between. This commit pulls all of the fast clear code
together and duplicates the tiny bit that declares some surface structs
and calls blorp_surf_for_miptree. The duplication is no real loss and
we're about to change the two in slightly different ways.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
None of the other methods such as blit work with CCS either so we need
to do the resolve for all maps. This change also makes us only resolve
the one slice we're mapping and not the entire image.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There is exactly one caller so it's a bit pointless to have all of this
plumbing. Just inline it at the one place it's used.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The new version now takes a range of levels as well as a range of
layers. It should also be a tiny bit faster because it only walks the
resolve_map list once instead of once per layer.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
clang treats enums as unsigned by default, which gives the following warning:
external/mesa3d/src/mesa/main/buffers.c:764:21: warning: comparison of constant -1 with expression of type 'gl_buffer_index' is always false [-Wtautological-constant-out-of-range-compare]
if (srcBuffer == -1) {
~~~~~~~~~ ^ ~~
Replace -1 with an enum value to fix this.
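A small sketch of the pattern, with a made-up enum. Giving the "unset" case
its own enumerator keeps the comparison inside the enum's value range, so
the compiler has nothing to warn about.

```c
#include <stdbool.h>

/* Hypothetical enum; the names are not the real gl_buffer_index ones. */
enum sketch_buffer_index {
   SKETCH_BUFFER_NONE = -1,
   SKETCH_BUFFER_FRONT,
   SKETCH_BUFFER_BACK,
   SKETCH_BUFFER_COUNT,
};

/* Compare against the named enumerator instead of a bare -1. */
static bool sketch_buffer_is_unset(enum sketch_buffer_index idx)
{
   return idx == SKETCH_BUFFER_NONE;
}
```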
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
clang gives a warning in blob_overwrite_bytes because offset type is
size_t which is unsigned:
src/compiler/glsl/blob.c:110:15: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
if (offset < 0 || blob->size - offset < to_write)
~~~~~~ ^ ~
Remove the less than 0 check to fix this.
Additionally, if offset is greater than blob->size, the 2nd check would
be false due to unsigned math. Rewrite the check to avoid subtraction.
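One way to write such a bounds check so that unsigned wrap-around cannot
let a bad range through is sketched below. This is an illustration with a
made-up helper name, not necessarily the exact form used in blob.c; it
keeps a subtraction but only evaluates it once offset <= size is known.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if [offset, offset + to_write) lies within a buffer of 'size'
 * bytes. Checking offset first means size - offset cannot wrap. */
static bool sketch_range_fits(size_t size, size_t offset, size_t to_write)
{
   return offset <= size && to_write <= size - offset;
}
```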
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
This removes the linear search, which fails when the number of variables
goes up to 30000 or so.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of having to search the whole array, just use the whole
thing and store a valid bit in there with the rename.
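Roughly, the idea is a table indexed directly by the old register number,
with a valid bit per entry. The struct and function names here are
hypothetical.

```c
#include <stdbool.h>

/* One entry per original temporary, indexed by its old number. */
struct sketch_rename {
   bool valid;   /* set if this temporary was renamed */
   int new_reg;  /* only meaningful when valid is set */
};

/* O(1) lookup instead of scanning a list of renames. */
static int sketch_renamed_reg(const struct sketch_rename *table, int old_reg)
{
   return table[old_reg].valid ? table[old_reg].new_reg : old_reg;
}
```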
Removes this from the profile on some of the fp64 tests
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With some of the fp64 emulation, we are seeing shaders coming in with
> 32K temps. They go out with 40 or so used, but while doing register
renumbering we need to store a lot of them.
So bump these fields back up to 32-bit.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Also solve "outinfo may be used uninitialized" warning by putting in an
unreachable().
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Most functions are only inspecting nir, so nir related arguments can be
marked const. Some more can be done if/when some nir changes are
accepted.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: Add a func pointer to radeon_winsys to support radeon later.
Change-Id: I614ea71424f9e5c97e4ae68654315d28c89eaa5f
Signed-off-by: Samuel Li <Samuel.Li@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Also, prepare for the next commit by correcting some coding style
changes. This should be all non-functional changes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These will need to be in place to avoid regressions when
removing these includes from the u_dynarray
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
When a stencil buffer is part of the framebuffer state, it is
decompressed but because it's bindless, all draw calls set
stencil_dirty_level_mask to 1.
v2: Marek - set the flags outside the loop
- also clear and set framebuffer.do_update_surf_dirtiness there
- do it in the DB->CB copy path too
v3: Marek - save and restore the do_update_surf_dirtiness flag
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.
This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.
EarlyCSEMemSSA has a -0.01% change in code size compared to EarlyCSE.
SGPRS: 1935420 -> 1938076 (0.14 %)
VGPRS: 1645504 -> 1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size: 61981592 -> 61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Heaven LDS usage for LS+HS is below. The masks are "outputs_written"
for LS and HS. Note that 32K is the maximum size.
Before:
heaven_x64: ls=1f1 tcs=1f1, lds=32K
heaven_x64: ls=31 tcs=31, lds=24K
heaven_x64: ls=71 tcs=71, lds=28K
After:
heaven_x64: ls=3f tcs=3f, lds=24K
heaven_x64: ls=7 tcs=7, lds=13K
heaven_x64: ls=f tcs=f, lds=17K
All other apps have a similar decrease in LDS usage, because
the "outputs_written" masks are similar. Also, most apps don't write
POSITION in these shader stages, so there is room for improvement.
(tight per-component input/output packing might help even more)
It's unknown whether this improves performance.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If the XRGB view is sampling from an ARGB svga format, change
PIPE_SWIZZLE_W to PIPE_SWIZZLE_1 for all channels.
Previously we unconditionally set PIPE_SWIZZLE_1 on the alpha channel which
could be both insufficient and incorrect.
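The swizzle fix-up can be sketched as below. The enum is a made-up stand-in
for the PIPE_SWIZZLE_* tokens, so treat the names as assumptions.

```c
/* Hypothetical stand-ins for PIPE_SWIZZLE_X/Y/Z/W/0/1. */
enum sketch_swizzle {
   SKETCH_SWIZZLE_X,
   SKETCH_SWIZZLE_Y,
   SKETCH_SWIZZLE_Z,
   SKETCH_SWIZZLE_W,
   SKETCH_SWIZZLE_0,
   SKETCH_SWIZZLE_1,
};

/* For an XRGB view of an ARGB surface: any channel that selects W
 * (alpha) reads constant 1 instead, whichever channel that is. */
static void sketch_xrgb_fix_swizzle(enum sketch_swizzle swz[4])
{
   for (int i = 0; i < 4; i++) {
      if (swz[i] == SKETCH_SWIZZLE_W)
         swz[i] = SKETCH_SWIZZLE_1;
   }
}
```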
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When deciding to create a view with or without an alpha channel we need to
look at the SVGA3D format and not the PIPE format.
This fixes the glx-tfp piglit test for dri3/xa.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Gallium RGB textures may be backed by imported ARGB svga3d surfaces. In those
and similar cases we need to set the alpha value to 1 when sampling.
Fixes piglit glx::glx-tfp
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
For the purpose of surface sharing, treat SVGA3D_R5G6B5 and
SVGA3D_B5G6R5_UNORM as identical formats.
This fixes the following piglit tests with dri3/xa:
glx@glx-visuals-depth -pixmap
glx@glx-visuals-stencil -pixmap
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Singh Rawat <drawat@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It appears like the GLX_EXT_buffer_age extension also prevents Compiz /
Ubuntu Unity from performing partial buffer swaps when it otherwise
feels like doing so. So try to get them back again. We also disable
GLX_OML_sync_control since it appears it had a favourable impact on
gnome-shell.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Increases performance on vmwgfx since we're avoiding full buffer damage and
since we can't sync to vertical retrace anyway.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Allow gallium drivers to turn off GLX_EXT_buffer_age and
GLX_OML_sync_control if needed, using driconf.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With GLX_EXT_buffer_age turned on, gnome-shell will use full-screen damage
with GLX, which severely hurts performance with architectures that emulate
page-flips with copies. Like vmware. We would like to be able to turn off that
extension. Similarly, typically the GLX_OML_sync_control doesn't make much
sense on a virtual architecture since we don't really sync to the host's
vertical retrace. We'd like to be able to turn it off as well.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There will be situations where we want to control, for example, the
GLX behaviour based on applications and drivers. So allow DRI users access
to the driver options.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LS is merged into TCS. If there is no TCS, LS is merged into fixed-func
TCS. The problem is the fixed-func TCS was ignored by scratch update
functions, so LS didn't have the scratch buffer set up.
Note that Mesa 17.1 doesn't have merged shaders.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If we enqueue too many jobs and destroy the GL context, it may take
several seconds before the jobs finish. Just drop them instead.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes piglit:
arb_texture_view-rendering-r32ui
TEX (image_sample) flushes denorms to 0 with FP32 textures on GCN, but such
a texture can contain integer data written using an integer render view.
If we do a transfer blit with TEX, denorms are flushed to 0. Luckily,
TXF (image_load) doesn't do that.
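To see why integer data is affected: any 32-bit pattern with a zero
exponent field and a non-zero mantissa is an FP32 denormal, which covers
every integer from 1 to 0x007fffff. A small check, with a made-up helper
name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'bits', reinterpreted as IEEE 754 binary32, is a denormal. */
static bool sketch_bits_are_fp32_denorm(uint32_t bits)
{
   uint32_t exponent = (bits >> 23) & 0xff;
   uint32_t mantissa = bits & 0x7fffff;

   return exponent == 0 && mantissa != 0;
}
```

So an R32UI texel holding a small value such as 1 would read back as 0 if
it went through a path that flushes denorms.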
TXF also doesn't need to load the sampler state, so blit shaders don't have
to do s_load_dwordx4.
TXF doesn't do CLAMP_TO_EDGE, so it can only be used if the src box is
in bounds, or if we clamp manually (this commit doesn't).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The sampler views always have first_level == last_level.
Now radeonsi doesn't have to use the WQM. (a few SALU removed)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
BLORP has been capable of doing gen8-style HiZ ops for a while now. We
might as well start using it. The one downside is that this may cause a
bit more state emission since we still re-emit most things for BLORP.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The blorp_hiz_op entrypoint always acts on a full subresource of a HiZ
buffer so we can just set the flag unconditionally.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This also changes it to be predicated so we only do the flush/stall on
clears and HiZ resolves. The docs only say it's needed for clears but
empirical evidence says it's also needed for HiZ resolves.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit does a few things:
1) Now that BLORP can do HiZ ops on gen8+, drop the gen6 prefix.
2) Switch parameters to uint32_t to match the rest of blorp.
3) Take a range of layers and loop internally.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit, out of necessity, makes a number of changes at once:
1) Changes intel_mipmap_tree to store the clear color for both color
and depth as an isl_color_value.
2) Changes the depth/stencil emit code to do the format conversion of
the depth clear value on Haswell and earlier instead of pulling a
uint32_t directly from the miptree.
3) Changes ISL's depth/stencil emit code to perform the format
conversion of the depth clear value on Haswell and earlier instead
of assuming that the depth value in the float is pre-converted.
4) Changes blorp to pass the depth value through as a float.
5) Changes the Vulkan driver to pass the depth value to blorp as a
float rather than a uint.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Image comparisons for a number of internal VMware apitrace traces fail
with dri3 because the viewport transformation becomes incorrect after an
X drawable resize. The incorrect viewport transformation sometimes
persists until the second draw call after a swapBuffer.
Comparing with the dri2 glx code there are a couple of places where dri2
invalidates the drawable in the absence of server-triggered invalidation,
where dri3 doesn't do that. When these invalidation points are added to
dri3, the image comparisons become correct.
v2:
Addressed review comment by Michel Dänzer.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-and-tested-by: Michel Dänzer <michel.daenzer@amd.com>
The BLEND_STATE documentation says that alpha to one must be disabled
when dual color blending is enabled. However, it appears that it simply
fails to override src1 alpha to one.
We can work around this by leaving alpha to one enabled, but overriding
SRC1_ALPHA to ONE and ONE_MINUS_SRC1_ALPHA to ZERO. This appears to be
what the other driver does, and it looks like it works despite the
documentation saying not to do it.
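
The override can be pictured as a remap applied to each blend factor when
alpha to one is enabled. This is a minimal sketch only: the enum and the
helper name are hypothetical and are not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical blend-factor enum, for illustration only. */
enum blend_factor {
   FACTOR_ZERO,
   FACTOR_ONE,
   FACTOR_SRC_ALPHA,
   FACTOR_SRC1_ALPHA,
   FACTOR_ONE_MINUS_SRC1_ALPHA,
};

/* With alpha to one enabled, src1 alpha is logically 1.0, so substitute
 * the constant factors that the hardware fails to apply on its own. */
static enum blend_factor
fix_dual_src_factor(enum blend_factor f, bool alpha_to_one)
{
   if (!alpha_to_one)
      return f;

   switch (f) {
   case FACTOR_SRC1_ALPHA:
      return FACTOR_ONE;
   case FACTOR_ONE_MINUS_SRC1_ALPHA:
      return FACTOR_ZERO;
   default:
      return f;
   }
}
```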
Fixes spec/ext_framebuffer_multisample/alpha-to-one-dual-src-blend *
Piglit tests.
v2: Add UNUSED to shut up warning on generations which don't use this.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The whole GLES3 block has been moved before the buffer validation
checks.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This also adds a 'no_error' parameter to vertex_array_vertex_buffer()
to be used in a following patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The only caller is _mesa_update_state_locked() which already
checks if _NEW_PIXEL is set before calling _mesa_update_pixel().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
GLSL ES spec includes the following:
"It is an error to undefine or to redefine a built-in
(pre-defined) macro name."
But desktop GLSL doesn't. This has sparked some discussion
in Khronos, and the final conclusion was to update the
GLSL 4.50 spec to include the following:
"By convention, all macro names containing two consecutive
underscores ( __ ) are reserved for use by underlying
software layers. Defining or undefining such a name in a
shader does not itself result in an error, but may result
in unintended behaviors that stem from having multiple
definitions of the same name. All macro names prefixed
with “GL_” (“GL” followed by a single underscore) are also
reserved, and defining or undefining such a name results in
a compile-time error."
In other words, undefining GL_* names should be an error, but
undefining other names with a double underscore in them is
not strictly prohibited in desktop GLSL.
This patch fixes the preprocessor to apply these rules,
following exactly the implementation already present
in GLSLang. This fixes some tests in CTS.
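
The desktop rule quoted above can be sketched as a small classifier. This
is an illustration under assumptions, not the preprocessor's code: the
enum and function names are made up, and only the desktop GLSL wording is
modelled (the stricter GLSL ES rule is left out).

```c
#include <assert.h>
#include <string.h>

enum macro_name_status {
   MACRO_NAME_OK,       /* ordinary name */
   MACRO_NAME_RESERVED, /* contains "__": allowed, but reserved */
   MACRO_NAME_ERROR,    /* starts with "GL_": compile-time error */
};

/* Classify a name that a shader tries to #define or #undef. */
static enum macro_name_status
classify_macro_name(const char *name)
{
   assert(name != NULL);

   /* The "GL_" prefix check comes first: it is the only hard error. */
   if (strncmp(name, "GL_", 3) == 0)
      return MACRO_NAME_ERROR;

   /* Two consecutive underscores anywhere in the name. */
   if (strstr(name, "__") != NULL)
      return MACRO_NAME_RESERVED;

   return MACRO_NAME_OK;
}
```

A caller would typically turn MACRO_NAME_RESERVED into a warning and
MACRO_NAME_ERROR into a compile error.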
Khronos bug:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16003
Fixes:
KHR-GL45.shaders.preprocessor.definitions.undefine_core_profile_vertex
KHR-GL45.shaders.preprocessor.definitions.undefine_core_profile_fragment
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Instead of having the fragile code to do a second pass, just
give the pointers you want params in to the initial code,
then call a later pass to assign them.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just pass a pointer and increment inside the function,
makes the code less error prone.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The other user of print_sync_dispatch() was ending up with code that
looked like:
_mesa_glthread_finish(ctx);
_mesa_glthread_restore_dispatch(ctx);
_mesa_glthread_finish(ctx);
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This seems to matter here in a profile, without this we spend a lot
more time exiting this function with no flush bits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves lots of stuff to the bind stage rather than
dealing with it in the draw stage.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Simple search for a backslash followed by two newlines.
If one of the newlines were to be removed, this would cause issues, so
let's just remove these trailing backslashes.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No sense checking each bit separately in the common case of none
being set.
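
A minimal sketch of the early-out, assuming hypothetical flag names and a
helper that only counts the packets it would emit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flush flags, for illustration only. */
#define FLUSH_INV_ICACHE (1u << 0)
#define FLUSH_INV_L2     (1u << 1)
#define FLUSH_CS_PARTIAL (1u << 2)

/* Returns the number of packets that would be emitted. */
static unsigned
emit_cache_flush(uint32_t flush_bits)
{
   unsigned packets = 0;

   /* Common case: nothing to flush, so test all bits at once. */
   if (!flush_bits)
      return 0;

   if (flush_bits & FLUSH_INV_ICACHE)
      packets++;
   if (flush_bits & FLUSH_INV_L2)
      packets++;
   if (flush_bits & FLUSH_CS_PARTIAL)
      packets++;

   return packets;
}
```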
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Divides are pretty slow, and this is in the hot path of a draw.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The manual detiling paths are not prepared to handle Gen4-G45 with
swizzling enabled, so explicitly disable them. (They're already
disabled because these platforms don't have LLC but a future patch could
enable this path).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Now that unsynchronized maps actually work, we can use them, like we do
on LLC platforms.
On Broxton, the performance of Unigine Valley 1.1-rc1 is improved by
37.6656% +/- 0.401389% (n=20) at 1280x720/QUALITY_LOW, and by
20.862% +/- 2.20901% (n=3) at 1920x1080/QUALITY_LOW.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Broxton, the performance of Unigine Valley 1.0 is improved by
13.3067% +/- 0.144322% (n=40) at 1280x720/QUALITY_LOW, and by
1.68478% +/- 0.484226% (n=3) at 1920x1080/QUALITY_LOW.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This way we can let brw_bo_map() choose the best mapping type.
Part of the patch inlines map_gtt() into brw_bo_map_gtt() (and removes
map_gtt()). brw_bo_map_gtt() just wrapped map_gtt() with locking and a
call to set_domain(). map_gtt() is called by brw_bo_map_unsynchronized()
to avoid the call to set_domain(). With the MAP_ASYNC flag, we now have
the same behavior previously provided by brw_bo_map_unsynchronized().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can encapsulate the logic for choosing the mapping type. This will
also help when we add WC mappings.
A few functional changes are made in this patch. On non-LLC, what were
previously WB mappings are now GTT mappings (in the prefilling debug
code in brw_performance_query.c; the shader_time code in brw_program.c;
and in the case of an RW mapping in intel_buffer_objects.c).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_bo_map_cpu() took a write_enable arg, but it wasn't always clear
whether we were also planning to read from the buffer. I kept everything
semantically identical by passing only MAP_READ or MAP_READ | MAP_WRITE
depending on the write_enable argument.
The other flags are not used yet, but MAP_ASYNC for instance, will be
used in a later patch to remove the need for a separate
brw_bo_map_unsynchronized() function.
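
The translation from the old boolean to the new flags might look like the
sketch below. The flag values and the helper name are assumptions made for
illustration; the driver defines its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values. */
#define MAP_READ  (1u << 0)
#define MAP_WRITE (1u << 1)
#define MAP_ASYNC (1u << 2) /* not set here; reserved for later use */

/* Keep the old semantics: always readable, writable only on request. */
static unsigned
map_flags_from_write_enable(bool write_enable)
{
   return write_enable ? (MAP_READ | MAP_WRITE) : MAP_READ;
}
```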
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to make a new function named brw_bo_map() in a later patch
that is responsible for choosing the mapping type, so this patch clears
the way.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think these are better names, and it reduces the delta between
upstream and Chris Wilson's brw-batch branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we can distinguish when mapping between READ and WRITE, we can
pass along the map mode to avoid stalls and flushes where possible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Based on discussions with Jason, Ivy Bridge and Bay Trail only actually
support 16 samplers, while newer hardware can support more than the
current limit of 64. Therefore set the lower limit where needed, and
bump up to 128 for everything else. There is also a limit on the total
number of other resources of around 250.
This allows Dawn of War III to render correctly on ANV.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This means it can be reused for other Vulkan drivers. Also fix up a
typo: we need to search for '.' in the version string rather than ','.
v2: Remove unneeded temporary version variable (Emil, Eric)
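
For illustration, parsing a dotted version string by searching for '.'
could be sketched as follows. The function name and the exact string
format ("major.minor.patch" with an optional suffix) are assumptions, not
the helper that was moved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse "major.minor.patch[-suffix]"; missing parts are left at 0. */
static void
parse_version(const char *version, int *major, int *minor, int *patch)
{
   assert(version && major && minor && patch);

   *major = (int)strtol(version, NULL, 10);
   *minor = 0;
   *patch = 0;

   /* The separator is '.', not ','. */
   const char *dot = strchr(version, '.');
   if (!dot)
      return;
   *minor = (int)strtol(dot + 1, NULL, 10);

   dot = strchr(dot + 1, '.');
   if (!dot)
      return;
   *patch = (int)strtol(dot + 1, NULL, 10);
}
```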
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We have Vulkan utilities in both src/util and src/vulkan/util. The
latter seems a more appropriate place for Vulkan-specific things, so
move them there.
v2: Android build system changes (from Tapani Pälli)
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The current way of handling groups doesn't seem to be able to handle
MI_LOAD_REGISTER_* with more than one register. This change reworks
the way we handle groups by building a traversal list on loading the
GENXML files.
Let's say you have
    Instruction {
        Field0
        Field1
        Field2
        Group0 (count=2) {
            Field0-0
            Field0-1
        }
        Group1 (count=4) {
            Field1-0
            Field1-1
        }
    }
We build a linked list on load that goes:
Instruction -> Group0 -> Group1
All of those are gen_group structures, making the traversal trivial.
We just need to iterate groups the right number of times (count
field in genxml).
The more fancy case is when you have only a single group of unknown
size (count=0). In that case we keep on reading that group for as long
as we're within the DWordLength of that instruction.
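
The traversal described above can be sketched as follows. The struct is a
simplified, hypothetical stand-in for gen_group, and the helper only
counts how many group iterations fit in the packet.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a group node in the traversal list. */
struct group {
   unsigned count;           /* iterations; 0 means "unknown size" */
   unsigned size_dw;         /* dwords consumed per iteration */
   const struct group *next; /* next group in the list */
};

/* Walk Instruction -> Group0 -> Group1 ..., never reading past
 * dword_length. Returns the number of iterations visited. */
static unsigned
count_group_iterations(const struct group *g, unsigned dword_length)
{
   unsigned offset = 0, iterations = 0;

   for (; g != NULL; g = g->next) {
      assert(g->size_dw > 0);

      /* count == 0: keep reading the group while it fits in the packet. */
      for (unsigned i = 0;
           (g->count == 0 || i < g->count) &&
           offset + g->size_dw <= dword_length;
           i++) {
         offset += g->size_dw;
         iterations++;
      }
   }

   return iterations;
}
```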
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Now, st_update_window_rectangles() won't be called when the
scissor is going to be updated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This new driver flag will replace _NEW_SCISSOR which is
emitted when setting new window rectangles but it actually
triggers useless changes in the state tracker (like scissor
and rasterizer).
EXT_window_rectangles is currently only supported by Nouveau.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is actually useless because this driver call is only used
by the classic DRI drivers which don't support that extension
and probably will never support it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We usually check that given parameters are different before
updating the state.
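
The pattern looks roughly like this sketch, which uses a hypothetical
state struct rather than the actual GL context:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical piece of state with a dirty flag. */
struct rect_state {
   int x, y, width, height;
   bool dirty;
};

static void
set_rect(struct rect_state *s, int x, int y, int width, int height)
{
   assert(s != NULL);

   /* Nothing changed: skip the update and do not flag the state. */
   if (s->x == x && s->y == y && s->width == width && s->height == height)
      return;

   s->x = x;
   s->y = y;
   s->width = width;
   s->height = height;
   s->dirty = true;
}
```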
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We usually check that given parameters are different before
updating the state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We usually check that given parameters are different before
updating the state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We moved to INTEL_SCALAR_* when we added more than a single stage, but
never went back and converted the VS to work that way. Be consistent.
Also update the documentation to actually mention these debug variables.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This just sets the vulkan device type depending on whether
this is an APU or GPU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
We always compute HTILE size using addrlib, even when not TC compatible.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I'm open to reverting this closer to release if bad things
happen, but it might be easier for debugging to leave it for now.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't support these yet, and it'll take a bit of work to do so.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are just some register changes ported from radeonsi for gfx9.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are ported from radeonsi; I don't know all the rules for
when they should be inserted.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds some rb+ support, as on GFX9 we have to disable
it as per radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GFX9 needs to write the EOP event to a fence buffer. Allocate some
space for this and just write an ever-increasing number to it.
This isn't exactly what radeonsi does, but it seems to work.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds gfx9 support for the texture descriptor along
with the fmask/cmask allocation routines.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for initialising some GFX9 registers,
and handles the different init for the VGT reuse reg.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the CP dma code for GFX9, ported from
radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the strings and includes the gfx9 register defs
in some files that we need them in.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just splits out some non-gfx9 bits in advance to avoid
regressions.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves the code around in preparation for gfx9 support.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In advance of GFX9 to reduce chances for regression, refactor
this code out so adding the GFX9 changes will be more obvious.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just collapses a few per-stage things into a loop,
shouldn't affect anything.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Replace some static assertions with runtime assertions. The static
asserts don't work/fail on MSVC, despite the offsets being multiples
of 16 (checked with softpipe).
Use correct parameter types for a few gallium context functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The atomic counters on evergreen are implemented via append/consume
UAV counters. This just adds the register info for them. The EOS
packets are used to get the atomic totals extracted post shader
execution for storing into a buffer.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
This just documents in the headers the RAT operation list,
and the RAT encoding for exports.
The immediate registers are used to point to buffers for the
RAT return values (_RTN instructions).
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Not relevant to radeonsi, because Position/Face are system values
with radeonsi, while this codepath is for drivers where Position and
Face are ordinary inputs.
Reviewed-by: Brian Paul <brianp@vmware.com>
In order to do resolves for texture views with different formats, we
need intel_texture_object::_Format to be valid. Calling
intel_finalize_mipmap_tree can safely be done multiple times in a row
and should be a fairly cheap operation.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This fixes a performance issue with the shader cache that delayed Gallium
shader create calls until draw calls.
I'd like this in stable, but it's not a showstopper.
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The old code copied over all the surface info from the image
surface, we only want some bits of it, and to modify the flags.
This prevents a regression in dEQP-VK.api.copy_and_blit.resolve_image.*
and others in the subsequent switch to ac_compute_surface.
v2:
- also disable opt4Space in radv_amdgpu_surface, so that we can
apply this patch separately *before* switching to ac_compute_surface
and hopefully avoid intermittent regressions (Nicolai)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is mostly mechanical changes of renaming types and introducing
"legacy" everywhere.
It doesn't use the ac_surface computation functions yet.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Like radeonsi. This saves memory, and the information can easily be
recomputed on the fly where necessary.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This ports: 55445ff189 from radeonsi
radeonsi: tell LLVM not to remove s_barrier instructions
LLVM 5.0 removes s_barrier instructions if the max-work-group-size
attribute is not set. What a surprise.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for exporting 2D images, to an
opaque fd.
This implements the:
VK_KHX_external_memory_capabilities
VK_KHX_external_memory
VK_KHX_external_memory_fd
extensions.
These are used by SteamVR, we should work with anv
to decide if we should ship these under an env
var or something.
v2 (Bas): - Don't expose the semaphore ext without implementing it.
- Only export the capabilities ext as instance ext.
- Implement radv_GetPhysicalDeviceExternalBufferPropertiesKHX.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Sets could have been ignored during previous descriptor set flush
due to the shader not using them and therefore no SGPR being assigned.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: ae61ddabe8 "radv: move userdata sgpr ownership to compiler side."
We clear the descriptors_dirty array afterwards, so the SGPRs for
the other pipeline don't get updated on the flush for that other
draw/dispatch, so we have to make sure we do it immediately.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: ae61ddabe8 "radv: move userdata sgpr ownership to compiler side."
Currently we signal the availability of the query result using an
unordered pipe-control write. As it is unordered, it may be executed
before the write of the query result itself - and so an observer may
read the query result too early. Fix this by requesting that the write
of the availability flag is ordered after earlier pipe control writes.
Testcase: piglit/arb_query_buffer_object-qbo/*async*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
BGRA8 image stores on Fermi don't work, which results in breaking
PBO downloads, such that they always return 0x0. Discovered this
through a glamor bug, and confirmed it does indeed break a good number
of piglit tests such as spec/arb_pixel_buffer_object/pbo-read-argb8888
Fixes: 8e7893eb53 ("nvc0: add support for BGRA8 images")
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
By making use of the l3_banks field in the gen_device_info struct:
l3_way_size for gen7+ = 2 * l3_banks.
V2: Keep the get_l3_way_size() function.
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This new field helps simplify l3 way size computations
in the next patch.
V2: Initialize the l3_banks to 0 in macros.
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When given an *unsupported* mesa_format,
brw_isl_format_for_mesa_format() returned 0, a *valid* isl_format,
ISL_FORMAT_R32G32B32A32_FLOAT. The problem is that
brw_isl_format_for_mesa_format's inner table used 0 instead of
ISL_FORMAT_UNSUPPORTED to indicate unsupported mesa formats.
Some callers of brw_isl_format_for_mesa_format() were aware of this
weirdness, and worked around it. This patch removes those workarounds.
v2: Ensure that all array elements are initialized to
ISL_FORMAT_UNSUPPORTED, even when new formats are added to enum
mesa_format, by using a designated range initializer.
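
A minimal sketch of the technique, with hypothetical enum and format
names. The `[first ... last]` range designator is a GNU C extension
supported by GCC and Clang; it is not part of ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source formats; new entries go before SRC_FORMAT_COUNT. */
enum src_format {
   SRC_FORMAT_NONE,
   SRC_FORMAT_RGBA8,
   SRC_FORMAT_RGBA32F,
   SRC_FORMAT_EXOTIC,
   SRC_FORMAT_COUNT,
};

/* Hypothetical hardware formats; note that 0 is a valid format. */
#define HW_FORMAT_RGBA32F     0
#define HW_FORMAT_RGBA8       0x0c7
#define HW_FORMAT_UNSUPPORTED UINT16_MAX

static const uint16_t format_table[SRC_FORMAT_COUNT] = {
   /* Default every slot first, then override the supported ones. */
   [0 ... SRC_FORMAT_COUNT - 1] = HW_FORMAT_UNSUPPORTED,

   [SRC_FORMAT_RGBA8]   = HW_FORMAT_RGBA8,
   [SRC_FORMAT_RGBA32F] = HW_FORMAT_RGBA32F,
};

static uint16_t
hw_format_for(enum src_format f)
{
   assert(f < SRC_FORMAT_COUNT);
   return format_table[f];
}
```

Without the range line, any slot that is not listed would be
zero-initialized and silently read back as the valid format 0.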
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since the fence implementation is not dri2.c specific, put
it in a separate file. This way SW implementations can use this
extension too.
v2: Don't depend on dri2.c for extensions (Emil)
v3: Make this patch only move extension into a separate file (Chad).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
glXGetCurrentDisplay() has been broken for years and nobody noticed until
recently. This change adds a new XMesaGetCurrentDisplay() that the GLX
emulation API can call, just as we did for glXGetCurrentContext().
Tested by hacking glxgears to call glXGetCurrentDisplay() before and
after glXMakeCurrent() to verify the return value is NULL beforehand and
the same as the opened display afterward.
Also tested by Tom Hudson with his test programs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100988
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Tom Hudson <tom.hudson.phd@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This reworks this code to be like radeonsi, which will make it
easier to add GFX9 support to it in the future.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GFX9 these will need to be 64-bit, so bump them early
to avoid any weirdness later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In prep for GFX9 refactor some of the eop event writing code
out.
This changes behaviour, but aligns with what radeonsi does,
it does double emits on CIK/VI, whereas previously it only
did this on CIK.
v2: bump the size checks.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Frequently, get_image_offset_sa is combined with get_intratile_offset_sa
so it makes sense to have a single helper to do both. If the caller
doesn't want the intratile offsets, it can simply pass NULL and ISL will
assert that they are 0.
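
The optional out-parameter convention can be sketched like this. The
helper below is hypothetical and much simpler than the ISL function: it
only splits a coordinate into a tile-aligned base and an intratile
remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void
split_offset(uint32_t x, uint32_t y, uint32_t tile_w, uint32_t tile_h,
             uint32_t *base_x, uint32_t *base_y,
             uint32_t *intratile_x, uint32_t *intratile_y)
{
   assert(tile_w > 0 && tile_h > 0);
   assert(base_x != NULL && base_y != NULL);

   *base_x = x - x % tile_w;
   *base_y = y - y % tile_h;

   /* A caller that passes NULL promises the remainder is zero. */
   if (intratile_x)
      *intratile_x = x % tile_w;
   else
      assert(x % tile_w == 0);

   if (intratile_y)
      *intratile_y = y % tile_h;
   else
      assert(y % tile_h == 0);
}
```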
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only surface layout for which slice0 makes any sense is GEN4_2D.
Move all of the slice0 stuff into isl_calc_phys_total_extent_el_gen4_2d
and make the others trivially return the total size in surface elements.
As a side-effect, array_pitch_el_rows is now returned from these helpers
as well.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We've already implicitly been using a physical total size in surface
elements. This just centralizes things a bit.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is a fairly common operation and it's nice to be able to just call
the one little function.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Over 90% of the function only applies to ISL_DIM_LAYOUT_GEN4_2D anyway
so we can just handle the other two as special cases at the top. The
two "generic" cases below the switch only apply on gen9 and above and
only to 3D or CCS surfaces. This implies that they only apply to
surfaces with ISL_DIM_LAYOUT_GEN4_2D. Making them look generic is a
lie.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We were only using it for validating that we don't use Ys/Yf on gen8 and
earlier. Removing it from isl_tiling_get_info lets us remove it from a
bunch of other things that had no business needing a hardware
generation.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Sandy Bridge does not technically support mipmapped depth/stencil. In
order to work around this, we allocate what are effectively completely
separate images for each miplevel, ensure that they are page-aligned,
and manually offset to them. Prior to layered rendering, this was a
simple matter of setting a large enough halign/valign.
With the advent of layered rendering, however, things got more
complicated. Now, things weren't as simple as just handing a surface
off to the hardware. Any miplevel of a normally mipmapped surface can
be considered as just an array surface given the right qpitch. However,
the hardware gives us no capability to specify qpitch so this won't
work. Instead, the chosen solution was to use a new "all slices at each
LOD" layout which laid things out as a mipmap of arrays rather than an
array of mipmaps. This way you can easily offset to any of the
miplevels and each is a valid array.
Unfortunately, the "all slices at each lod" concept missed one
fundamental thing about SNB HiZ and stencil hardware: It doesn't just
act as if you're always working with a non-mipmapped surface, it
acts as if you're always working on a non-mipmapped surface of the same
size as LOD0. In other words, even though it may only write the
upper-left corner of each array slice, the qpitch for the array is for a
surface the size of LOD0 of the depth surface. This mistake causes us
to under-allocate HiZ and stencil in some cases and also to accidentally
allow different miplevels to overlap. Sadly, piglit test coverage
didn't quite catch this until I started making changes to the resolve
code that caused additional HiZ resolves in certain tests.
This commit switches Sandy Bridge HiZ and stencil over to a new scheme
that lays out the non-zero miplevels horizontally below LOD0. This way
they can all have the same qpitch without interfering with each other.
Technically, the miplevels still overlap, but things are spaced out
enough that each page is only in the "written area" of one LOD.
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We were linking src/glx with -Bsymbolic, but not the classic/gallium X11
libGL.so.
But it's always a good idea to build all libGL.so and all DRI drivers
with -Bsymbolic, otherwise they might resolve symbols from the 3rd party
application executable or shared libraries, which is _never_ what we
want.
In particular, this can happen when intercepting OpenGL calls with
apitrace, before
63194b2573
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This function never occurs in the callchain of a GL function. It occurs
only in the callchain of eglCreate*Surface and the analogous paths for
GLX. Therefore, even if a thread does have a bound GL context,
emitting a GL error here is wrong. A misplaced GL error, when no GL
call is made, can confuse clients.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
translate_tex_format() asserted that isl_format != 0. But 0 is a valid
format, ISL_FORMAT_R32G32B32A32_FLOAT.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It returns an isl_format, not GLuint BRW_FORMAT. I updated every
translate_tex_format() found by git-grep.
No change in behavior.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It returns an isl_format, not uint32_t BRW_FORMAT.
I updated every brw_isl_format_for_mesa_format() found by git-grep.
No change in behavior.
v2: Rebased atop Anuj's patch, which has some of the same fixes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
This patch makes no functional changes. The renaming is just to
make the code more readable.
V2: update the types to "enum isl_format"
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
If the EGLImage's format is not a supported texture format according to
brw_surface_formats.c, then refuse to create the miptree. This follows
the precedent in glEGLImageRenderbufferStorage (implemented by
intel_image_target_renderbuffer_storage), which rejects the EGLImage's
format if it is not renderable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you had a group as the first element of a struct, i.e.
    <struct name="3DSTATE_CONSTANT_BODY" length="10">
      <group count="4" start="0" size="16">
        <field name="ReadLength" start="0" end="15" type="uint"/>
      </group>
      ...
    </struct>
we would get a group_offset of 0, causing create_field() to think the
field wasn't in a group, and fail to offset forward for successive array
elements. So we'd mark all the array elements as offset 0.
Using ctx->group->elem_size is a better check for "are we in a group?".
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If you have something like:
    <group count="0" start="96" size="32">
      <field name="Entry_0" start="0" end="15" type="GATHER_CONSTANT_ENTRY"/>
      <field name="Entry_1" start="16" end="31" type="GATHER_CONSTANT_ENTRY"/>
    </group>
We would reset ctx->group_count to 0 after processing the first field,
so the second would not have a group count.
This is largely untested, as the only groups with multiple fields are
packets we don't emit in Mesa. Found by inspection.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes for:
src/util/rand_xor.c:60:13: error: implicit declaration of function 'open' [-Werror=implicit-function-declaration]
int fd = open("/dev/urandom", O_RDONLY);
^~~~
src/util/rand_xor.c:60:34: error: 'O_RDONLY' undeclared (first use in this function)
int fd = open("/dev/urandom", O_RDONLY);
^~~~~~~~
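
Both errors come from a missing <fcntl.h>, which declares open() and
O_RDONLY on POSIX systems. A minimal, POSIX-only sketch of reading a seed
with the right headers (the helper name is hypothetical and this is not
the code in rand_xor.c):

```c
#include <assert.h>
#include <fcntl.h>   /* open(), O_RDONLY */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>  /* read(), close() */

/* Fill *seed from /dev/urandom; returns false on any failure. */
static bool
read_urandom_seed(uint64_t *seed)
{
   assert(seed != NULL);

   int fd = open("/dev/urandom", O_RDONLY);
   if (fd < 0)
      return false;

   ssize_t n = read(fd, seed, sizeof(*seed));
   close(fd);

   return n == (ssize_t)sizeof(*seed);
}
```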
Signed-off-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The cpu_fini() call pushes the buffer back into the GPU domain, which needs
to be done for all buffers, not just the ones with CPU written content. The
etnaviv kernel driver currently doesn't validate this, but may start to do
so at a later point in time. If there is a temporary resource the fini needs
to happen before the RS uses this one as the source for the upload.
Also remove an invalid comment about flushing CPU caches, cpu_fini takes
care of everything involved in this.
Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Bindless samplers/images are represented with 64-bit unsigned
integers and they can be assigned with explicit constructors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Memory/format layout qualifiers shouldn't be lost when arrays
of images are split by this pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
GL_ARB_bindless_texture allows images to be declared inside
structures, but when memory/format qualifiers are used, they
should be propagated when structures are split.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We can initialize structs directly, avoid some temporaries, and cut out
about half of the skip component handling.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We can just update the gl_transform_feedback_info fields at link time
to make the VUE header fields have the right location and component.
Then we don't need to handle them specially at draw time, which is
expensive.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
These were correct since they were used only in conversions to signed integers;
however, this makes the implementation a bit more consistent and reduces the
chance of these macros being propagated to unsigned cases in the future, which
would not be correct.
Reviewed-by: Matt Turner <mattst88@gmail.com>
As we do for all other cases of float/double conversions to integers.
v2: use round() instead of IROUND() macros (Iago)
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2:
- need unsigned rounding for double->uint64 conversion (Nicolai)
- use round() instead of IROUND() macros (Iago)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Like we do for the 32-bit case.
v2:
- need unsigned rounding for float->uint64 conversion (Nicolai)
- use roundf() instead of IROUND() macros (Iago)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 October 24th 2016 specification says:
"If a command returning unsigned integer data is called, such as
GetSamplerParameterIuiv, negative values are clamped to zero."
v2: uint to int conversion should clamp to INT_MAX (Nicolai)
v3 (Iago)
- Add conversions from 64-bit integer paths
- Rebase on master
v4:
- need unsigned rounding for float/double->uint conversions (Nicolai)
- use round{f}() instead of IROUND() macros (Iago)
Fixes:
KHR-GL45.gpu_shader_fp64.state_query
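The clamping rules above can be sketched as follows. The helper names
are hypothetical and a 32-bit unsigned int is assumed. The commit itself
uses round{f}(); the sketch rounds by hand only to stay free of libm.

```c
#include <limits.h>

/* Hypothetical helpers illustrating the clamping rules. */

/* Signed -> unsigned query: negative values are clamped to zero. */
static unsigned
int_to_uint_clamped(int i)
{
   return i < 0 ? 0u : (unsigned)i;
}

/* Unsigned -> signed query: values above INT_MAX are clamped. */
static int
uint_to_int_clamped(unsigned u)
{
   return u > (unsigned)INT_MAX ? INT_MAX : (int)u;
}

/* Float -> unsigned query: clamp, then round to nearest. */
static unsigned
float_to_uint_clamped(float f)
{
   if (!(f > 0.0f))            /* negative values, zero and NaN */
      return 0;
   if (f >= 4294967296.0f)     /* 2^32 and above (32-bit unsigned) */
      return UINT_MAX;

   unsigned u = (unsigned)f;   /* truncate toward zero */
   if (f - (float)u >= 0.5f)   /* round half up by hand */
      u++;
   return u;
}
```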
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: also change the style of the large conditional in that function
to follow the style from most other parts of Mesa (Nicolai)
Reviewed-by: Matt Turner <mattst88@gmail.com>
The wrapper is for a renderbuffer around a texture. Textures can have
formats (e.g., 3) that aren't valid for API generated renderbuffers.
_mesa_base_fbo_format will return 0, but _mesa_get_format_base_format
will return the base format of RGB.
Fixes crashes in piglit tests fbo-alphatest-formats (all subtests
pass) and fbo-colormask-formats (some subtests pass, some fail).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a poor man's version of radeonsi ddebug stuff, this
should get hooked into that infrastructure, and grow more stuff,
but for now, just create R600_TRACE var that points to a file
that you want to dump the last IB to.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise we'd get progress continually set if we had non 64-bit
versions of these ops.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
0x30f regressed mad max.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Fixes: df91abfe5a "radv: Use correct clear words for HTILE."
sysmacros.h was getting implicitly included in types.h until recently in
AOSP master. Define MAJOR_IN_SYSMACROS to explicitly include sysmacros.h.
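A minimal sketch of how such a define is typically consumed, following
the Autoconf AC_HEADER_MAJOR convention. The #define at the top stands
in for the build system, and device_major is a hypothetical helper.

```c
/* Normally set by the build system; defined here only so that the
 * sketch stands alone on a Linux-style libc. */
#define MAJOR_IN_SYSMACROS 1

#include <sys/types.h>
#ifdef MAJOR_IN_MKDEV
#include <sys/mkdev.h>
#endif
#ifdef MAJOR_IN_SYSMACROS
#include <sys/sysmacros.h>   /* major(), minor(), makedev() */
#endif

/* Hypothetical helper: extract the major number of a device. */
static unsigned
device_major(dev_t dev)
{
   return (unsigned)major(dev);
}
```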
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
In all codepaths, this var ends up assigned to the struct, except one:
a cleanup codepath, where the `close()` was removed, leading to fd leaks.
Remove the temp fd and assign to the struct field directly instead.
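The pattern can be sketched like this, with a hypothetical struct and
function names standing in for the real ones. The fail_late parameter
only exists to simulate an error after the fd has been opened.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the display struct. */
struct sketch_display {
   int fd;
   bool ready;
};

/* Single cleanup path: closes whatever the struct owns. */
static void
sketch_display_destroy(struct sketch_display *dpy)
{
   if (dpy->fd >= 0)
      close(dpy->fd);
   dpy->fd = -1;
   dpy->ready = false;
}

/* Store the fd in the struct as soon as it is opened, so that every
 * error path can simply call the destroy helper. */
static bool
sketch_display_init(struct sketch_display *dpy, const char *path,
                    bool fail_late)
{
   dpy->ready = false;
   dpy->fd = open(path, O_RDONLY);
   if (dpy->fd < 0)
      return false;

   if (fail_late) {                  /* simulate a later setup failure */
      sketch_display_destroy(dpy);   /* no temporary fd left to leak */
      return false;
   }

   dpy->ready = true;
   return true;
}
```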
CovID: 1213930
Fixes: 7ec07beedf ("egl/drm: make use of the
dri2_display_destroy() helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Because get_texobj_by_name() can already throw an INVALID_ENUM
error, it makes more sense to add a check directly there.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
To display better function names when INVALID_OPERATION is
returned. Requested by Timothy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
u_vector_foreach is currently only used by the Intel Vulkan
driver but when this macro is used in mesa core, GCC reports
a compile-time error. Probably because some compiler options
are different.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Results in always having at least one WFI between draws, which was
slowing stk down by ~5% and xonotic by ~10%.
(also drop bogus assert while we're at it.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reduce/simplify vertex storage usage in PA_STATE_OPT, fix PA
GetNextVSOutput wrap-around behaviour and eliminate unnecessary
SIMDVERTEX copies/storage for tri fan in PA_STATE_OPT
Fixes the OpenGL tri fan test failure under SIMD16 -
triangle-rasterization-overdraw.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ben and I haven't observed these to help anything, but they enable
hardware optimizations for particular cases. It's probably best to
enable them ahead of time, before we run into such a case.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
These were already in gen8.xml but not gen9.xml. There are a few new
fields and a couple that have changed. These are all documented in the
Skylake PRM, Volume 2c Command Reference: Registers, Part 1.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is woefully undocumented. It's some kind of optimization that
avoids unnecessary render target reads when blending with a floating
point render target, using independent alpha blending modes.
The internal documentation indicates that this bit exists on Cherryview
as well, but the other driver doesn't appear to set it on that platform.
There's also some confusing wording that indicates that it may exist on
Broadwell, but the documentation says it's reserved, so who knows.
I was not able to find any workload that benefited from setting this
bit, but it seems like a good idea to set it nonetheless.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
It's an array of isl_format, not uint32_t. This patch updates every
reference to render_target_format[] found by git-grep.
Trivial cleanup. No change in behavior.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The name is misleading because the function is unrelated to GL
renderbuffers. Rename it to intel_create_winsys_renderbuffer.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The old comment pinned this function to X11 windows. In reality, this
function serves more than X11 and more than just windows.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
query and return supported dmabuf format modifiers for
EGL_EXT_image_dma_buf_import_modifiers.
v2: move format check to the driver instead of making format queries
here and then checking.
v3: Check DRIimageExtension version before query (Daniel Stone)
v4:
- move to DRIimageExtension version 15, check queryDmaBufModifiers before
calling (Jason Ekstrand)
- pass external_only to the driver instead of setting as EGL_TRUE here
(Emil Velikov, Daniel Stone)
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
allow egl clients to query the dmabuf formats supported on this platform.
v2: return EGLBoolean.
v3: Check DRIimageExtension version before querying (Daniel Stone).
v4: move to DRIimageExtension version 15, error checking (Jason Ekstrand).
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Allow creating EGLImages with dmabuf format modifiers when target is
EGL_LINUX_DMA_BUF_EXT for EGL_EXT_image_dma_buf_import_modifiers.
v2:
- clear modifier assembling and error label name (Eric Engestrom)
v3:
- remove goto jumps within switch-case (Emil Velikov)
- treat zero as valid modifier (Daniel Stone)
- ensure same modifier across all dmabuf planes (Emil Velikov)
v4:
- allow modifiers to add extra planes (Louis-Francis Ratté-Boulianne)
v5:
- fix error checking, some cleanups (Jason Ekstrand)
- pass single copy of the modifier to createImageFromDmaBufs2
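A rough sketch of the modifier handling described in the changelog:
the extension passes each plane's 64-bit modifier as two 32-bit LO/HI
attributes, and all planes must agree. The struct and function names
are hypothetical, and rejecting a modifier on only some planes is an
assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-plane attributes: the 32-bit LO/HI halves. */
struct plane_modifier {
   bool is_set;
   uint32_t lo;
   uint32_t hi;
};

static uint64_t
combine_modifier(uint32_t hi, uint32_t lo)
{
   return ((uint64_t)hi << 32) | lo;
}

/* Returns false if the planes disagree. On success *has_modifier says
 * whether a modifier was given at all; zero is a valid modifier, so it
 * cannot double as "unset". */
static bool
get_image_modifier(const struct plane_modifier *planes, int num_planes,
                   bool *has_modifier, uint64_t *modifier)
{
   *has_modifier = false;
   *modifier = 0;

   for (int i = 0; i < num_planes; i++) {
      if (!planes[i].is_set) {
         if (*has_modifier)
            return false;        /* plane 0 had one, this plane has not */
         continue;
      }

      uint64_t m = combine_modifier(planes[i].hi, planes[i].lo);
      if (i == 0) {
         *has_modifier = true;
         *modifier = m;
      } else if (!*has_modifier || m != *modifier) {
         return false;           /* mismatch between planes */
      }
   }
   return true;
}
```

The caller would then hand the single combined value to the driver
rather than one copy per plane.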
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
these allow dmabuf import with modifiers, and supported format and
modifier queries, which are used to implement
EGL_EXT_image_dma_buf_import_modifiers.
v2:
- squash dmabuf queries into DRIimage version 15 (Jason Ekstrand).
- add external_only param to queryDmaBufModifiers (Emil, Daniel Stone)
- pass a single modifier to createImageFromDmaBufs2 since all planes have
the same modifier (Jason Ekstrand)
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The EGL_EXT_image_dma_buf_import_modifiers extension adds support for a
fourth plane, just like DRM KMS API does.
Bump maximum dma_buf plane count to four.
v2: prevent attribute tokens from being parsed if
EXT_image_dma_buf_import_modifiers is not supported. (Emil Velikov)
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Varad Gautam <varad.gautam@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
If info->index_size is zero, info->index will point to uninitialized
memory.
Fatal signal 11 (SIGSEGV), code 2, fault addr 0xab5d07a3 in tid 20456 (surfaceflinger)
lst: Remove useless indexbuf conditional in the index_size != 0 case.
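The rule can be sketched with a simplified, hypothetical stand-in for
the draw info struct: the index pointer is only looked at once
index_size is known to be non-zero.

```c
#include <stddef.h>

/* Hypothetical, simplified draw info: 'index' is only meaningful
 * when index_size is non-zero. */
struct sketch_draw_info {
   unsigned index_size;     /* 0 means a non-indexed draw */
   const void *index;       /* may be garbage when index_size == 0 */
};

static const void *
get_index_buffer(const struct sketch_draw_info *info)
{
   if (info->index_size == 0)
      return NULL;          /* never dereference info->index here */
   return info->index;
}
```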
Fixes: 330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
See commit ece0e535a4. This makes
Gen4-5 follow the behavior we use on Gen6+. It seems to have
worked out there.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This brings the improved guardbanding we implemented on Gen6+
back to the older Gen4-5 code. It also deletes piles of code.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Gen4-5 include a single SCISSOR_RECT in SF_VIEWPORT.
Making a helper function will allow us to reuse this code for Gen4-5.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These are fairly related. Gen4-5 combine the scissor rectangle and
SF_VIEWPORT. Co-locating them will allow me to avoid forward
declarations of helper functions in a few patches.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Gen6+ support multiple scissor rectangles, and define a SCISSOR_RECT
structure containing their dimensions. On Gen4-5, those same fields
exist in SF_VIEWPORT.
This patch extracts the SF_VIEWPORT fields into a SCISSOR_RECT
structure. Although not a named concept on Gen4-5, it works just
as well, and gives us a consistent SCISSOR_RECT structure across
all generations, making it easier to reuse code.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
On Gen7+ we emit 3DSTATE_VIEWPORT_STATE_POINTERS_{SF_CL,CC} when
emitting a new viewport.
This patch makes us take the same approach on Sandybridge - but because
we have a combined command, we just set the appropriate "change" bits.
This eliminates an atom, some dirty flagging, and some brw->*.vp_offset
writes. It does mean we'll emit two 3DSTATE_VIEWPORT_STATE_POINTERS
instead of one if both change, but that's probably fine.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Scalar mode has been default since Broadwell, and vector mode is getting
increasingly unmaintained. There are a few things that don't even fully
work in vector mode on Skylake, but we've never cared because nobody
uses it. There's no point in porting it forward to new platforms.
So, just ignore the debug options to force it on.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This allows some tidy up and also makes it so we can add KHR_no_error
support.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This will allow us to skip the error checks when adding
KHR_no_error support.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
flush_compute_state doesn't reserve a large chunk, so these need their own reservation.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Print an error message for the user if the requirement isn't met, or
we're not thread safe.
v2: based on Nicolai's feedback
Check the DRI extension version
v3: based on Emil's feedback
improve commit and error messages.
use backgroundCallable variable to improve readability
v5: based on Emil's feedback
Properly check the function pointer
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
v2:
bump version
v3:
Add code comment
s/IsGlThread/IsThread/ (and variation)
Include X11/Xlibint.h protected by ifdef
v5: based on Daniel feedback
Move non X11 code outside of X11 define
Always return true for Wayland
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
v2:
bump version
v3:
Add code comment
s/IsGlThread/IsThread/ (and variation)
v4:
DRI3 doesn't hit X through GL call so it is always safe
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
DRI-drivers could call Xlib functions, for example to allocate a new back
buffer.
When glthread is enabled, the driver runs mostly on a separate thread.
Therefore we need to guarantee the thread safety between libX11 calls
from the applications (not aware of the extra thread) and the ones from
the driver.
See discussion thread:
https://lists.freedesktop.org/archives/mesa-dev/2017-April/152547.html
Fortunately, Xlib allows locking the display to ensure thread safety, but
XInitThreads must be called first by the application to initialize the lock
function pointer. This patch makes it possible to check that XInitThreads was
called before allowing glthread on the GLX or EGL platform.
Note: an attempt was made to port the libX11 code to XCB, but it didn't fully
solve thread safety.
See discussion thread:
https://lists.freedesktop.org/archives/mesa-dev/2017-April/153137.html
Note: Nvidia forces the driver to call XInitThreads. Quoting their manpage:
"The NVIDIA OpenGL driver will automatically attempt to enable Xlib
thread-safe mode if needed. However, it might not be possible in some
situations, such as when the NVIDIA OpenGL driver library is dynamically
loaded after Xlib has been loaded and initialized. If that is the case,
threaded optimizations will stay disabled unless the application is
modified to call XInitThreads() before initializing Xlib or to link
directly against the NVIDIA OpenGL driver library. Alternatively, using
the LD_PRELOAD environment variable to include the NVIDIA OpenGL driver
library should also achieve the desired result."
v2: based on Nicolai and Matt feedback
Use C style comment
v3: based on Emil feedback
split the patch in 3
s/isGlThreadSafe/isThreadSafe/
v5: based on Marek comment
Add a comment that isThreadSafe is supported by extension v2
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Analogous to earlier commits - image_driver and image_loader are meant
to be used hand in hand.
v2: Rebase
Cc: Derek Foreman <derekf@osg.samsung.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Strictly speaking __DRI_DRI2 implies __DRI2_FLUSH. Although since we're
using the latter in the callback, we want to use the correct guard.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Rather than misleadingly depending on DRI2 for the WL_DRM vs WL_SHM
formats, use the wl_drm and wl_shm interface respectively.
Fixes: a1727aa75e ("egl/wayland: Don't use DRM format codes for SHM")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
They are meant to be used together. Otherwise we'll need workarounds
like egl/wayland. Namely register an image_loader_extension even though
we should be using only DRI2.
v2: Add the missing bracket to fix the build (Tapani).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Analogous to earlier commit.
Note that the dri2_x11_post_sub_buffer and dri2_x11_swap_buffers_region
paths already implicitly require __DRI2_FLUSH. The corresponding
extensions (NV_post_sub_buffer and NOK_swap_region) are enabled only
with DRI2.
v2: Split cosmetic changes into separate patch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The current __DRI_DRI2 implies __DRI2_FLUSH. At the same time, one can
use __DRI_IMAGE_DRIVER alongside the latter, so the current check is
confusing at best.
Check for what we use.
v2: Split out from whitespace changes
Reviewed-by: Chad Versace <chadversary@chromium.org> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With the final place that modifies the vtbl removed as of last commit we
can annotate the symbols accordingly.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
With previous commit we'll error out should one be using the extension
when it's not available. Thus we no longer need to modify the vtbl.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently if one does the silly thing of probing the entry point w/o
checking the extension, they will attempt to use the extension even
though it cannot work.
That is due to our use of an assert, which gets removed in release builds.
Simply error out if the extension is not enabled. Thus we can
apply some cleanups with next commits.
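A minimal sketch of the difference, using hypothetical types and error
codes: an assert() disappears when NDEBUG is defined, whereas an
explicit check still rejects the call in release builds.

```c
#include <stdbool.h>

/* Hypothetical display state and result codes. */
struct ext_display {
   bool ext_enabled;
};

enum ext_result {
   EXT_RESULT_OK,
   EXT_RESULT_BAD_PARAMETER
};

static enum ext_result
use_extension(const struct ext_display *dpy)
{
   /* assert(dpy->ext_enabled) would compile to nothing with NDEBUG,
    * so check at runtime and error out instead. */
   if (!dpy->ext_enabled)
      return EXT_RESULT_BAD_PARAMETER;

   return EXT_RESULT_OK;
}
```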
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently GBM attempts to know all the extensions that might be required
by EGL/DRM [at some later stage].
That is a bit unclear and we often forget to update GBM as EGL gets
attention.
To avoid that, simply let EGL manage its own required extensions based
on the base primitive (screen) we provide it.
v2: Rework the approach - GBM should not dive into EGL/DRM.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Allows us to keep things in sync easier and lets us simplify the
interface between the two even further.
v2: Don't set GBM's extensions.
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Split the create_screen into:
- create screen
- setup/bind extensions
- setup screen
This will allow us to reuse the latter two on egl/drm. Said platform
does create its own screen and attempts to reinvent the latter two
functions itself.
Since the GBM ones tend to get out of sync quite often, and there is no
distinct reason why it does so, we'll drop them with later commits.
v2: disp -> dpy for the Android platform.
v3: use correct goto label (Rob)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Within dri2_display_release() we already tear down all the display
specifics. Within the platform specific dri initialize however we badly
and partially duplicate that.
Let's stop that by fleshing out the required functionality into a helper
and using it throughout the codebase.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Rob Herring <robh@kernel.org>
With later commits we'll split and reuse the destroy side of the
function for the initialize_foo error path.
In such cases, driver_configs may be NULL leading to a crash.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
[Emil Velikov: reword commit message]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
The former already keeps track of the DRI module opened, based on the
driver_name provided. So let's keep them together.
As a nice bonus this will allow us to remove the gbm_drm_device
altogether with the next patch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
The struct is a simple wrapper around gbm_bo and brings no actual
benefit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Introduced back in 2012 with fd6acb97fb ("gbm: Create hooks for
dri2_loader_extension in dri backend") and hasn't been used since.
Seemingly a copy/paste thinko from development stage.
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Android tries to create a FENCE_FD fence without any rendering. And
then falls over when that fails. So just always create an initial
batch.
Fixes: e4ad8695 ("freedreno: fix crash when flush() but no rendering")
Signed-off-by: Rob Clark <robdclark@gmail.com>
cso_set_blend_color() already checks if the old state is different.
Only Nine uses pipe::set_blend_color() directly but I guess it
should use the cache too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The vertex information we compute here is really dependent on the last
stage before FS. It just happened to work most of the time because new
GS tend to come with new VS and/or FS...
(The LP_NEW_GS flag was previously set but never used.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We have a few mistakes in our shader translation code, but the virtual
GPU is forgiving.
Reviewed-by: Michal Krol <michal@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was just an accidental typo in the refactoring. The intention was
to try the blitter on gen4-5, not just gen4.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes linking error in libOSMesa when using libunwind.
CXXLD libOSMesa.la
src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `symbol_name_cached':
./src/gallium/auxiliary/util/u_debug_stack.c:87: undefined reference to `_ULx86_64_get_proc_name'
src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `debug_backtrace_capture':
./src/gallium/auxiliary/util/u_debug_stack.c:114: undefined reference to `_Ux86_64_getcontext'
./src/gallium/auxiliary/util/u_debug_stack.c:115: undefined reference to `_ULx86_64_init_local'
./src/gallium/auxiliary/util/u_debug_stack.c:117: undefined reference to `_ULx86_64_step'
./src/gallium/auxiliary/util/u_debug_stack.c:123: undefined reference to `_ULx86_64_get_reg'
./src/gallium/auxiliary/util/u_debug_stack.c:124: undefined reference to `_ULx86_64_get_proc_info'
./src/gallium/auxiliary/util/u_debug_stack.c:120: undefined reference to `_ULx86_64_step'
collect2: error: ld returned 1 exit status
v2 : Fixes title and adds the original error it is fixing.
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We keep the blit path because it's probably faster when it works.
However, now that we can use blorp, we can delete that nasty CPU
fall-back path.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The width and height of the copy don't have to be aligned to the block
size if they specify the right or bottom edges of the image. (See also
the comment and asserts right above). We need to round them up when we
do the division in order to get it 100% right.
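The rounding can be sketched as below. extent_in_blocks is a
hypothetical helper; the arithmetic is the usual round-up division.

```c
#include <stdint.h>

/* Integer division that rounds up instead of truncating. */
static uint32_t
div_round_up(uint32_t value, uint32_t divisor)
{
   return (value + divisor - 1) / divisor;
}

/* Hypothetical helper: size of a copy in blocks, given its size in
 * pixels and the block dimension of the compressed format. */
static uint32_t
extent_in_blocks(uint32_t extent_px, uint32_t block_px)
{
   return div_round_up(extent_px, block_px);
}
```

For a 10 pixel wide copy with 4 pixel blocks, plain division yields 2
blocks and drops the partial block at the right edge, while the
rounded-up division yields 3.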
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
We don't support replicated data clears yet. Those take a bit more work
and enabling replicated data clears in its own commit is probably better
for bisectability anyway.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Due to complications with things such as URB setup on gen4-5, it's
easier to keep gen4 support in blorp completely internal to i965. This
makes things a bit awkward because that means there's a file in i965
that includes blorp_priv.h but it's either that or have a file in blorp
that includes brw_context.h.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
As part of enabling support for SF programs, we plumb the SF URB size
through to emit_urb_config. For now, it's always zero but, on gen4, it
may be something larger.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We also add a slot variable and use it as an iterator. This will make
it much easier to conditionally put something between the header and the
vertex position.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It isn't supported prior to gen6 and, on gen6+, NIR will fuse the fmul
and fadd into an ffma automatically for us anyway.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Gen5 and earlier can't do non-normalized coordinates so we need to
compensate in the shader. Fortunately, it's pretty easy to plumb through.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Having it be a pointer means that we end up caching clip programs based
on a pointer to wm_prog_data rather than the actual interpolation modes.
We've been caching one clip program per FS ever since 91d61fbf7c
where Timothy rewrote brw_setup_vue_interpolation().
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Having it be a pointer means that we end up caching clip programs based
on a pointer to wm_prog_data rather than the actual interpolation modes.
We've been caching one clip program per FS ever since 91d61fbf7c
where Timothy rewrote brw_setup_vue_interpolation().
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These need special handling because they have no "DWord Length"
parameter and they have an unusual bias of 1.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It isn't a pointer to "color calc state", that's the packet it's in.
It's a pointer to the CC viewport state.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Iron Lake introduced the multiple KSP thing and so you have KSP0-3.
However, the genxml didn't have an index on the first "Kernel Start
Pointer" or "GRF Register Count". Add one to match gen6+. While we're
here, we drop the brackets from the other "GRF Register Count" fields.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most things on gen4-5 are addresses because we don't have dynamic state
base address and we don't have instruction state base on gen4. However,
whoever converted things to addresses got a little over-excited and
converted too much.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Gen4 cube maps are a 2-D surface with ISL_DIM_LAYOUT_GEN4_3D which is a
bit weird but accurate nonetheless.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
On Iron Lake, the packets exist but we never emit them so there's no
need for us to ask the driver to make batch space for them.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The guts of blorp and ISL don't understand i965's partial miptrees.
Instead, we need to subtract off first_level before we hand anything off
to blorp.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
ISL doesn't have a concept of a partial miptree. Instead, we need to
subtract off first_level.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It's not needed for blorp_copy because it already overrides formats.
It's also not needed for blorp_clear because it clears stencil as
stencil.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The blorp_copy entrypoint is designed for doing memcpy like operations
which is what we need to do here while blorp_blit is for handling format
conversion and scaling. Using blorp_copy is much simpler and prevents
us from getting formats wrong. While we're here, we get rid of the
layers_per_blit thing since stencil always uses interleaved MSAA.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We've discovered in the Vulkan driver that simply doing the end-of-pipe
sync afterwards is insufficient. The specific requirement stated in the
PRM is that you have to do one every time you transition between the
three modes of "clear", "render", and "resolve". This is GL, so we could
track it but any attempt to do so would most likely get it wrong. For
now, it's easier to just assume that every fast-clear op is an island
and do the sync both before and after.
This also removes the unneeded flush and stall after slow-clear
operations.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Support for Android 4.4 and earlier has already been removed from mesa.
Remove this remaining piece from nouveau, too.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Simplify the handling of mmap for Android by using mmap64 instead. mmap64
may have not existed for Android when this was written, but it's been
around since 2013.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Simplify the handling of mmap for Android by using mmap64 instead. mmap64
may have not existed for Android when this was written, but it's been
around since 2013.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Since commit 7a5b5f5226 ("Android: drop Android 4.4 (KitKat) support"),
Android 4.4 or earlier is no longer supported, so exit with an error if we
try building on it.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Fixes this warning:
In file included from ../../../src/compiler/glsl/ir.cpp:25:0:
../../../src/compiler/glsl/ir.h: In constructor 'ir_swizzle::ir_swizzle(ir_rvalue*, ir_swizzle_mask)':
../../../src/compiler/glsl/ir.h:1955:20: warning: 'ir_swizzle::mask' will be initialized after [-Wreorder]
ir_swizzle_mask mask;
^
../../../src/compiler/glsl/ir.h:1954:15: warning: 'ir_rvalue* ir_swizzle::val' [-Wreorder]
ir_rvalue *val;
^
../../../src/compiler/glsl/ir.cpp:1592:1: warning: when initialized here [-Wreorder]
ir_swizzle::ir_swizzle(ir_rvalue *val, ir_swizzle_mask mask)
^
Reviewed-by: Matt Turner <mattst88@gmail.com>
VCN decode has a new interface, and that depends on the latest libdrm
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If the SVGA3D_BindGBSurface() call in svga_buffer_hw_storage_unmap()
fails, we'll flush and that might involve unmapping other buffers.
That leads to a recursive lock on svga_screen::swc_mutex and causes
a deadlock. Fix this by initializing the mutex with mtx_recursive.
Note that this only happened on Linux, not Windows. On Windows, the
mutex functions are implemented with Win32 critical sections which
support recursive locking.
Also add a comment about this.
Fixes VMware bug 1831549 (Unigine Tropics demo freeze on Linux).
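
A minimal sketch of the locking pattern, assuming C11 <threads.h> is
available. The demo_* names are hypothetical and are not the svga code;
they only show why mtx_recursive avoids the self-deadlock when an outer
operation that holds the lock ends up calling an inner one that takes it
again.

```c
#include <stdbool.h>
#include <threads.h>

/* Hypothetical stand-in for a screen object guarded by one mutex. */
struct demo_screen {
   mtx_t lock;
   int flushes;
};

static bool
demo_screen_init(struct demo_screen *s)
{
   s->flushes = 0;
   /* mtx_recursive lets the owning thread lock again without deadlocking. */
   return mtx_init(&s->lock, mtx_plain | mtx_recursive) == thrd_success;
}

/* Inner operation that takes the lock itself, e.g. work done by a flush. */
static void
demo_flush(struct demo_screen *s)
{
   mtx_lock(&s->lock);
   s->flushes++;
   mtx_unlock(&s->lock);
}

/* Outer operation that holds the lock and flushes when the bind fails. */
static void
demo_unmap(struct demo_screen *s, bool bind_failed)
{
   mtx_lock(&s->lock);
   if (bind_failed)
      demo_flush(s);   /* re-enters the lock on the same thread */
   mtx_unlock(&s->lock);
}
```

With mtx_plain alone, the nested mtx_lock() in demo_flush() would be
undefined behaviour and typically hangs on Linux.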
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
This is useful for Piglit when thousands of tests are run and we want
to determine which test triggered a device error.
v2: only log command line info if the new SVGA_EXTRA_LOGGING env var is set
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The assembly code used by the SVGA message feature doesn't
build properly with older compilers, so limit it to only
gcc 5.3.0 and newer.
Also modified the stubs to avoid "unused variable" warnings.
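
A rough sketch of such a version guard. The macro and function names
(DEMO_HAVE_MSG_ASM, demo_send_message) are made up for illustration;
only the 5.3.0 threshold comes from the text above.

```c
/* Encode the GCC version as one integer so it can be compared in one test. */
#if defined(__GNUC__) && \
    (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) >= 50300
#define DEMO_HAVE_MSG_ASM 1
#else
#define DEMO_HAVE_MSG_ASM 0
#endif

/* Returns 1 when the assembly path is compiled in, 0 when the stub is used. */
static int
demo_send_message(const char *msg)
{
#if DEMO_HAVE_MSG_ASM
   /* The inline-assembly implementation would go here. */
   (void) msg;
   return 1;
#else
   /* Stub: reference the parameter so no "unused" warning is emitted. */
   (void) msg;
   return 0;
#endif
}
```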
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For now this capability only exists in the SVGA driver but
can be exported later if other modules, e.g. winsys, wants
to use it for logging.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're going to stop aubinator without a valid device id, better
report an error. This also silences a Coverity warning.
CID: 1405004
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're using both exit(1) & exit(EXIT_FAILURE), settle for one, same
for success.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
BOs larger than the minimum fragment size should have their VA
aligned to at least the fragment size for optimal performance.
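
As an illustration only, the rule can be written as a small helper. The
function name and parameters are hypothetical, not the winsys code.

```c
#include <stdint.h>

/* Pick the VA alignment for a buffer: buffers larger than the fragment
 * size are aligned to at least the fragment size, others keep the
 * alignment that was requested. */
static uint64_t
demo_va_alignment(uint64_t size, uint64_t alignment, uint64_t fragment_size)
{
   if (size > fragment_size && alignment < fragment_size)
      return fragment_size;
   return alignment;
}
```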
v2: drop unused leftover from initial implementation
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The idea behind doing this was to make it easier to set various flags.
However, we have enough custom flag settings floating around the driver
that this is more of a nuisance than a help. This commit has the
following functional changes:
1) The workaround_bo created in anv_CreateDevice loses both flags.
This shouldn't matter because it's very small and entirely internal
to the driver.
2) The bo created in anv_CreateDmaBufImageINTEL loses the
EXEC_OBJECT_ASYNC flag. In retrospect, it never should have gotten
EXEC_OBJECT_ASYNC in the first place.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Instead of returning valid types as just a number, we now walk the list
and check the buffer's usage against the usage flags we store in the new
anv_memory_type structure. Currently, valid_buffer_usage == ~0.
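
A simplified sketch of the walk, assuming a cut-down record with only
the usage mask. The demo_* names are hypothetical and stand in for the
real structure.

```c
#include <stdint.h>

/* Hypothetical, reduced memory type record. */
struct demo_memory_type {
   uint32_t valid_buffer_usage;  /* bitmask of buffer usages this type allows */
};

/* Build a bitmask with bit i set when type i allows every requested
 * usage bit. With valid_buffer_usage == ~0 every type passes. */
static uint32_t
demo_valid_memory_types(const struct demo_memory_type *types,
                        uint32_t type_count, uint32_t usage)
{
   uint32_t mask = 0;
   for (uint32_t i = 0; i < type_count && i < 32; i++) {
      if ((usage & types[i].valid_buffer_usage) == usage)
         mask |= 1u << i;
   }
   return mask;
}
```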
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Before, we were just comparing the type index to 0. Now we actually
look the type up in the table and check its properties to determine what
kind of mapping we want to do.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
This doesn't matter right now since it only affects whether or not we
set the kernel bit but, if we ever do anything else based on it, we'll
want it to be correct per-gen.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Up until now, we've been memsetting the auxiliary surface to 0 at
BindImageMemory time to ensure that it is properly initialized.
However, this isn't correct because apps are allowed to freely alias
memory between different images and buffers so long as they properly
track whether or not a particular image is valid and, if it isn't,
transition from UNINITIALIZED to something else before using it. We
now implement those transitions so we can drop the hack.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Mark the functions 'exec="skip"' in the XML instead. libGL will still
have the functions, but the driver won't try to use them. I verified
that this commit works with piglit's 'object-namespace-pollution glClear
vertex-array' on x64 with a driver built from mesa-12.0.3 tag.
In fairness, this test also works with a libGL built from 7927d03. I
believe it continues to work because on non-Windows platforms we
generate some extra, dummy dispatch functions that can be used when a
driver requests a function unknown to libGL. This was done to provide
some "forward" compatibility with drivers that need more functions.
This doesn't work on Windows because the Windows calling convention is
for the callee to clean up the stack. That's the theory anyway.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that we lower vars to regs, we no longer regress for anything that
does complex dereferences. (With tgsi, derefs are already lowered
before tgsi_to_nir, but not with glsl_to_nir.) In fact it actually
fixes a few things to bypass tgsi.
So make NIR the default (finally!)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of using load/store_var intrinsics, which can have complex
derefs in the case of multi-dimensional arrays, lower these to regs
and handle the direct/indirect loads in get_src() and stores in
put_dst().
This should let us switch to using nir by default.
Signed-off-by: Rob Clark <robdclark@gmail.com>
standalone_compiler_cleanup() frees the glsl types, among other things,
so it needs to come after nir->ir3. But since we exit after dumping the
disassembly, it is easier to just not call it at all.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Won't ever hit this w/ a420 gpu, so this is dead code. Need to get astc
working to know whether to rip this out entirely or not.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Maybe there is a better way to do this. But by the time we get to
assigning uniform locs, we want the atomic_uint's to all be gone,
otherwise we assert in st_glsl_attrib_type_size().
Signed-off-by: Rob Clark <robdclark@gmail.com>
I wasn't sure if I should filter the flags so that we only use
flags that actually change the shader output. To avoid manual
updates we just pass in everything for now.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used for things such as adding driver specific environment
variables to the key. Allowing us to set environment vars that change
the shader and not have the driver ignore them if it finds existing
shaders in the cache.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This prevents spurious failures when libtxc-dxtn-s2tc is installed.
Note: lp_test_format doesn't need any change since we were already
ignoring S3TC failures there.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Not really what the fast depth clear does, no matter whether you use
EXPCLEAR or not. Seems the fast clear using the DB HW always touches
the main buffer.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Did some RE'ing what several HTILE words give when read from a descriptor
with HTILE compression enabled.
Seems to align with -pro usage for D16 too.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
And correct implementation to specify only what we support.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Coverity caught the use of dead code copy-paste for
found_colors[] and num_found_colors.
CID: 1341850
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We've already verified that 'window' wasn't NULL, I'm guessing this
allocation error is about the newly created queue.
CID: 1409754
Fixes: 03dd9a88b0 ("egl/wayland: Use per-surface event queues")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
APPLE_vertex_array_object support was removed in 7927d0378f.
However it turns out we can't remove the functions because this
can cause issues when libglapi is used together with DRI
drivers built prior to said commit.
Fixes: 7927d0378f ("mesa: drop APPLE_vertex_array_object support")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Potentially more efficient as it may avoid the struct being initialised
twice.
Also add var to the initialisation list while we are here.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If the str is long or isn't null-terminated, strlen() could take a lot
of time or even crash. I don't know why it was used in the first place,
maybe for platforms without strnlen(), but strnlen() is already used
inside of ralloc_strndup(), so this change should not additionally
break anything.
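
A sketch of the bounded variant in plain C. demo_strndup is a
hypothetical helper, not ralloc_strndup() itself, and it uses memchr()
for the bounded scan (which is what strnlen() amounts to) so that the
example only depends on ISO C.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate at most max bytes of str. The scan never reads past
 * str[max - 1], so an unterminated buffer of max bytes is safe. */
static char *
demo_strndup(const char *str, size_t max)
{
   const char *end = memchr(str, '\0', max);   /* bounded, unlike strlen() */
   size_t n = end != NULL ? (size_t) (end - str) : max;

   char *out = malloc(n + 1);
   if (out == NULL)
      return NULL;

   memcpy(out, str, n);
   out[n] = '\0';
   return out;
}
```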
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The overwhelming majority of shaders don't use line continuations. In my
shader-db only shaders from the Talos Principle and Serious Sam used
them, less than 1% out of all shaders. Optimize for this case, don't
do any copying if no line continuation was found.
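
Roughly, the fast path looks like the sketch below. This is a
simplified, hypothetical helper: it only handles backslash followed by
"\n" and does not try to preserve line numbering.

```c
#include <stdlib.h>
#include <string.h>

/* Returns src itself when it contains no backslash-newline pair.
 * Otherwise returns a malloc'ed copy with the pairs removed (the caller
 * frees it), or NULL if the allocation fails. */
static const char *
demo_remove_line_continuations(const char *src)
{
   if (strstr(src, "\\\n") == NULL)
      return src;                    /* common case: no copy at all */

   char *out = malloc(strlen(src) + 1);
   if (out == NULL)
      return NULL;

   char *dst = out;
   for (const char *p = src; *p != '\0'; ) {
      if (p[0] == '\\' && p[1] == '\n')
         p += 2;                     /* drop the continuation */
      else
         *dst++ = *p++;
   }
   *dst = '\0';
   return out;
}
```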
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
strcmp() is slow. Initiate comparison with "__LINE__" or "__FILE__"
only if the identifier starts with '_', which is rare.
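
The idea in miniature (demo_is_line_or_file is a hypothetical name):

```c
#include <stdbool.h>
#include <string.h>

/* Cheap first-character test before the full string comparisons. */
static bool
demo_is_line_or_file(const char *ident)
{
   if (ident[0] != '_')
      return false;   /* most identifiers exit here */

   return strcmp(ident, "__LINE__") == 0 ||
          strcmp(ident, "__FILE__") == 0;
}
```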
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is shorter and easier on the eyes. At the same time this
also ensures that we are always asserting that the table pointer
is not NULL. Currently that was not done for all situations.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Use our knowledge that pointers are at least 4 byte aligned to remove
the useless digits. Then shift by 6, 10, and 14 bits and add this to
the original pointer, effectively folding in the entropy of the higher
bits of the pointer into a 4-bit section. Stopping at 14 means we can
add the entropy from 18 bits, or at least a 600Kbyte section of memory.
Assuming that ralloc allocates from a linearly allocated heap less than
this we can make a very efficient pointer hashing function for our usecase.
Even if we are not on an architecture that is 4 byte aligned, there is
still a high chance that the thing we are allocating is at least
8 bytes in size, so even then we will have entropy into the third bit.
The 4 bit increment on the shifts is chosen rather arbitrarily; if we
had chosen a 3 bit increment we would need to add another xor to
cover a decently sized memorypool. Increasing it to 5 bits would
spread our entropy more, possibly hurting us with more collisions on
hash tables of size less than 32. With a hash table of size 16 there
are a max of 11 entries, and we can assume that with such a small table
collisions are not that painful.
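
A sketch of the folding described above. It assumes the shifted values
are combined with xor, as the mention of "another xor" suggests, and
demo_hash_pointer is a stand-in name for illustration.

```c
#include <stdint.h>

/* Drop the two alignment bits, then fold higher pointer bits down in
 * 4-bit steps (shifts of 2, 6, 10 and 14). */
static inline uint32_t
demo_hash_pointer(const void *pointer)
{
   uintptr_t num = (uintptr_t) pointer;
   return (uint32_t) ((num >> 2) ^ (num >> 6) ^ (num >> 10) ^ (num >> 14));
}
```

Two neighbouring 4-byte-aligned addresses differ in bit 2, so they
always land on different hash values.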
This allows us to hash the whole 32 or 64 bit pointer at once,
instead of running FNV1a, looping through each byte and doing
increments, decrements, muls, and xors on every byte. This cuts
_mesa_hash_data from 1.5% on profiles, to making _mesa_hash_pointer
show up with a 0.09% share. Collisions on insertion actually seem to be
ever so slightly lower with this hash function, as found by printing
a loop counter and sorting the data.
perf stat shows a 1.5% reduction in instruction count,
and a 5% reduction in stalled cycles. Shader-db runtime goes
from 225 to 220 seconds.
No instruction-count changes in shader-db, but there are some minor
changes in cycle-count that are likely caused by nir walking a set
in some of its passes, and this causing a different ordering.
That might eventually lead to a difference in register allocation.
However, the effect is a net positive;
total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Before the swapchain event queue is destroyed, all proxy objects that reference
it must be dropped. Otherwise we risk a use-after-free if a frame callback event
or buffer release events are received afterwards.
This happens when an application destroys and recreates a swapchain in FIFO
mode between two frames without using the VkSwapchainCreateInfoKHR::oldSwapchain
mechanism to keep the old swapchain until after the next redraw.
Fixes: 5034c61558 ("vulkan/wsi/wayland: Use proxy wrappers for swapchain")
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
This fixes the long-standing problem with Dying Light where the game would
produce a black screen when running under Mesa. This happened because the
game's vertex shaders redeclare gl_VertexID, which is a GLSL builtin.
Mesa's GLSL compiler is a little more strict than others, and would not
compile them:
error: `gl_VertexID' redeclared
The allow_glsl_builtin_variable_redeclaration directive allows the shaders
to compile and the game to render. The game also requires OpenGL 4.4+ (GLSL
440), but does not request it explicitly. It must be forced with an
override, such as MESA_GL_VERSION_OVERRIDE=4.5 and
MESA_GLSL_VERSION_OVERRIDE=450. A compatibility context is *not* required
and forcing one with 4.5COMPAT or allow_higher_compat_version results in
graphical artifacts.
Dead Island Definitive Edition is another Techland port on the same engine
with the same problems, so we set the
allow_glsl_builtin_variable_redeclaration option for that game as well.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96449
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This option will allow GLSL builtins to be redeclared verbatim (e.g.
redeclaring "in int gl_VertexID" in a vertex shader). This is not strictly
valid and would normally fail to compile, but some applications (such as
newer Techland ports) do it and need more leniency.
v2 (Samuel Pitoiset):
- Rename allow_glsl_builtin_redeclaration ->
allow_glsl_builtin_variable_redeclaration
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The previous condition was to clear it out if it had previously been
set, not what's in the current draw. That information is gone now, so
just clear it unconditionally.
Fixes: 330d0607e ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use a user-buffer-aware cleanup function.
Fixes: c24c3b94ed ("gallium: decrease the size of pipe_vertex_buffer - 24 -> 16 bytes")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We can easily use the upload BO for push constants on Gen7.5/Gen8 too,
at the cost of a relocation when emitting 3DSTATE_CONSTANT_XS. We can
simply switch to using constant buffer pointer 2 instead of pointer 0,
like we do on Gen9+.
Ivybridge and Baytrail can't do this trick because they require the
constant buffers to be enabled in order, starting with 0. We'd have
to set the INSTPM bit to make the constant buffer pointer not relative
to dynamic state base address, which would need kernel command parser
support.
Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Broadwell GT2: 0.305608% +/- 0.19877% (n = 68)
- Braswell: No difference proven (n = 742)
- Haswell GT3e: 0.180755% +/- 0.0237505% (n = 30)
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Shaders can use quite a bit of uniform data. Better to put it in the
upload buffers, like we do for client vertex data, rather than the
batch buffer state area, which is primarily used for indirect state.
This should free up batch space, allowing us to emit more commands in a
batch before flushing. Because BRW_NEW_BATCH also causes a lot of state
to be re-emitted, it may also reduce CPU overhead a little bit.
We took this approach on Gen4-5, but switched to using the batch area
on Gen6+ because buffer 0 is relative to Dynamic State Base Address by
default, which is set to the start of the batch.
On Gen9+, we already use a relocation due to a workaround, so this is
trivial to change and has basically no downside.
Unfortunately we can't change compute shader push constants because
MEDIA_CURBE_LOAD always uses an offset from dynamic state base address.
Improves performance in GLBenchmark 2.7/TRex Offscreen by:
- Skylake GT4e: 0.52821% +/- 0.113402% (n = 190)
- Apollolake: 0.510225% +/- 0.273064% (n = 70)
Reviewed-by: Chris Forbes <chrisforbes@google.com>
I don't think CS push constant uploading uses the section of L3
controlled by 3DSTATE_PUSH_CONSTANT_ALLOC_XS. So I don't think
it needs to be re-emitted when that space is reallocated.
The programming note in gen7_allocate_push_constants doesn't
indicate this is necessary, at least.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
The instruction encodings only allow for immediates. Don't try to
replace a zero (which is dumb to have in that op in any case) with RZ.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Just like is done on desktop and what is expected by the build-id code.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Former is not a thing, even if I have a hacked xcb-fixes.pc on my system.
Thanks for spotting it Mark!
Fixes: 9a90d6a9d4 ("configure.ac: add xcb-fixes to the XCB DRI3 list")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The XCB module is used by the VL targets. Thus omitting it can lead to
link-time errors due to unresolved symbols.
Other DRI3 users such as the Vulkan WSI and the dri3 loader helper do
not use an update region in their xcb_present_pixmap() call. We will
look into that at a later stage.
Fixes: acf3d2afab ("configure: check once for DRI3 dependencies")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101110
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As gen_builder.hpp file is generated, it contains information that is
specific to the LLVM version it originates from.
As suggested by Tim, the file seems to be forwards compatible. So in
order to ship a file which will work everywhere we should be
using the earliest supported LLVM - 3.9.
With this we're back on track and can build all of mesa without
python/mako/flex and friends.
In the long term we might want to see if the python generators can be
updated to produce LLVM version agnostic files. At least within the
range supported by SWR.
Cc: <mesa-stable@lists.freedesktop.org>
Cc: Chuck Atkins <chuck.atkins@kitware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
When we fallback currently the gl_program objects are re-allocated.
This is likely to change when the i965 cache lands, but for now
this fixes a crash when using MESA_GLSL=cache_fb. This env var
simulates the fallback path taken when a tgsi cache item doesn't
exist due to being evicted previously or some kind of error.
Unlike i965 we are always falling back at link time so it's safe to
just re-allocate everything. We will be unnecessarily freeing and
re-allocating a bunch of things here but it's probably not a huge deal,
and can be changed when the i965 code lands.
Fixes: 0e9991f957 ("glsl: don't reference shader prog data during cache fallback")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For the gallium state tracker a tgsi binary may have been evicted
from the cache to make space. In this case we would take the
fallback path and recompile/link the shader.
On i965 there are a number of reasons we can get to the program
upload stage and have neither IR nor a valid cached binary.
For example the binary may have been evicted from the cache or
we need a variant that wasn't previously cached.
This environment variable enables us to force the fallback path that
would be taken in these cases and makes it easier to debug these
otherwise hard to reproduce scenarios.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will explicitly state that we are following the fallback
path when we find invalid/corrupt cache items. It will also
output the fallback message when the fallback path is forced
via an environment variable, the following patches will allow
this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we required --enable-egl for the platform selection to work.
Additionally due to the broken DRI3 dependency tracking we needed
--enable-glx.
Since both of these are now sorted we no longer need the
workarounds.
While we're here, explicitly enable dri3.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The vl_*_screen_create API properly falls back to a NOP when we're
building without specific platforms. So the only thing we need is to
handle the lack of X11/Xlib.h and provide a dummy Display define.
Cc: <mesa-stable@lists.freedesktop.org>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Provide a dummy stub when the user has opted w/o said platform, thus
we can build the binaries without unnecessarily requiring X11/other
headers.
In order to avoid build and link-time issues, we remove the HAVE_DRI3
guards in the VA and VDPAU state-trackers.
With this change st/va will return VA_STATUS_ERROR_ALLOCATION_FAILED
instead of VA_STATUS_ERROR_UNIMPLEMENTED. That is fine since upstream
users of libva such as vlc and mpv do little error checking, let
alone distinguish between the two.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Guttula, Suresh <Suresh.Guttula@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently we are having the XCB_DRI3 dependencies duplicated,
partially.
Just do a once-off check and add all of the respective CFLAGS/LIBS
where needed.
As a nice side effect this helps us solve a couple of FIXMEs.
DRI3 is not a thing w/o X11 so disable it in such cases.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Building EGL/Vulkan/other without X11, while GLX is enabled is confusing
and misleading. In practise anyone aiming at the former will also
disable GLX.
The inverse (some examples below) should still work:
./configure --disable-glx --with-platforms=x11 --with-vulkan-drivers=intel
./configure --disable-glx --with-platforms=x11 --enable-egl
Keep in mind that the X11 platform is enabled, by default.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Rather than having multiple places that define the macros, do it just
once in configure. Makes existing code a bit shorter and easier to
manage as we fix the VL targets with follow-up commits.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
There's still an error after my recent clean-up if LLVM is not patched to
enable AMDGPU target:
external/mesa3d/src/amd/common/ac_llvm_util.c:38:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetInfo' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
LLVMInitializeAMDGPUTargetInfo();
^
external/mesa3d/src/amd/common/ac_llvm_util.c:39:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTarget' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
LLVMInitializeAMDGPUTarget();
^
external/mesa3d/src/amd/common/ac_llvm_util.c:40:2: error: implicit declaration of function 'LLVMInitializeAMDGPUTargetMC' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
LLVMInitializeAMDGPUTargetMC();
^
external/mesa3d/src/amd/common/ac_llvm_util.c:41:2: error: implicit declaration of function 'LLVMInitializeAMDGPUAsmPrinter' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
LLVMInitializeAMDGPUAsmPrinter();
^
We need to drop libmesa_amd_common when LLVM is disabled, however there's
still a dependency on include paths for ac_binary.h. So explicitly add the
include path when LLVM is disabled.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Commit 3dfe61ed6e ("gallium: decrease the size of pipe_box - 24 -> 16
bytes") changed the size of pipe_box, but the virgl code was relying on
pipe_box and drm_virtgpu_3d_box structs having the same size/layout doing
a struct copy. Copy the fields one by one instead.
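
To illustrate the hazard, here is a sketch with two hypothetical structs
that mimic the situation: a compact 16-byte box and a wire-format box
made of six 32-bit fields. The names and exact layouts are assumptions
for the example, not the real definitions.

```c
#include <stdint.h>

/* Hypothetical compact box: mixed 32- and 16-bit fields. */
struct demo_box {
   int32_t x;
   int16_t y, z;
   int32_t width;
   int16_t height, depth;
};

/* Hypothetical wire-format box: six 32-bit fields. */
struct demo_wire_box {
   uint32_t x, y, z;
   uint32_t w, h, d;
};

/* Convert field by field; a memcpy or struct assignment would be wrong
 * because the two layouts no longer match. */
static struct demo_wire_box
demo_box_to_wire(const struct demo_box *box)
{
   struct demo_wire_box out;
   out.x = (uint32_t) box->x;
   out.y = (uint32_t) box->y;
   out.z = (uint32_t) box->z;
   out.w = (uint32_t) box->width;
   out.h = (uint32_t) box->height;
   out.d = (uint32_t) box->depth;
   return out;
}
```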
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Fixes: 3dfe61ed6e ("gallium: decrease the size of pipe_box - 24 -> 16 bytes")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Function droid_swap_buffers may get called without dri2_surf->buffer set,
in these cases we don't have a back buffer set either. Patch fixes segfault
seen with 3DMark that uses android.opengl.GLSurfaceView for rendering its UI.
backtrace:
#00 pc 00013f88 /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
#01 pc 000117b2 /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
#02 pc 000058b2 /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
#03 pc 00011329 /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
#04 pc 000118e7 /system/lib/libEGL.so (eglSwapBuffers+55)
#05 pc 000754dc /system/lib/libandroid_runtime.so
Note, this is v1 as v2 caused dEQP regressions.
Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Commit 9ca6711faa changed the Wayland winsys to only block for the
frame callback inside SwapBuffers, rather than get_back_bo. get_back_bo
would perform a single non-blocking Wayland event dispatch, to try to
find any release events which we had pulled off the wire but not
actually processed. The blocking dispatch was moved to SwapBuffers.
This removed a guarantee that we would've processed all events inside
get_back_bo(), and introduced a failure whereby the server could've sent
a buffer release event, but we wouldn't have read it. In clients
unconstrained by SwapInterval (rendering ~as fast as possible), which
were being displayed directly without composition (buffer release delayed),
this could lead to get_back_bo() failing because there were no free
buffers available to it.
The drawing rightly failed, but this was papered over because of the
path in eglSwapBuffers() which attempts to guarantee a BO, in order to
support calling SwapBuffers twice in a row with no rendering actually
having been performed.
Since eglSwapBuffers will perform a blocking dispatch of Wayland
events, a buffer release would have arrived by that point, and we
could then choose a buffer to post to the server. The effect was that
frames were displayed out-of-order, since we grabbed a frame with random
past content to display to the compositor.
Ideally get_back_bo() failing should store a failure flag inside the
surface and cause the next SwapBuffers to fail, but for the meantime,
restore the correct behaviour such that get_back_bo() no longer fails.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98833
Fixes: 9ca6711faa ("Revert "wayland: Block for the frame callback in get_back_bo not dri2_swap_buffers"")
During display initialisation, we need a separate event queue to handle
the registry events, which is correctly handled. But we also need
separate per-surface event queues to handle swapchain-related events,
such as surface frame events and buffer release events. This avoids two
surfaces from the same EGLDisplay, both current on separate threads,
dispatching each other's events.
Create separate per-surface event queues, create wl_surface and wl_drm
proxy wrapper objects per surface, so we eliminate the race around
sending events to the wrong queue. swrast buffers do not need a
dedicated proxy wrapper, as the wl_shm_pool used to create the
wl_buffers, being transient, can itself be assigned to a queue.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 36b9976e1f ("egl/wayland: Avoid race conditions when on non-main thread")
Cc: mesa-stable@lists.freedesktop.org
Though most swapchain operations used a queue, they were racy in that
the object was created with the queue only set later, meaning that its
event could potentially be dispatched from the default queue in between
these two steps.
Use proxy wrappers to avoid this race, also assigning wl_buffers created
for the swapchain to the event queue.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Calling random callbacks on the display's event queue is hostile, as
we may call into client code when it least expects it. Create our own
event queue, one per wsi_wl_display, and use that for the registry.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
There's no need to call wl_display_roundtrip() after trying to create a
buffer through wl_drm; if it succeeds then everything is fine, and if it
fails, then we get a fatal protocol error so can't recover anyway.
Additionally, doing a roundtrip on the default / main application queue,
is destructive anyway, so would need to be its own queue.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
The procedure for decompressing an opaque DXT1 OpenGL format is
dependent on the comparison of two colors stored in the first 32 bits of
the compressed block. Here's the specified OpenGL behavior for
reference:
The RGB color for a texel at location (x,y) in the block is given by:
RGB0, if color0 > color1 and code(x,y) == 0
RGB1, if color0 > color1 and code(x,y) == 1
(2*RGB0+RGB1)/3, if color0 > color1 and code(x,y) == 2
(RGB0+2*RGB1)/3, if color0 > color1 and code(x,y) == 3
RGB0, if color0 <= color1 and code(x,y) == 0
RGB1, if color0 <= color1 and code(x,y) == 1
(RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2
BLACK, if color0 <= color1 and code(x,y) == 3
The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1.
This means that the behavior is incompatible with OpenGL. This is stated
in the SKL PRM, Vol 5: Memory Views:
Opaque Textures (DXT1_RGB)
Texture format DXT1_RGB is identical to DXT1, with the exception that the
One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
the resulting texel color is derived strictly from the Opaque Color Encoding.
The alpha channel defaults to 1.0.
Programming Note
Context: Opaque Textures (DXT1_RGB)
The behavior of this format is not compliant with the OGL spec.
The opaque and non-opaque DXT1 OpenGL formats are specified to be
decoded in exactly the same way except the BLACK value must have a
transparent alpha channel in the latter. Use the four-channel BC1 Intel
formats with the alpha set to 1 to provide the behavior required by the
spec. Note that the alpha is already set to 1 for RGB formats in
brw_get_texture_swizzle().
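For illustration only, the decode table above can be sketched as
follows. This is not the driver or hardware implementation: the
function names, the compare flag used to model the Intel behavior, and
the use of integer division for the interpolated colors are all
assumptions made for this sketch.

```python
def unpack_rgb565(c):
    """Expand a 16-bit RGB565 value to an (r, g, b) tuple of 8-bit channels."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits to fill 8 bits per channel.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_dxt1_rgb_texel(color0, color1, code, compare=True):
    """Return the RGB color of one texel of an opaque DXT1 block.

    compare=True follows the OpenGL table quoted above.
    compare=False models a format that never compares the two colors and
    always behaves as if color0 > color1 (hypothetical flag).
    """
    rgb0 = unpack_rgb565(color0)
    rgb1 = unpack_rgb565(color1)
    four_color = (color0 > color1) or not compare
    if code == 0:
        return rgb0
    if code == 1:
        return rgb1
    if four_color:
        if code == 2:
            return tuple((2 * a + b) // 3 for a, b in zip(rgb0, rgb1))
        return tuple((a + 2 * b) // 3 for a, b in zip(rgb0, rgb1))
    if code == 2:
        return tuple((a + b) // 2 for a, b in zip(rgb0, rgb1))
    return (0, 0, 0)  # BLACK
```

With color0 = 0x0000 and color1 = 0xFFFF (so color0 <= color1), code 3
decodes to BLACK under the OpenGL rules, while the non-comparing
variant returns an interpolated gray instead. That difference is the
incompatibility described above.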
v2: Provide a more detailed commit message (Kenneth Graunke).
v3: Ensure the alpha channel is set to 1 for DXT1 formats.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
The procedure for decompressing an opaque BC1 Vulkan format is dependent on the
comparison of two colors stored in the first 32 bits of the compressed block.
Here's the specified OpenGL (and Vulkan) behavior for reference:
The RGB color for a texel at location (x,y) in the block is given by:
RGB0, if color0 > color1 and code(x,y) == 0
RGB1, if color0 > color1 and code(x,y) == 1
(2*RGB0+RGB1)/3, if color0 > color1 and code(x,y) == 2
(RGB0+2*RGB1)/3, if color0 > color1 and code(x,y) == 3
RGB0, if color0 <= color1 and code(x,y) == 0
RGB1, if color0 <= color1 and code(x,y) == 1
(RGB0+RGB1)/2, if color0 <= color1 and code(x,y) == 2
BLACK, if color0 <= color1 and code(x,y) == 3
The sampling operation performed on an opaque DXT1 Intel format essentially
hard-codes the comparison result of the two colors as color0 > color1. This
means that the behavior is incompatible with OpenGL and Vulkan. This is stated
in the SKL PRM, Vol 5: Memory Views:
Opaque Textures (DXT1_RGB)
Texture format DXT1_RGB is identical to DXT1, with the exception that the
One-bit Alpha encoding is removed. Color 0 and Color 1 are not compared, and
the resulting texel color is derived strictly from the Opaque Color Encoding.
The alpha channel defaults to 1.0.
Programming Note
Context: Opaque Textures (DXT1_RGB)
The behavior of this format is not compliant with the OGL spec.
The opaque and non-opaque BC1 Vulkan formats are specified to be decoded in
exactly the same way except the BLACK value must have a transparent alpha
channel in the latter. Use the four-channel BC1 Intel formats with the alpha
set to 1 to provide the behavior required by the spec.
v2 (Kenneth Graunke):
- Provide a more detailed commit message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100925
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This is mostly for running in our CI system to prevent dEQP from
continuing on to the next test if we get a GPU hang. As it currently
stands, dEQP uses the same VkDevice for almost all tests and if one of
the tests hangs, we set the anv_device::device_lost flag and report
VK_ERROR_DEVICE_LOST for all queue operations from that point forward
without sending anything to the GPU. dEQP will happily continue trying
to run tests and reporting failures until it eventually gets a crash that
forces the test runner to start over. This circumvents the problem by
just aborting the process if we ever get a GPU hang. Since this is not
the recommended behavior most of the time, we hide it behind an
environment variable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We weren't wrapping this before because anv_cmd_buffer_execbuf may throw
a more meaningful error message. However, we do change the error code
into VK_ERROR_DEVICE_LOST, so we should print a new message.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On GFX9 with only 4K CE RAM, define the range of slots that will be
allocated in CE RAM. All other slots will be uploaded directly. This will
switch dynamically according to which slots are used by current shaders.
GFX9 CE usage should now be similar to VI instead of being often disabled.
Tested on VI by taking the GFX9 CE allocation codepath and setting
num_ce_slots = 2 everywhere to get frequent switches between both modes.
CE is still disabled on GFX9.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
A later commit will only upload descriptors used by shaders, so we won't do
full dumps anymore, so the only way to have a complete mirror of CE RAM
in memory is to do a separate dump after the last draw call.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All updates of descriptors_dirty also set dirty_mask, so the return is
unnecessary. The next commit will want this function to be executed
even if dirty_mask == 0.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Sampler slots: slot[8], .. slot[39] (ascending)
Image slots: slot[7], .. slot[0] (descending)
Each image occupies 1/2 of each slot, so there are 16 images in total,
therefore the layout is: slot[15], .. slot[0]. (in 1/2 slot increments)
Updating image slot 2n+i (i <= 1) also dirties and re-uploads slot 2n+!i.
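A minimal sketch of the index arithmetic implied by this layout, with
hypothetical helper names (the constants follow the counts given
above; the real driver helpers may differ):

```python
NUM_IMAGES = 16     # image descriptors, each half a slot wide
NUM_SAMPLERS = 32   # sampler descriptors, one full slot each


def sampler_slot(index):
    """Samplers ascend from slot[8] to slot[39]."""
    return NUM_IMAGES // 2 + index


def image_half_slot(index):
    """Images descend from half-slot 15 to half-slot 0."""
    return NUM_IMAGES - 1 - index


def image_full_slot(index):
    """Full slot (7..0) containing the given image."""
    return image_half_slot(index) // 2


def image_sibling(half_slot):
    """Half-slot sharing a full slot: 2n+i pairs with 2n+!i."""
    return half_slot ^ 1
```

For example, image 0 lands in half-slot 15 (full slot 7), and updating
half-slot 4 also dirties half-slot 5.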
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Constant buffers: slot[16], .. slot[31] (ascending)
Shader buffers: slot[15], .. slot[0] (descending)
The idea is that if we have 4 constant buffers and 2 shader buffers, we only
have to upload 6 slots. That optimization is left for a later commit.
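To make the arithmetic concrete, here is a small sketch with
hypothetical helper names. It only computes slot indices and the
contiguous range that would need uploading; it does not model the
upload itself.

```python
NUM_SHADER_BUFFERS = 16
NUM_CONST_BUFFERS = 16


def const_buffer_slot(index):
    """Constant buffers ascend from slot[16]."""
    return NUM_SHADER_BUFFERS + index


def shader_buffer_slot(index):
    """Shader buffers descend from slot[15]."""
    return NUM_SHADER_BUFFERS - 1 - index


def used_slot_range(num_const, num_shader):
    """Return (first, end) of the contiguous used slots, end exclusive."""
    first = NUM_SHADER_BUFFERS - num_shader
    end = NUM_SHADER_BUFFERS + num_const
    return first, end
```

With 4 constant buffers and 2 shader buffers, used_slot_range(4, 2)
gives (14, 20), i.e. the 6 slots around the boundary between the two
groups.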
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Only the first array element was declared, so tgsi_shader_info::
shader_buffers_declared didn't match what the shader was using.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
RadeonSI needs to do a special lowering for Gather4 with integer
formats, but with bindless samplers we just can't access the index.
Instead, store the return type in the instruction like the target.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The datalayout for modules was purposely not being set in order to work around
the fact that the ExecutionEngine requires that the module's datalayout
matches the datalayout of the TargetMachine that the ExecutionEngine is
using.
When the pass manager runs on a module with no datalayout, it uses
the default datalayout which is little-endian. This causes problems
on big-endian targets, because some optimizations that are legal on
little-endian are illegal on big-endian.
To resolve this, we set the datalayout prior to running the pass
manager, and then clear it before creating the ExecutionEngine.
This patch fixes a lot of piglit tests on big-endian ppc64.
Cc: mesa-stable@lists.freedesktop.org
Fixes regressions in Android CtsVerifier.apk on Intel Chrome OS devices
due to incorrect error handling in eglMakeCurrent. See below on how to
confirm the regression is fixed.
This partially reverts
commit 23c86c74cc
Author: Chad Versace <chadversary@chromium.org>
Subject: egl: Emit error when EGLSurface is lost
The problem with commit 23c86c74 is that, once an EGLSurface became
lost, the app could never unbind the bad surface. Each attempt to unbind
the bad surface with eglMakeCurrent failed with EGL_BAD_CURRENT_SURFACE.
Specifically, the bad commit added the error handling below. #2 and #3
were right, but #1 was wrong.
1. eglMakeCurrent emits EGL_BAD_CURRENT_SURFACE if the calling
thread has unflushed commands and either previous surface is no
longer valid.
2. eglMakeCurrent emits EGL_BAD_NATIVE_WINDOW if either new surface
is no longer valid.
3. eglSwapBuffers emits EGL_BAD_NATIVE_WINDOW if the swapped surface
is no longer valid.
When I wrote the bad commit, I misunderstood the EGL spec language
for #1. The correct behavior, if I understand correctly now, is
below. This patch doesn't implement the correct behavior, though; it
just reverts the broken behavior.
- Assume a bound EGLSurface is no longer valid.
- Assume the bound EGLContext has unflushed commands.
- The app calls eglMakeCurrent. The spec requires eglMakeCurrent to
implicitly flush. After flushing, eglMakeCurrent emits
EGL_BAD_CURRENT_SURFACE and does *not* alter the thread's
current bindings.
- If the app calls eglMakeCurrent again, and the app inserts no
commands into the GL command stream between the two eglMakeCurrent
calls, then this second eglMakeCurrent succeeds without emitting an
error.
How to confirm this fixes the regression:
Download android-cts-verifier-7.1_r5-linux_x86-x86.zip from
source.android.com, unpack, and `adb install CtsVerifier.apk`.
Run test "Projection Cube". Click the Pass button (a
green checkmark). Then run test "Projection Widget". Confirm that
widgets are visible and that logcat does not complain about
eglMakeCurrent failure.
Then confirm there are no regressions in the cts-traded module that
commit 263243b1 fixed:
cts-tf > run cts --skip-preconditions --skip-device-info \
-m CtsCameraTestCases \
-t android.hardware.camera2.cts.RobustnessTest
Tested with Chrome OS board "reef".
Fixes: 23c86c74 (egl: Emit error when EGLSurface is lost)
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Nicolas Boichat <drinkcat@chromium.org>
Cc: Emil Velikov <emil.velikov@collabora.com>
According to the VK_KHX_multiview spec:
"Multiview causes all drawing and clear commands in the subpass to
behave as if they were broadcast to each view, where each view is
represented by one layer of the framebuffer attachments."
This adds support for multiview clears, which were missing in the
initial implementation.
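As a rough sketch of the broadcast rule quoted above (not the driver
code; the function names and the base_layer/layer_count parameters are
assumptions for illustration), a clear in a multiview subpass touches
one framebuffer layer per set bit of the view mask:

```python
def set_bits(mask):
    """Yield the index of each set bit in mask, lowest first."""
    bit = 0
    while mask:
        if mask & 1:
            yield bit
        mask >>= 1
        bit += 1


def layers_to_clear(view_mask, base_layer=0, layer_count=1):
    """Return the framebuffer layers a clear command should touch.

    A non-zero view_mask selects one layer per view; otherwise the
    regular (non-multiview) layer range is used.
    """
    if view_mask:
        return list(set_bits(view_mask))
    return list(range(base_layer, base_layer + layer_count))
```

A view mask of 0b1010 therefore clears layers 1 and 3, regardless of
the layer range passed for the regular case.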
v2 (Jason):
- split multiview from regular case
- Use for_each_bit() macro
Fixes new CTS multiview tests:
dEQP-VK.multiview.clear_attachments.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There is really no reason why the current DrawBuffer needs to be complete
at this point. In particular, the assertion gets hit on the X server side
in libglx when running .../piglit/bin/glx-get-current-display-ext -auto
(which uses indirect GLX rendering).
Fixes: 19b61799e3 ("st/mesa: don't cast the incomplete framebufer to st_framebuffer")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reorder the uniforms to load first the dvec4-aligned variables in the
push constant buffer and then push the vec4-aligned ones. It takes
into account that the relocated uniforms should be aligned to their
channel size.
This fixes a bug where a dvec3/4 might be loaded partly in one GRF and
the rest in the next GRF, so the region parameters to read it could break
the HW rules.
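The problem can be illustrated with a small sketch. It assumes a
32-byte GRF, uniforms described as (name, size_bytes, is_64bit)
tuples, and hypothetical helper names; it is not the vec4 backend's
actual packing code.

```python
GRF_BYTES = 32  # assumed register size: 8 dwords


def straddles_grf(offset, size):
    """True if [offset, offset + size) crosses a GRF boundary."""
    return offset // GRF_BYTES != (offset + size - 1) // GRF_BYTES


def reorder_uniforms(uniforms):
    """Put 64-bit (dvec4-aligned) uniforms first, keeping relative order."""
    return sorted(uniforms, key=lambda u: not u[2])  # sorted() is stable


def assign_offsets(uniforms):
    """Lay out uniforms: 32-byte alignment for 64-bit, 16-byte otherwise."""
    offsets = {}
    cursor = 0
    for name, size, is_64bit in uniforms:
        align = 32 if is_64bit else 16
        cursor = (cursor + align - 1) // align * align
        offsets[name] = cursor
        cursor += size
    return offsets
```

A dvec3 (24 bytes) placed right after a vec4, at byte offset 16, would
span bytes 16..39 and cross the GRF boundary at 32. After reordering,
the dvec3 starts at offset 0 and the vec4 follows at offset 32.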
v2:
- Fix broken logic.
- Add a comment to explain what should be needed to optimise the usage
of the push constant buffer slots, as this patch does not pack the
uniforms.
v3:
- Implemented the push constant buffer usage optimization.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
It was setting XYZW swizzle and writemask to all uniforms, no matter if they
were a vector or scalar, so this can lead to problems when loading them
to the push constant buffer.
Moreover, 'shift' calculation was designed to calculate the offset in
DWORDS, but it doesn't take into account DFs, so the calculated swizzle
for the latter was wrong.
The indirect case is not changed because MOV INDIRECT will write
to all components. Added an assert to verify that these uniforms
are aligned.
v2:
- Fix 'shift' calculation (Curro)
- Set both swizzle and writemask.
- Add assert(shift == 0) for the indirect case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We are going to add a packing feature to reduce the usage of the push
constant buffer. One of the consequences is that 'nr_params' would be
modified by vec4_visitor's run call, so we need to restore it if one of
them failed before executing the fallback ones. The same applies to the
uniform values, which would be reordered afterwards.
Fixes GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 when
the dvec4 alignment and packing patch is applied.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Francisco Jerez <currojerez@riseup.net>
If X11 did a software fallback to the entire screen, we would throw out
the BO the screen is scanning out from and allocate a new one.
Cc: mesa-stable@lists.freedesktop.org
Before: DrawElements (16 VBOs) w/ no state change: 4.34 million/s
After: DrawElements (16 VBOs) w/ no state change: 8.80 million/s
This inefficiency was uncovered by Timothy Arceri's no_error work.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Together with some fixes to xdriinfo this fixes xdriinfo not working
with glvnd.
Since apps (xdriinfo) expect GetDriverConfig to work without needing
to go through the dance to set up a glxcontext (which is a reasonable
expectation IMHO), the dispatch for this ends up significantly different
than any other dispatch function.
This patch gets the job done, but I'm not really happy with how this
patch turned out, suggestions for a better fix are welcome.
Cc: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
This change fixes the build break with llvm-svn.
r301981 of llvm-svn made add/remove of function attributes
use AttrBuilder instead of AttributeList.
Tested with llvm-3.9, llvm-4.0, llvm-svn.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Commit 6facb0c0 ("android: fix libz dynamic library dependencies")
unconditionally adds libz as a dependency to all shared libraries.
That is unnecessary.
Commit 85a9b1b5 introduced libz as a dependency to libmesa_util.
So only the shared libraries that use libmesa_util need libz.
Fix Android Lollipop build by adding the include path of zlib to
libmesa_util explicitly instead of getting the path implicitly
from zlib since it doesn't export the include path in Lollipop.
Fixes: 6facb0c0 "android: fix libz dynamic library dependencies"
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Herring <robh@kernel.org>
This reduces duplication between the dsa and non-dsa function
and will also be used in the following commit to add
KHR_no_error support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To be used to add KHR_no_error support while sharing code between
the DSA and non-DSA OpenGL function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All combined depth stencil buffers (even those with just stencil)
require a 4x4 alignment on Sandy Bridge. The only depth/stencil buffer
type that requires 4x2 is separate stencil.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The Ivy Bridge PRM provides a nice table that handles most of the
alignment cases in one place. For standard color buffers we have a
little freedom of choice but for most depth, stencil and compressed it's
hard-coded. Chad's original functions split halign and valign apart and
implemented them almost entirely based on restrictions and not the
table. This makes things way more confusing than they need to be. This
commit gets rid of the split and makes us implement the exact table
up-front. If our surface isn't one of the ones in the table then we
have to make real choices.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The reasoning Chad gave in the comment for choosing a valign of 4 is
entirely bunk. The fact that you have to multiply pitch by 2 is
completely unrelated to the halign/valign parameters used for texture
layout. (Not completely unrelated. W-tiling is just Y-tiling with a
bit of extra swizzling which turns 8x8 W-tiled chunks into 16x4 y-tiled
chunks so it makes everything easier if miplevels are always aligned to
8x8.) The fact that RENDER_SURFACE_STATE::SurfaceVerticalAlignment
doesn't have a VALIGN_8 option doesn't matter since this is gen7 and you
can't do stencil texturing anyway.
v2 (Jason Ekstrand):
- Delete most of Chad's comment and add a more descriptive commit
message.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
On all 3 gens, we have 4 bits each for width and height in the VSC pipe
config. An overflow results in setting width and/or height to zero,
which causes hangs.
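The failure mode is plain bitfield truncation. A minimal sketch, with
hypothetical names, of how a value of 16 wraps to zero in a 4-bit
field and of the kind of range check that avoids it:

```python
FIELD_BITS = 4
FIELD_MAX = (1 << FIELD_BITS) - 1  # 15


def pack_field(value):
    """Truncate value to the 4 bits available in the config field."""
    return value & FIELD_MAX


def fits_pipe_config(width, height):
    """True if both dimensions survive packing without wrapping to zero."""
    return 0 < width <= FIELD_MAX and 0 < height <= FIELD_MAX
```

pack_field(16) yields 0, which is the zero width/height that leads to
the hang, so sizes have to be limited to at most 15 before packing.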
Signed-off-by: Rob Clark <robdclark@gmail.com>
We probably *could* do this with blit path, but I think it would involve
clobbering settings from batch->gmem (see emit_zs()).
Signed-off-by: Rob Clark <robdclark@gmail.com>
This way we can just test the feature bits and don't need to spread
the debug overrides to all locations touching a feature.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
PIPE_BUFFER is a target enum, not a binding. This caused the driver to
up-align the height of buffer resources, leading to largely oversizing
those resources. This is especially bad, as the buffer resources used
by the upload manager are already 1MB in size. Height alignment meant
that those would result in 4 to 8MB big BOs.
Fixes: c9e8b49b88 ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Coverity doesn't understand that we'll never pass non-NULL for vertex
shaders.
This is a bit lame, actually. A straightforward cross-procedural analysis
limited to this source file should be enough to prove that there's no
NULL-pointer dereference. Oh well.
CID: 1405999
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
What we care about is whether PrimID is used while tessellation is
enabled; whether it's used in TCS/TES or further down the pipeline is
irrelevant.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This builds on commit 0549ea15ec ("radeonsi: fix primitive ID in
fragment shader when using tessellation").
Fixes piglit
arb_tessellation_shader/execution/gs-primitiveid-instanced.shader_test
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we ever do something that causes this code path to be taken, the
OpenGL test suites are certain to hit the assert().
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From Section 7.6 (UNIFORM VARIABLES) of the OpenGL 4.5 spec:
"If the value of location is -1, the Uniform* commands will
silently ignore the data passed in, and the current uniform values
will not be changed."
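In other words, the early-out looks roughly like the sketch below. The
dictionary-based uniform storage and the function name are stand-ins
for illustration, not Mesa's data structures.

```python
def set_uniform(uniforms, location, value):
    """Store value at location; location -1 is silently ignored."""
    if location == -1:
        # Per the spec text quoted above: no error, no state change.
        return
    uniforms[location] = value
```

Calling set_uniform(u, -1, 5.0) leaves u untouched, while any other
location updates the stored value.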
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Which will allow us to print validation errors found in shader assembly
in GPU hang error states.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Newer Gens' names don't have the brackets. Having common names will make
some later patches simpler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
intel_asm_annotation.c is part of libintel_compiler.la, which contains
code for disassembling and validating shaders that we want to call in
aubinator_error_decode.
dump_assembly() calls nir_print_instr() to print annotations, and
although dump_assembly() is not called by aubinator_error_decode (nor is
any function in intel_asm_annotation.c) it causes undefined references
to nir_print_instr().
To work around, provide a no-op weak symbol to resolve against.
This will allow the validator to run on shader programs we find in the
GPU hang error state.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will allow us to more easily run brw_validate_instructions() on
shader programs we find in GPU hang error states.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
When the GPU hangs, the kernel saves some state for us. Until now it has
not included the shader programs, which are very often the reason the
GPU hang occurred. With the programs saved in the error state, we should
be more capable of debugging hangs.
Thanks to Chris Wilson and Ben Widawsky who provided the kernel support
for this feature ("drm/i915: Copy user requested buffers into the error
state"), which will be in kernel v4.13.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
1ce5853 broke compilation since LOG_ERROR is not defined and also
macro expansion won't work as planned (expands to 'ANDROID_egl2alog[level]')
v2: append 'ANDROID' to egl2alog table and use LOG_PRI
(suggested by Chih-Wei Huang)
Fixes: 1ce5853 ("egl: simplify the Android logger")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the unlikely case the parsing of genxml files fails, we were
leaking an xml parser object.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
"tc" will be initialized by the next commit.
v2: rename stuff according to v2 changes in u_threaded_context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
State trackers can set this to tell the driver when u_threaded_context is
desirable.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
If we haven't created a batch, just bail in pipe->flush(), since there
is nothing to do.
Fixes crash in warsow, which creates a whole bunch of contexts used for
nothing but texture uploads.
Signed-off-by: Rob Clark <robdclark@gmail.com>
My fault for not having time to test Marek's patches while they were on
list.
Fixes: 330d0607 ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Totally independent.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 0e6d532d32 "radv/meta: add support for save/restore meta without vertex data."
Some things trigger batches that only contain a clear (like glmark2
startup). No point to use GMEM for this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Coverity caught the use of the uninitialised variable `type`.
However, it was `info->type`, which is initialised, which was meant to
be used.
CID: 1406000
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: b490ca9a38 ("nv50/ir: Fail if encountering unknown shader type")
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We warn again if there is more than one line with the "fixes:" tag.
The warning is silenced when the commit has already landed or each
fixes tag references a commit that is in the branch.
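The rule can be sketched as below. This is an illustration in Python
rather than the actual script, and the function names, the regular
expression, and the way landed commits are passed in are all
assumptions.

```python
import re

# Matches lines such as "Fixes: 23c86c74 (egl: ...)", case-insensitively.
FIXES_RE = re.compile(r"^\s*fixes:\s*([0-9a-f]{4,40})",
                      re.IGNORECASE | re.MULTILINE)


def fixes_shas(message):
    """Return the (lowercased) SHAs named by fixes: tags in message."""
    return [sha.lower() for sha in FIXES_RE.findall(message)]


def should_warn(message, commits_in_branch, already_picked=False):
    """Warn when there are several fixes: tags and one has not landed."""
    shas = fixes_shas(message)
    if already_picked or len(shas) <= 1:
        return False

    def landed(sha):
        return any(full.lower().startswith(sha) for full in commits_in_branch)

    return any(not landed(sha) for sha in shas)
```

A commit carrying two fixes: tags triggers the warning only while at
least one of the referenced commits is missing from the branch.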
v2:
- Warn if any of the fixes tags has not landed (Emil)
v3:
- Remove unnecessary head command
- Clarify commit message (Emil)
- Skip already picked commits sooner (Emil)
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
By keeping track of fewer generics, everything can fit into 64 bits.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is as high as possible while still allowing to merge the bitfields
with the next commit.
For OpenGL, 32 would be sufficient. Nine apparently uses (much!) higher
indices than that. Indices that are out of bound don't hurt for VS-PS
pipelines, except that the VS output kill optimization is not applied.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
OpenGL uses at most 32 generic outputs/inputs in any stage, and they always
have a shader IO index and therefore fit into the outputs_written/
inputs_read/kill_outputs fields.
However, Nine uses semantic indices more liberally. We support that
in VS-PS pipelines, except that the optimization of killing outputs
must be skipped.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ARB_bindless_texture allows images to be declared inside
structures. This is similar to samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In order to sort indices for images inside a struct array we
need to do something similar to samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Use an alias for this field on 3DSTATE_INDEX_BUFFER on gen6+, so we can set
the same value as the defines.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Several issues were caught on review after the original patch landed.
This commit fixes them.
v2:
- Fix padding (Topi)
- Remove .DestinationElementOffset change from this patch (Topi)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was used by the meta fast clear code. Now that we've switched
back to BLORP, it's always true.
We might want it back when we add a RECTLIST extension to GL, but
that's someday in the future...
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This whole code is surrounded in #if GEN_GEN >= 6, and this code only
applies on Sandybridge. So, use GEN_GEN == 6 to reduce the delta in
the next patch, when we add Gen4-5 support.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We use Instruction State Base Address on Ironlake, so we want KSP to be
an offset not an actual pointer. Gen4/G45 use pointers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Like images, this prevents out-of-bound access when the explicit
binding layout qualifier is used with an array which contains
too many samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If libunwind is not found we'll fail at PKG_CHECK_MODULES, so the
follow-up check will be false. Additionally the AM_CONDITIONAL is not
used, so we can drop it.
Fixes: 3bcef6aa24 ("configure.ac: honour --disable-libunwind if the .pc file is present")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
No longer required as of commit d90bf4ef3e ("radeon: remove unused
radeon_elf_util.{c,h}")
v2: Add the required libelf link in src/amd/Makefile.common.am
Fixes: d90bf4ef3e ("radeon: remove unused radeon_elf_util.{c,h}")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Drop the unsupported pre-JellyBean macros and use a simple egl2android
mapping. With this we lose the explicit abort() provided by LOG_FATAL,
although Mesa already calls exit(1) in case of a fatal error.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Including libgcc breaks on Android O (master). This doesn't appear to be
needed any more as both Android M and N have also been built w/o libgcc.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Android O moves to LLVM 3.9 and also has some differences in header
dependencies as LLVM has moved to blueprint files. It seems libLLVMCore
was only needed for header dependencies, so we can drop that for O.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Currently, building with "mmma external/mesa3d" which builds all targets
and dependencies is broken for targets that require LLVM. This is due to
the build settings depending on MESA_ENABLE_LLVM. Instead of using a
conditional in the global Android.common.mk, make all the components that
need LLVM explicitly include the necessary build settings.
GALLIVM_CPP_SOURCES doesn't exist anymore, so remove that as well.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Add libelf as a library dependency rather than explicitly listing its
include paths. This should work for Android M and later which have the
necessary exported directories in libelf.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Mesa no longer supports LLVM 3.5 for any targets we support.
Android-x86 adds support for llvmpipe which could work, but android-x86
for L is using mesa 11.0 anyway.
Dropping this support enables clean-up of libelf dependencies.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Add a driver string "all" so that if BOARD_GPU_DRIVERS is set to "all",
all the drivers are enabled in the build. This makes build testing all
drivers easier to maintain.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
src/gallium/targets/dri/Android.mk contains lots of conditionals for
individual drivers. Let's move these details into the individual driver
makefiles.
In the process, align the make driver conditionals with automake
(i.e. HAVE_GALLIUM_*).
Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: add the radeon winsys for radeonsi]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
It is not necessary to filter driver and winsys directories based on the
list of enabled drivers. Selecting the included driver libraries or not is
sufficient to control what is built.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
A build of only swrast is broken as the Android EGL now depends on
libdrm as does GBM. While we could make EGL conditionally depend on
libdrm, we probably want to enable kms_dri winsys as well and that will
need libdrm enabled. So just always enable libdrm and simplify the
Android makefiles a bit.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
[Emil Velikov: drop related inline comment]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Building libmesa_amd_common fails with:
external/mesa/src/amd/common/ac_shader_info.c:23:10: fatal error: 'nir/nir.h' file not found
^
external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
^
libmesa_amd_common now depends on libmesa_nir, so add it as a dependency
and export the necessary directories.
Fixes: 224cf29 "radv/ac: add initial pre-pass for shader info gathering"
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Add exported include paths rather than explicitly adding the includes
in each user of the common AMD libs.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Explicitly including libcxx includes is not necessary at least on
Android M and later. It appears that libc++ was made the default in
commit "Make libc++ the default STL." in Android build system post L.
However, if L support is still needed, using "LOCAL_CXX_STL=libc++" is
the preferred way.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Necessary to fix the following radeonsi building errors:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_blit.c:24:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:29:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:71:
In file included from external/llvm/include/llvm-c/Core.h:18:
In file included from external/llvm/include/llvm-c/ErrorHandling.h:17:
In file included from external/llvm/include/llvm-c/Types.h:17:
external/llvm/include/llvm/Support/DataTypes.h:49:3: error: "Must #define __STDC_LIMIT_MACROS before #including Support/DataTypes.h"
^
external/llvm/include/llvm/Support/DataTypes.h:53:3: error: "Must #define __STDC_CONSTANT_MACROS before " "#including Support/DataTypes.h"
^
2 errors generated.
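As a rough illustration of what the errors above ask for, the two macros need to be visible before the first header that pulls in <stdint.h>. The sketch below is not the actual Android makefile change (which would normally pass the defines on the compiler command line); the helper name is hypothetical.

```c
/* Assumption: these defines appear before any header that includes <stdint.h>. */
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#ifndef __STDC_CONSTANT_MACROS
#define __STDC_CONSTANT_MACROS
#endif

#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: uses one limit macro and one constant macro. */
static uint64_t sketch_max_u32_as_u64(void)
{
    /* Sanity check that the limit macro is visible. */
    assert(UINT32_MAX == 0xffffffffu);
    return UINT64_C(0) + UINT32_MAX;
}
```

In C these macros are always available; the defines only matter for older C++ translation units, which is the case the LLVM header checks for.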
[Emil Velikov: add inline comment about the defines]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Inspired by similar patches from Chih-Wei Huang and Zhen Wu.
Linking against both static and shared LLVM can be avoided, provided
that the libLLVM shared library for the device includes the whole static
R600/AMDGPU libraries needed by radeonsi/amdgpu.
Complementary changes, limited to the Android external/llvm project,
are necessary to correctly build libLLVM.
Tested with marshmallow-x86 and nougat-x86 builds.
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This was added in b527dd65c8 as a work around because fixed function
fragment shaders were tracked in ctx->FragmentProgram._Current as
a gl_program rather than gl_shader_program.
However after my refactoring of the program and shader structs
at the end of 2016 which culminated in c505d6d852, we no longer
need gl_shader_program to track the current program making
_CurrentFragmentProgram obsolete.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_problem() is still useful in some places, such as if a backend
compile fails, but for the majority of cases we should be able to
remove it.
OpenGL test suites are becoming very mature; we should place more
trust in debug builds picking up missed cases.
Reviewed-by: Eric Anholt <eric@anholt.net>
deregisterEHFrames doesn't take any parameters anymore.
Reviewed-by: Vedran Miletić <vedran@miletic.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bas pointed out the fs key doesn't take srgb into account.
Since there is just one srgb variant, just create a separate
pipeline for it. This also uses dest format to be more consistent
on when srgb matters.
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is something the original decoder did, but I didn't bother with
until now. I recently had to debug an Ironlake issue, and wanted to
inspect VS_STATE. So, now it's back.
The other packets in the switch statement are all Gen6/7+, where we
use offsets from dynamic state base address, so we don't need the
gtt_offset subtraction introduced here. We might want to make a
helper for this hack at some point - perhaps when we introduce the
next occurrence.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The BRW_NEW_CURBE_OFFSETS dirty bit is signalled when changing the
partitioning of the Constant Buffer URB section between the various
shader stages, on Gen4-5.
BRW_NEW_PUSH_CONSTANT_ALLOCATION is basically the same thing on Gen7+.
So, save a bit, and use the new name.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously we guarded large swathes of code with #if GEN ... #endif
blocks. This made it difficult to see which generations include what.
This patch splits up the #if..#endif sections so they surround a small
section of code - usually a single function/atom, or sometimes a group
of related functions. It should make the code easier to work on.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Drop the old brw_get_line_width() helper which returned the unsigned
fixed-point encoding of the line width - it's been dead since the
conversion to GENXML (which does the encoding for us).
Then rename brw_get_line_width_float() to the shorter name.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
For whatever reason, we had an INTEL_DEBUG=stats option that enabled
various statistics counters on Gen4-5 systems. It's been around
forever, though I can't think of a single time that it's been useful.
On Gen6+, we enable statistics all the time because they're necessary
to support various query object targets. Turning them off would break
those queries.
Gen4-5 don't support those queries, so the statistics counters generally
aren't useful; we disabled them by default. This patch disables them
altogether.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We apparently enabled this on all platforms in Mesa 10.6. However, it
was only ever implemented for Gen6+. The Gen4-5 query code goes up in
flames with an "Unrecognized query target" unreachable() error if you
even attempt to use any of the new functionality.
This wasn't caught because the Piglit tests require OpenGL 3.0, which
Gen4-5 cannot support. The extension spec does say 3.0 is required,
though I'm not sure why - it seems like 2.1 would work fine.
We could implement it anyway, but it's a little bit of a pain due to the
lack of hardware contexts (so we have to snapshot around batches).
Given that it's been 100% broken for two years and I haven't seen a bug
report about it, I'm not terribly inclined to care. So, let it go.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The function was pretty slow. This brings a substantial decrease in draw
call overhead when min/max index bounds are not needed:
Before: DrawElements (1 VBO) w/ no state change: 5.75 million
After: DrawElements (1 VBO) w/ no state change: 7.03 million
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is the best place to do it. Now drivers without u_vbuf don't have to
do it.
v2: use correct upload size and optimal alignment
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The next patch will use it. This is really for svga and GL2-level drivers.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
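The layout described above can be pictured with a small stand-alone sketch. The struct and function names below are hypothetical stand-ins, not the real gallium definitions; only the index_size, has_user_indices, start and index.resource/index.user fields mirror the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer resource holding index data. */
struct sketch_resource { const void *data; };

struct sketch_draw_info {
    unsigned index_size;      /* 0 means non-indexed, otherwise bytes per index */
    bool has_user_indices;    /* selects which union member is valid */
    unsigned start;           /* used instead of a separate index buffer offset */
    union {
        struct sketch_resource *resource;
        const void *user;
    } index;
};

static bool sketch_is_indexed(const struct sketch_draw_info *info)
{
    return info->index_size != 0;
}

/* Returns the index data for an indexed draw, or NULL for a non-indexed one. */
static const void *sketch_index_data(const struct sketch_draw_info *info)
{
    if (!sketch_is_indexed(info))
        return NULL;
    return info->has_user_indices ? info->index.user
                                  : info->index.resource->data;
}
```

The point of the union is that a draw carries exactly one index source, and has_user_indices is the only way to tell which member is live.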
Similar to how image resources are handled. That way we are sure
that inst->resource.file is PROGRAM_SAMPLER for "bound" samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes:
si_shader.c: In function ‘si_shader_dump_stats’:
si_shader.c:6704:31: warning: passing argument 1 of ‘si_get_max_workgroup_size’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
si_get_max_workgroup_size(shader);
^~~~~~
si_shader.c:5832:17: note: expected ‘struct si_shader *’ but argument is of type ‘const struct si_shader *’
static unsigned si_get_max_workgroup_size(struct si_shader *shader)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The check in update_single_program_texture() can also be
removed.
v2: - remove unused 's' variable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
valgrind reports them as leaked, and I could not find anything making a
copy of the nir pointer. Also, radv_device_init_meta_blit_color() is
already freeing them unconditionally like this.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
After successful drmGetDevices2() call, drmFreeDevices() needs to be
called.
Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # radv version
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.
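The count-versus-size mix-up can be sketched without libdrm. The enumeration function below is a hypothetical stand-in that, like the call discussed above, expects a number of array elements rather than a size in bytes.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for an enumeration call that takes an element count. */
static int sketch_get_devices(void *devices[], int max_devices, int available)
{
    int n = available < max_devices ? available : max_devices;
    for (int i = 0; i < n; i++)
        devices[i] = NULL;   /* a real call would store device pointers here */
    return n;
}

static int sketch_enumerate(int available)
{
    void *devices[8];
    /* Pass the number of elements; sizeof(devices) would be the size in
     * bytes and would let the callee write past the end of the array. */
    return sketch_get_devices(devices, (int)ARRAY_SIZE(devices), available);
}
```

With sizeof(devices) the bug only shows once more devices exist than the array can hold, which matches the remark that it was easy to miss.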
Fixes: b1fb6e8d "anv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The only thing still using it is INVOCATION_ID for geometry shaders.
That's easily enough inlined into the nir_intrinsic_load_invocation_id
handling code.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're already doing this in the FS back-end. This just does the same
thing in the vec4 back-end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The NIR pass already handles remapping system values to attributes for
us so we delete the system value code as part of the conversion.
We also change nir_lower_vs_inputs to take an explicit inputs_read
bitmask and pass in the inputs_read from prog_data instead of pulling
it out of NIR. This is because the version in prog_data may get
EDGEFLAG added to it on some old platforms.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We also add a nice little comment to make it more clear exactly what
happens with the edge flag copy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR calls these system values but they come in from the VF unit as
vertex data. It's terribly convenient to just be able to treat them as
such in the back-end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The vec4 backend will want to count in units of vec4s, not scalar
components. The simplest solution is to move the multiplication by 4
into the scalar backend. This also improves consistency with how we
count varyings.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have nice block iterators, there's no good reason for this
to be off on its own. While we're here, we convert to using the NIR
const index getters/setters instead of whacking const_index values
directly.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer. The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct. This, however, has
caused a few problems:
1) There are many things which generate NIR without GLSL. This means
we have to support both NIR shaders which come from GLSL and ones
that don't and need to have an info elsewhere.
2) The solution to (1) raises all sorts of ownership issues which have
to be resolved with ralloc_parent checks.
3) Ever since 00620782c9, we've been
using nir_gather_info to fill out the final shader_info. Thanks to
cloning and the above ownership issues, the nir_shader::info may not
point back to the gl_shader anymore and so we have to do a copy of
the shader_info from NIR back to GLSL anyway.
All of these issues go away if we just embed the shader_info in the
nir_shader. There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's now basically a single expression, so it probably makes sense to
have it inlined into the callers.
Suggested by Marek.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With commit 10c17f23b7 ("freedreno: core compute state support"),
Android builds fail with the following error:
external/mesa3d/src/gallium/drivers/freedreno/freedreno_screen.c:610:17: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security]
sprintf(ret, ir);
^~
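A common way to silence -Wformat-security, shown here as a minimal sketch rather than the exact freedreno change, is to pass the string as an argument to a literal "%s" format. The helper name and the buffer-size parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy an IR name into a caller-provided buffer. */
static int sketch_copy_ir_name(char *dst, size_t dst_size, const char *ir)
{
    /* The literal "%s" keeps any '%' inside ir from being treated as a
     * conversion specifier. */
    return snprintf(dst, dst_size, "%s", ir);
}
```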
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Needed to fix android building errors:
external/mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:148: error: undefined reference to 'gen5_init_atoms'
external/mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:150: error: undefined reference to 'gen45_init_atoms'
external/mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:152: error: undefined reference to 'gen4_init_atoms'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 5a19d0b ("i965: Get real per-gen atom lists")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Rasterizer core only supports polygonmode front==back. Add logic for
populating fillMode for the rasterizer only for that case correctly.
Provide enum conversion between mesa enums and core enums.
The core renders lines/points as tris. Previously, code would enable
stipple for polygonmode != FILL. Modify stipple enable logic so that
this works correctly.
No regressions in vtk tests.
Fixes the following piglit tests:
pointsprite
gl-1.0-edgeflag-const
v2: remove cc stable, and remove "not implemented" assert
v3: modified commit message
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add support for polygonmode point in the binner. This is done by
splitting BinPostSetupPoints from BinPoints, so the earlier call can be
called from BinTriangles. Setup has already been done at the time
BinPostSetupPoints needs to be called.
This checkin just adds support in the rasterizer. A separate checkin
will add the appropriate driver support.
v2: remove cc stable
v3: modified commit message and subject line
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
v3: list piglit tests fixed by this patch. Fixed typo Tim pointed out.
v2: Reword commit message to more closely adhere to community
guidelines.
This patch moves msaa resolve down into core/StoreTiles where the
surface format conversion routines are available. The previous
"experimental" resolve was limited to 8-bit unsigned render targets.
This fixes a number of piglit msaa tests by adding resolve support for
all the render target formats we support.
Specifically:
layered-rendering/gl-layer-render: fail->pass
layered-rendering/gl-layer-render-storage: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg: crash->pass
multisample-formats *[2,4,8,16] gl_ext_texture_snorm: crash->pass
multisample-formats *[2,4,8,16] gl_arb_texture_float: fail->pass
multisample-formats *[2,4,8,16] gl_arb_texture_rg-float: fail->pass
MSAA is still disabled by default, but can be enabled with
"export SWR_MSAA_MAX_COUNT=4" (1,2,4,8,16 are options)
The default is 0, which is disabled.
This patch improves the number of multisample-formats supported by swr,
and fixes several crashes currently in the 17.1 branch. Therefore, it
should be considered for inclusion in the 17.1 stable release. Being
disabled by default, it poses no risk to most users of swr.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
cc: mesa-stable@lists.freedesktop.org
The spec text cited above says you can't, but only the GLSL 3.00 (redefine
or overload) case was implemented.
Fixes dEQP scoping.invalid.redefine_builtin_fragment/vertex.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
From the spec,
Arrays are allowed as arguments, but not as the return type. [...] The
return type can also be a structure if the structure does not contain
an array.
Fixes dEQP shaders.functions.invalid.return_array_in_struct_fragment.
v2: Spec cite wording change
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Matt Turner <mattst88@gmail.com>
The renumbering code didn't take into account that multiple VS exports
can have the same PARAM index. This also significantly simplifies
the renumbering. Thankfully, we have piglits for this:
spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatcentroid-packing
spec@glsl-1.50@execution@interface-blocks-complex-vs-fs
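To illustrate the constraint mentioned above, here is a minimal sketch of compacting PARAM indices so that exports sharing an old index keep sharing the new one. It is not the actual ac code; the function name, the first-seen ordering and the fixed table size are assumptions.

```c
#include <assert.h>

#define SKETCH_MAX_PARAMS 32

/* Compact old PARAM indices in place; returns the number of distinct indices. */
static unsigned sketch_renumber_params(unsigned *param, unsigned count)
{
    int remap[SKETCH_MAX_PARAMS];
    unsigned next = 0;

    for (unsigned i = 0; i < SKETCH_MAX_PARAMS; i++)
        remap[i] = -1;   /* -1 marks an old index not seen yet */

    for (unsigned i = 0; i < count; i++) {
        assert(param[i] < SKETCH_MAX_PARAMS);
        if (remap[param[i]] < 0)
            remap[param[i]] = (int)next++;
        /* Duplicates of the same old index reuse the same new index. */
        param[i] = (unsigned)remap[param[i]];
    }
    return next;
}
```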
Reported by Michel Dänzer.
Fixes: b08715499e ("ac: eliminate duplicated VS exports")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
main/egldisplay.c: In function '_eglParseX11DisplayAttribList':
main/egldisplay.c:491:38: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
display->Options.Platform = (void *)value;
^
The fix: cast to uintptr_t before void*.
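The double cast can be sketched as follows. The helper names are hypothetical; the pattern of widening through uintptr_t before converting to a pointer is the standard way to avoid -Wint-to-pointer-cast.

```c
#include <assert.h>
#include <stdint.h>

static void *sketch_int_to_ptr(int value)
{
    /* Widen to a pointer-sized integer first, then convert to a pointer. */
    return (void *)(uintptr_t)value;
}

static int sketch_ptr_to_int(void *ptr)
{
    /* Reverse direction: go through uintptr_t again before narrowing. */
    return (int)(uintptr_t)ptr;
}
```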
Fixes: ddb99127 egl/x11: Honor the EGL_PLATFORM_X11_SCREEN_EXT attribute
Cc: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Make it a bit clearer that the index spaces are logically separate by
having them defined in different functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Limiting LS-HS to a single wave is required on all SI chips due to an
issue with a power management feature.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In a VS->TCS->TES->PS pipeline, the primitive ID is read from TES exports,
so it is as if TES were using the primitive ID.
Specifically, this fixes a bug where the primitive ID is not reset at
the start of a new instance.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are a bunch of piglit fast clear tests that regressed on SI, for
example ./bin/ext_framebuffer_multisample-fast-clear single-sample.
The problem is that a texture is bound as a framebuffer, cleared, and
then rendered from in a loop that loops through different clear colors.
The texture is never rebound during all this, so the change to
tex->dirty_level_mask during fast clear was not taken into account
when checking for compressed textures.
I have considered simply reverting the problematic commit. However,
I think this solution is better. It does require looping through all
bound textures after a fast clear, but the alternative would require
visiting more textures needlessly on every draw. Draws are much more
common than clears.
Note that the rendering feedback loop rules do not apply here, because
the framebuffer binding is changed between the glClear and the draw
that samples from the texture that was cleared.
Fixes: bdd6449769 ("radeonsi: don't mark non-dirty textures with CMASK as compressed")
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The header is used only to provide STATIC_ASSERT. The latter is already
available in utils/macros.h so use that instead and kill off the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
As of last commit nobody requires anything else but the
_eglDefaultLogger(). As such use it directly and simplify the
implementation.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Chad Versace <chadversary@chromium.org>
ARB_bindless_texture allows declaring image types inside
structures, which means we need to keep track of the format.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Since the arrays are initialized later on with -1, zeroing them
with rzalloc_array() is useless.
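The same reasoning applies outside ralloc. The sketch below uses plain malloc as a stand-in for the non-zeroing allocator; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate count ints, each set to -1. */
static int *sketch_alloc_unset(size_t count)
{
    /* No zero-fill: every element is overwritten right below, so clearing
     * the memory first would be wasted work. */
    int *array = malloc(count * sizeof(*array));
    if (!array)
        return NULL;
    for (size_t i = 0; i < count; i++)
        array[i] = -1;
    return array;
}
```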
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The build/check test should be done with an appropriate combination of
flags, depending on the changes introduced by the patch set.
Also, mention to cross compile with mingw-w64 for Windows.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The maintainer should not just rely on the mesa-stable@ mailing list
but actually check the master branch in search of suitable nomination
candidates.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If an identified commit had more than one fix, we would warn
about that and only treat the first.
Now, we don't warn but treat all of them.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
When running shader-db with intel_stub and recent Mesa, context creation
fails when making a logical hardware context. In this case, we call
intelDestroyContext(), which gets here and tries to unmap the cache BO.
But there isn't one - we haven't made it yet. So we try to unmap a
NULL pointer, which used to be safe (it did nothing), but crashes
after commit 7c3b8ed878.
The result is that we crash rather than failing context creation with
a nice message. Either way nothing works, but this is more polite.
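One possible shape for such a guard is sketched below. The types and function names are hypothetical and do not reproduce the i965 code; the sketch only shows skipping the unmap when the buffer was never created.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_bo { int map_count; };            /* hypothetical buffer object */
struct sketch_cache { struct sketch_bo *bo; };  /* hypothetical program cache */

static void sketch_bo_unmap(struct sketch_bo *bo)
{
    assert(bo != NULL);   /* unmapping NULL is treated as a bug in this sketch */
    bo->map_count--;
}

static void sketch_destroy_cache(struct sketch_cache *cache)
{
    /* The BO may not exist yet if context creation failed early. */
    if (cache->bo) {
        sketch_bo_unmap(cache->bo);
        cache->bo = NULL;
    }
}
```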
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reverts commit c5bf7cb529.
This broke rendering in "Total War: WARHAMMER", which uses a single
level RGBA_UINT32 texture and the default filter modes of GL_LINEAR
and GL_NEAREST_MIPMAP_LINEAR. However, the texture max level is 0,
so it is actually mipmap complete - it's the integer + linear rule
that causes the error.
I'm working with Khronos to find a real solution. However that turns
out, this patch is not correct and breaks real programs, so let's
revert it for now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100690
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16224
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Just like other type hash tables are destroyed in
_mesa_glsl_release_types(), also destroy the ones for function and
subroutine types.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These were being fed to the shader as floats via the vertex
path, so also push them as floats here.
This fixes missing overlay in Sascha Willems demos.
Signed-off-by: Dave Airlie <airlied@redhat.com>
After moving everything to using push constants,
these paths are no longer needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The color clear value is uniform and needs only to be emitted from
the frag shader, so just push it down via a push constant,
and remove the vertex buffer completely.
The depth clear value needs to be emitted from the vertex
shader, but is only a single value.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This isn't necessary yet but I'd like to use the range in
some future patches.
[airlied: add new resolve pass]
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops the resolve workarounds that change an image's
tiling mode behind its back. This is horrible and breaks
the image_view->image relationship. Remove all this.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There are 3 resolve paths. The fastest is the hw resolver, but
it has restrictions on tile modes and can't do subresolves. The
compute resolver is next speed-wise, but can't handle DCC
destinations. The fragment resolver handles that case.
This will end up with a slowdown, as we currently hack the
hw resolver paths when they shouldn't work, but we shouldn't
keep doing that.
The next patch removes the hacks.
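The selection order described above can be sketched as a small decision function. This is an illustration only, not the radv implementation: the enum, the function name and the three boolean inputs are hypothetical, and the sketch assumes the hw resolver has no restrictions beyond the two named in the text.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_resolve_path {
    SKETCH_RESOLVE_HW,        /* fastest, most restricted */
    SKETCH_RESOLVE_COMPUTE,   /* next fastest, no DCC destinations */
    SKETCH_RESOLVE_FRAGMENT,  /* handles the DCC destination case */
};

static enum sketch_resolve_path
sketch_pick_resolve_path(bool tile_modes_ok, bool is_subresolve, bool dest_has_dcc)
{
    /* Prefer the fastest path whose stated restrictions are satisfied. */
    if (tile_modes_ok && !is_subresolve)
        return SKETCH_RESOLVE_HW;
    if (!dest_has_dcc)
        return SKETCH_RESOLVE_COMPUTE;
    return SKETCH_RESOLVE_FRAGMENT;
}
```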
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to resolve into DCC enabled dests we need to use
the fragment shader. This reuses the code from the compute
path and implements a resolve path in vertex/fragment shader.
This code isn't used until later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a path to allow compute resolves to be used
for subpass resolves.
This isn't used yet, but will be later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I want to reuse the same code for the fragment shader
version of the resolve shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we are resolving into an srgb dest, we need to convert
to linear so the store does the conversion back.
This should fix some weirdness seen when subresolves
hit the compute path.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Copy nir_print.c's snprintf definition for now, to unbreak Windows
builds.
We can and should clean up all snprintf definitions in a follow-up
change, but I'd rather not leave the Windows build broken any longer.
Trivial.
This code was merged commented out, and has stayed that way ever since.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
BLORP doesn't program 3DSTATE_VF, since it doesn't use index buffers,
making the setting irrelevant. So there's no need to re-emit it after
a BLORP operation - the old setting will still be in place.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The whole "it might be used for non-indexed draws" thing is no longer
true - it turns out this was a mistake, and it was removed in OpenGL 4.5.
(See Marek's commit 96cbc1ca29e0b1f4f4d6c868b8449999aecb9080.) So
we can simplify this and just program 0 for non-indexed draws.
We can also use #if blocks to remove the atom on Ivybridge/Baytrail,
now that they have a separate atom list from Haswell. No more runtime
checks.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This reverts commit 7088b655e8.
It breaks performance counters. If you use them with this commit, they hang
the machine hard. Sysrq and ssh don't work.
Possibly other gen's have a similar limit. Fixes glmark2 -b shadow
with larger resolutions on devices with small gmem (for example,
fullscreen 1080p on 8x16/db410c).
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@gmail.com>
From section 4.4.6 of the ARB_bindless_texture spec:
"If both bindless_sampler and bound_sampler, or bindless_image
and bound_image, are declared at global scope in any
compilation unit, a link-time error will be generated."
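One way to read the quoted rule is as a check over all compilation units being linked. The sketch below follows that reading; the struct, its fields and the function name are hypothetical and are not the Mesa linker's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-unit record of global-scope layout qualifiers. */
struct sketch_unit {
    bool bindless_sampler, bound_sampler;
    bool bindless_image, bound_image;
};

/* Returns true when the units may be linked together under the quoted rule. */
static bool sketch_layouts_link(const struct sketch_unit *units, size_t count)
{
    struct sketch_unit seen = { false, false, false, false };

    for (size_t i = 0; i < count; i++) {
        seen.bindless_sampler |= units[i].bindless_sampler;
        seen.bound_sampler    |= units[i].bound_sampler;
        seen.bindless_image   |= units[i].bindless_image;
        seen.bound_image      |= units[i].bound_image;
    }
    /* A conflicting pair anywhere in the program is a link-time error. */
    return !(seen.bindless_sampler && seen.bound_sampler) &&
           !(seen.bindless_image && seen.bound_image);
}
```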
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 2.14.8 of the ARB_bindless_texture spec:
"(modify second paragraph, p. 126) ... against the
MAX_COMBINED_TEXTURE_IMAGE_UNITS limit. Samplers accessed
using texture handles (section 3.9.X) are not counted against
this limit."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From section 5.4.1 of the ARB_bindless_texture spec:
"In the following four constructors, the low 32 bits of the
sampler type correspond to the .x component of the uvec2 and
the high 32 bits correspond to the .y component."
uvec2(any sampler type) // Converts a sampler type to a
// pair of 32-bit unsigned integers
any sampler type(uvec2) // Converts a pair of 32-bit unsigned integers to
// a sampler type
uvec2(any image type) // Converts an image type to a
// pair of 32-bit unsigned integers
any image type(uvec2) // Converts a pair of 32-bit unsigned integers to
// an image type
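The bit layout from the quoted text (low 32 bits in .x, high 32 bits in .y) can be written out in C as a minimal sketch. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_uvec2 { uint32_t x, y; };

/* Split a 64-bit handle: low 32 bits go to .x, high 32 bits to .y. */
static struct sketch_uvec2 sketch_handle_to_uvec2(uint64_t handle)
{
    struct sketch_uvec2 v = {
        (uint32_t)(handle & 0xffffffffu),
        (uint32_t)(handle >> 32),
    };
    return v;
}

/* Rebuild the 64-bit handle from the two 32-bit components. */
static uint64_t sketch_uvec2_to_handle(struct sketch_uvec2 v)
{
    return ((uint64_t)v.y << 32) | v.x;
}
```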
v4: - fix up comment style
v3: - rebase (and remove (sampler) ? 1 : vector_elements)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For the explicit conversions.
From section 4.1.7 of the ARB_bindless_texture spec:
"Samplers are represented using 64-bit integer handles, and
may be converted to and from 64-bit integers using constructors."
From section 4.1.X of the ARB_bindless_texture spec:
"Images are represented using 64-bit integer handles, and
may be converted to and from 64-bit integers using constructors."
v3: - add spec comment
- update the glsl error message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v2)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.1.7 of the ARB_bindless_texture spec:
"Samplers may be declared as shader inputs and outputs, as uniform
variables, as temporary variables, and as function parameters."
From section 4.1.X of the ARB_bindless_texture spec:
"Images may be declared as shader inputs and outputs, as uniform
variables, as temporary variables, and as function parameters."
v3: - add spec comment
- update the glsl error message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.1.7 of the ARB_bindless_texture spec:
"Samplers can be used as l-values, so can be assigned into and
used as "out" and "inout" function parameters."
From section 4.1.X of the ARB_bindless_texture spec:
"Images can be used as l-values, so can be assigned into and
used as "out" and "inout" function parameters."
v4: - invert the logic
v3: - update spec comment formatting
- keep the read_only check
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Yes, this is a bit hacky, but we don't really have a choice.
Plain GLSL doesn't accept bindless samplers/images as l-values,
while it's allowed when ARB_bindless_texture is enabled.
The default NULL parameter is there because we can't access the
_mesa_glsl_parse_state object in a few places in the compiler.
One is_lvalue(NULL) call is for IR validation, but other checks
happen elsewhere, so it should be safe.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From section 4.1.7 of the ARB_bindless_texture spec:
"Samplers aggregated into arrays within a shader (using square
brackets []) can be indexed with arbitrary integer expressions."
v3: - update spec comment formatting
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.3.4 of the ARB_bindless_texture spec
"(modify last paragraph, p. 35, allowing samplers and images as
fragment shader inputs) ... Fragment inputs can only be signed
and unsigned integers and integer vectors, floating point scalars,
floating-point vectors, matrices, sampler and image types, or
arrays or structures of these. Fragment shader inputs that are
signed or unsigned integers, integer vectors, or any
double-precision floating-point type, or any sampler or image
type must be qualified with the interpolation qualifier "flat"."
v3: - update spec comment formatting
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.3.4 of the ARB_bindless_texture spec:
"(modify third paragraph of the section to allow sampler and
image types) ... Vertex shader inputs can only be float,
single-precision floating-point scalars, single-precision
floating-point vectors, matrices, signed and unsigned integers
and integer vectors, sampler and image types."
v3: - update spec comment formatting
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.3.4 of the ARB_bindless_texture spec:
"(modify third paragraph of the section to allow sampler and image
types) ... Vertex shader inputs can only be float,
single-precision floating-point scalars, single-precision
floating-point vectors, matrices, signed and unsigned integers
and integer vectors, sampler and image types."
From section 4.3.6 of the ARB_bindless_texture spec:
"Output variables can only be floating-point scalars,
floating-point vectors, matrices, signed or unsigned integers or
integer vectors, sampler or image types, or arrays or structures
of any these."
v3: - add spec comment
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
ARB_bindless_texture spec allows images to be declared as
shader inputs.
v2: - put the */ on the following line (Timothy Arceri)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
ARB_bindless_texture allows images to be declared inside structures,
which means that qualifiers like writeonly should be allowed.
I got confirmation of this from Jeff Bolz (one of the spec's authors),
because the spec doesn't clearly explain it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.3.7 of the ARB_bindless_texture spec:
"(remove the following bullet from the last list on p. 39, thereby
permitting sampler types in interface blocks; image types are also
permitted in blocks by this extension)"
* sampler types are not allowed
v3: - update the spec comment
- update the glsl error message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_bindless_texture spec doesn't clearly state this, but as
it says "Replace Section 4.1.7 (Samplers), p. 25" and,
"Replace Section 4.1.X, (Images)", this should be allowed.
v3: - add spec comment
- update the glsl error message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.1.7 of the ARB_bindless_texture spec:
"Samplers can be used as l-values, so can be assigned into and used
as "out" and "inout" function parameters."
From section 4.1.X of the ARB_bindless_texture spec:
"Images can be used as l-values, so can be assigned into and used as
"out" and "inout" function parameters."
v3: - add spec comment
- update the glsl error message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From section 4.1.7 of the ARB_bindless_texture spec:
"Samplers may be declared as shader inputs and outputs, as uniform
variables, as temporary variables, and as function parameters."
From section 4.1.X of the ARB_bindless_texture spec:
"Images may be declared as shader inputs and outputs, as uniform
variables, as temporary variables, and as function parameters."
v3: - add validate_storage_for_sampler_image_types()
- update spec comment
- update the glsl error message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds bindless_sampler and bound_sampler (and respectively
bindless_image and bound_image) to the parser.
v3: - add an extra space in apply_bindless_qualifier_to_variable()
- fix indentation in merge_qualifier()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
In plain GLSL, sampler and image types can only be declared as
uniform-qualified global variables or 'in' function parameters.
Setting the read_only flag seems quite useless because other
checks will prevent sampler/image variables from being assigned and
also because the flag is not set for atomic_uint types, which are
opaque types.
This will also help for ARB_bindless_texture because samplers
and images can be assigned when they are considered bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
As a side effect, this will magically fix std140/std430 interfaces
for bindless samplers/images and will help for implementing the
explicit conversions with constructors.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bindless samplers/images are 64-bit unsigned integers, which
means they consume two components as specified by
ARB_bindless_texture.
It looks like we are not wasting uniform storage by changing
this because default-block uniforms are not packed. So, if
we use N uint uniforms, they occupy N * 16 bytes in the
constant buffer. This is something that could be improved.
Though, count_uniform_size needs to be adjusted to not count
a sampler (or image) twice.
As a side effect, this will probably break the cache if you
have one because it will consider sampler/image types as
two components.
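The storage argument above can be sketched as simple arithmetic. This is
only an illustration: the names are hypothetical, and the 16-byte slot per
default-block uniform is taken from the description above rather than from
any driver code.

```c
#include <assert.h>

/* Assumptions for this sketch: every default-block uniform occupies one
 * 16-byte (vec4-sized) slot, and a bindless handle is a 64-bit value made
 * of two 32-bit components. */
#define EX_SLOT_BYTES        16u
#define EX_COMPONENT_BYTES   4u
#define EX_HANDLE_COMPONENTS 2u

/* Bytes occupied by N unpacked default-block uniforms. */
unsigned
ex_default_block_bytes(unsigned num_uniforms)
{
   return num_uniforms * EX_SLOT_BYTES;
}

/* Bytes actually needed by one bindless sampler/image handle. */
unsigned
ex_handle_bytes(void)
{
   return EX_HANDLE_COMPONENTS * EX_COMPONENT_BYTES;
}
```

Under these assumptions a two-component handle still fits in the slot a
one-component uint would have used, so the slot count does not change.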
v3: - update the comments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The ARB_bindless_texture spec says:
"Samplers are represented using 64-bit integer handles."
and,
"Images are represented using 64-bit integer handles."
It seems simpler to always consider sampler and image types
as 64-bit unsigned integer.
This introduces a temporary workaround in _mesa_get_uniform()
because at this point no flags are used to distinguish between
bound and bindless samplers. This is going to be removed in a
separate series. This avoids breaking arb_shader_image_load_store-state.
v3: - update the comment slightly
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
radv_bind_descriptor_set cannot be used to bind a push descriptor set
since a push descriptor set does not have a buffer list. However,
there is no need to add the buffers again when restoring a set, so
this fix is also an optimization.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On ARM Android platforms, the host_os tuple should be linux-androideabi,
so let's match both -android and -androideabi (or any other
-android* tuple) to determine if we should do an Android build.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Actually put something in unreachable(), so as not to break the build on
a Friday evening.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
When a buffer is being created from FD or GEM flink import, the current
API makes no provision for passing modifier information along with this.
Set the modifier for such images to DRM_FORMAT_MOD_INVALID.
Also preserve the modifier when duplicating an image, as will be done by
GBM when importing from a wl_buffer.
This doubly tripped up Wayland, as the images would first have been
created (as wl_buffers) with a 0 modifier, and then lost what modifier
they would've had when being duplicated into gbm_bos.
Fixes: d78a36ea62 ("i965/dri: Handle the linear fb modifier")
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use a helper function and struct to convert between a modifier and
tiling mode, so we can use it later for a tiling -> modifier lookup.
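One possible shape for such a helper is a small table searched in both
directions. Everything below is a stand-in for illustration: the constants,
enum and function names are hypothetical and are not the real DRM or i965
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not the real modifier values. */
#define EX_MOD_LINEAR   UINT64_C(0)
#define EX_MOD_X_TILED  UINT64_C(1)
#define EX_MOD_Y_TILED  UINT64_C(2)
#define EX_MOD_INVALID  UINT64_MAX

enum ex_tiling { EX_TILING_NONE, EX_TILING_X, EX_TILING_Y };

struct ex_tiling_map {
   uint64_t modifier;
   enum ex_tiling tiling;
};

static const struct ex_tiling_map ex_tiling_table[] = {
   { EX_MOD_LINEAR,  EX_TILING_NONE },
   { EX_MOD_X_TILED, EX_TILING_X },
   { EX_MOD_Y_TILED, EX_TILING_Y },
};

#define EX_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* modifier -> tiling; returns false for a modifier not in the table. */
bool
ex_modifier_to_tiling(uint64_t modifier, enum ex_tiling *tiling)
{
   for (size_t i = 0; i < EX_ARRAY_SIZE(ex_tiling_table); i++) {
      if (ex_tiling_table[i].modifier == modifier) {
         *tiling = ex_tiling_table[i].tiling;
         return true;
      }
   }
   return false;
}

/* tiling -> modifier; falls back to EX_MOD_INVALID when nothing matches. */
uint64_t
ex_tiling_to_modifier(enum ex_tiling tiling)
{
   for (size_t i = 0; i < EX_ARRAY_SIZE(ex_tiling_table); i++) {
      if (ex_tiling_table[i].tiling == tiling)
         return ex_tiling_table[i].modifier;
   }
   return EX_MOD_INVALID;
}
```

Keeping both lookups on one table means a new tiling mode only has to be
added in one place.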
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The TGSI DCE pass doesn't eliminate dead assignments like
MOV TEMP[0], TEMP[1] in presence of loops because it assumes
that the visitor doesn't emit dead code. This assumption is
actually wrong and this situation happens.
However, it appears that the merge_registers() pass accidentally
takes care of this for some weird reasons. But since this pass has
been disabled for RadeonSI and Nouveau, the renumber_registers()
pass which is called *after*, can't do its job correctly.
This is because it assumes that no dead code is present. But if
there is still a dead assignment, it might re-use the TEMP
register id incorrectly and emit wrong code.
This patch fixes the issue by recording writes instead of reads,
which also has the advantage of being faster.
This should fix Unigine Heaven on RadeonSI and Nouveau.
shader-db results with RadeonSI:
47109 shaders in 29632 tests
Totals:
SGPRS: 1923308 -> 1923316 (0.00 %)
VGPRS: 1133843 -> 1133847 (0.00 %)
Spilled SGPRs: 2516 -> 2518 (0.08 %)
Spilled VGPRs: 65 -> 65 (0.00 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
Code Size: 60095968 -> 60096256 (0.00 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 431889 -> 431889 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
It's still interesting to disable the merge_registers() pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Lost is an EGLBoolean, so we should assign it to EGL_TRUE/EGL_FALSE,
not true/false.
Fixes: e5eace5868 ("egl/android: Mark surface as lost when dequeueBuffer fails")
Fixes: 0212db3504 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chad Versace <chadversary@chromium.org>
Now that we can allocate states larger than the block size, we no longer
need a block size of 1MB which can be rather wasteful.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Previously, the maximum size of a state that could be allocated from a
state pool was a block. However, this has caused us various issues
particularly with shaders which are potentially very large. We've also
hit issues with render passes with a large number of attachments when we
go to allocate the block of surface state. This effectively removes the
restriction on the maximum size of a single state. (There's still a
limit of 1MB imposed by a fixed-length bucket array.)
For states larger than the block size, we just grab a large block off of
the block pool rather than sub-allocating. When we go to allocate some
chunk of state and the current bucket does not have state, we try to
pull a chunk from some larger bucket and split it up. This should
improve memory usage if a client occasionally allocates a large block of
state.
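The bucket scheme described above can be sketched as follows. This is not
the driver's code: the names are hypothetical, and the 64-byte minimum is
an assumption (only the 1MB limit comes from the text above).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits: 64-byte smallest state, 1MB largest state. */
#define EX_MIN_STATE_SIZE_LOG2 6
#define EX_MAX_STATE_SIZE_LOG2 20
#define EX_STATE_BUCKETS (EX_MAX_STATE_SIZE_LOG2 - EX_MIN_STATE_SIZE_LOG2 + 1)

/* Smallest n such that (1 << n) >= size. */
unsigned
ex_ceil_log2(uint32_t size)
{
   assert(size >= 1 && size <= (UINT32_C(1) << 31));
   unsigned n = 0;
   while ((UINT32_C(1) << n) < size)
      n++;
   return n;
}

/* Index of the power-of-two bucket that serves a request of `size` bytes. */
unsigned
ex_size_to_bucket(uint32_t size)
{
   assert(size >= 1 && size <= (UINT32_C(1) << EX_MAX_STATE_SIZE_LOG2));
   unsigned log2 = ex_ceil_log2(size);
   if (log2 < EX_MIN_STATE_SIZE_LOG2)
      log2 = EX_MIN_STATE_SIZE_LOG2;
   return log2 - EX_MIN_STATE_SIZE_LOG2;
}

/* Size in bytes of every chunk stored in `bucket`. */
uint32_t
ex_bucket_to_size(unsigned bucket)
{
   assert(bucket < EX_STATE_BUCKETS);
   return UINT32_C(1) << (bucket + EX_MIN_STATE_SIZE_LOG2);
}

/* How many `to_bucket` chunks one chunk taken from `from_bucket` yields
 * when it is split up. */
uint32_t
ex_split_count(unsigned from_bucket, unsigned to_bucket)
{
   assert(to_bucket <= from_bucket && from_bucket < EX_STATE_BUCKETS);
   return UINT32_C(1) << (from_bucket - to_bucket);
}
```

With these limits there are 15 buckets, and an empty 64-byte bucket could
be refilled with 8 chunks by splitting a single 512-byte chunk.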
This commit is inspired by some similar work done by Juan A. Suarez
Romero <jasuarez@igalia.com>.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
The old algorithm worked fine assuming a constant block size. We're
about to break that assumption so we need an algorithm that's a bit more
robust against suddenly growing by a huge amount compared to the
currently allocated quantity of memory.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Now that the state stream is allocating off of the state pool, there's
no reason why we need the block pool to be separate.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Now that everything is going through the state pools, the block pool no
longer needs to be able to handle re-use.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Since the state_stream is now pulling from a state_pool, the only thing
pulling directly off the block pool is the state pool so we can just
move the block_size there. The one exception is when we allocate
binding tables but we can just reference the state pool there as well.
The only functional change here is that we no longer grow the block pool
immediately upon creation so no BO gets allocated until our first state
allocation.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
The helper functions aren't really gaining us as much as they claim and
are actually about to be in the way.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
We should only use size_t when referring to sizes of bits of CPU memory.
Anything on the GPU or just a regular array length should be a type that
has the same size on both 32 and 64-bit architectures. For state
objects, we use a uint32_t because we'll never allocate a piece of
driver-internal GPU state larger than 2GB (more like 16KB).
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This just adds the chip in the right places.
We don't set the partial_vs_wave workaround, as radeonsi
doesn't, but we still have to confirm it's not required.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
That is, call ANativeWindow::cancelBuffer in droid_destroy_surface().
This should prevent application deadlock when the app destroys the
EGLSurface after EGL has acquired a buffer from SurfaceFlinger
(ANativeWindow::dequeueBuffer) but before EGL has released it
(ANativeWindow::enqueueBuffer).
This patch is part of a series for fixing
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface
on Chrome OS x86 devices.
Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Add a new bool, _EGLSurface::Lost, and check it in eglMakeCurrent and
eglSwapBuffers. The EGL 1.5 spec says that those functions emit errors
when the native surface is no longer valid.
This patch just updates core EGL. No driver sets _EGLSurface::Lost yet.
I discovered that Mesa failed to detect lost surfaces while debugging an
Android CTS camera test,
android.hardware.camera2.cts.RobustnessTest#testAbandonRepeatingRequestSurface.
This patch doesn't fix the test, though, because the test expects
EGL_BAD_SURFACE when the surface becomes lost, and this patch actually
complies with the EGL spec. If I interpreted the EGL spec correctly,
EGL_BAD_NATIVE_WINDOW or EGL_BAD_CURRENT_SURFACE is the correct error.
Cc: mesa-stable@lists.freedesktop.org
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We can just use the new CHVLineWidth field rather than an entirely
different generation's packing function.
v2: Inline the function (requested by Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We just add another field to gen8.xml for the Cherryview line width,
rather than trying to replicate the gymnastics done in the Vulkan
driver to use gen9 SF pack functions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Picked from a different branch. When we stop using the scratch patching,
this function will not be called.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
LLVM 3.8:
- had broken indirect resource indexing
- didn't have scratch coalescing
- was the last user of problematic v16i8
- only supported OpenGL 4.1
This leaves us with LLVM 3.9 and LLVM 4.0 support for Mesa 17.2.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There is no reason to advertise transfer ability for formats we can't
use for anything else. This stops some CTS tests hitting internal
error for 64-bit types when they see the transfer flags.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To simplify things for now, since all the gfx shader stages share a
single SSBO state block, only advertise SSBO support for fragment shader
(and compute when we have that). We could possibly use a fixed-
partitioning of the SSBO index space to support SSBOs on other stages
without having to resort to shader variants.
Signed-off-by: Rob Clark <robdclark@gmail.com>
TODO cwabbott pointed out a write-after-read hazard, which affects both
this and arrays. A write needs to depend on *all* reads since the last
write, not just the last read.
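A minimal sketch of the rule stated in this TODO, assuming a hypothetical
per-resource tracker that records instruction ids. It is not the compiler's
actual data structure.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_READS 16

/* Hypothetical dependency tracker for one SSBO or array. */
struct ex_dep_tracker {
   int last_write;            /* id of the last write, or -1 if none */
   int reads[EX_MAX_READS];   /* ids of reads seen since the last write */
   size_t num_reads;
};

void
ex_tracker_init(struct ex_dep_tracker *t)
{
   t->last_write = -1;
   t->num_reads = 0;
}

/* A read depends only on the last write. Returns that id, or -1. */
int
ex_record_read(struct ex_dep_tracker *t, int instr)
{
   assert(t->num_reads < EX_MAX_READS);
   t->reads[t->num_reads++] = instr;
   return t->last_write;
}

/* A write depends on every read since the last write, or on the last
 * write itself when no read happened in between. Stores the dependency
 * ids in deps[] and returns how many were stored. */
size_t
ex_record_write(struct ex_dep_tracker *t, int instr, int deps[EX_MAX_READS])
{
   size_t n = 0;
   for (size_t i = 0; i < t->num_reads; i++)
      deps[n++] = t->reads[i];
   if (n == 0 && t->last_write >= 0)
      deps[n++] = t->last_write;
   t->last_write = instr;
   t->num_reads = 0;
   return n;
}
```

Tracking only the most recent read would drop the dependency on earlier
reads, which is exactly the hazard described above.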
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is equivalent to what mesa/st does in glsl_to_tgsi. For most hw
there isn't a particularly good reason to treat these differently.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Replace all instances of mapi_table with the actual struct _glapi_table.
The former may have been needed when OpenVG was around. But since
that one is long gone, there's no point in having the current confusing
mix of the two.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently we would generate a partial one as we do non-shared glapi.
At the same time since it's local, we don't care that much if we have a
few extra bytes of space in the table.
Drop the guard, which allows us to simplify both build system and code.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The code itself has nothing to do with shared glapi, thus having it
behind GLX_SHARED_GLAPI is misleading. Use GLX_INDIRECT_RENDERING
instead.
The latter macro is set at global scope by the Autotools and Scons build
systems.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Always true, since the dri modules required shared glapi.
With earlier commit (da410e6afa "configure: explicitly require shared
glapi for enable-dri") we even made that explicit during the configure
stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
In the early days of Xorg and Mesa we had multiple providers of the
GLAPI. All of those were the ones responsible for dlopening the DRI
module. Hence it was perfectly fine, and actually expected, for the DRI
modules to have unresolved symbols.
Since then we've moved the API to a separate shared library and no other
libraries provide the symbols.
Here comes the picky part:
It's possible that one uses old Xorg (where libglx.so provides the
GLAPI) and new Mesa (with DRI modules linking against libglapi.so).
That should still work, since the libglx.so symbols will take
precedence over the libglapi.so ones.
I've verified this while running 1.14 series Xorg alongside this (and
next) patch.
It may seem a bit fragile, but that's reasonably OK since all of the
affected Xorg versions have been EOL for years.
The final one being the 1.14 series, which saw its final bug fix release
1.14.7 in June 2014.
To ensure that the binaries do not have unresolved symbols add
-no-undefined and $(LD_NO_UNDEFINED), just like we do everywhere else
throughout mesa.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98428
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The function cannot return NULL, update the comment accordingly.
Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This way we'll get an implicit zero initialization of the remaining
members, as required by dri2_add_config().
Fixes: e5efaeb85c ("egl: polish dri2_to_egl_attribute_map[]")
Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Thus we can use the value to explicitly size arrays, instead of
__DRI_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE + 1.
The latter seems magical and is error prone, as we add more dri
attributes.
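The idea can be sketched with a sentinel enumerator that sizes the table.
The tokens and values below are hypothetical and much shorter than the real
attribute list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute tokens, starting at 1 like the real ones. */
enum ex_attrib {
   EX_ATTRIB_BUFFER_SIZE = 1,
   EX_ATTRIB_LEVEL,
   EX_ATTRIB_FRAMEBUFFER_SRGB_CAPABLE,
   EX_ATTRIB_MAX   /* one past the last valid token; must stay last */
};

/* Sized by the sentinel, so a token added before EX_ATTRIB_MAX grows the
 * table automatically. Entries without an initializer are implicitly
 * zero. The mapped values are arbitrary placeholders. */
static const int ex_attrib_map[EX_ATTRIB_MAX] = {
   [EX_ATTRIB_BUFFER_SIZE] = 100,
   [EX_ATTRIB_LEVEL]       = 200,
};

/* Number of entries in the table. */
size_t
ex_attrib_map_size(void)
{
   return sizeof(ex_attrib_map) / sizeof(ex_attrib_map[0]);
}

/* Returns the mapped value, or 0 when the attribute has no mapping. */
int
ex_attrib_lookup(enum ex_attrib attrib)
{
   assert(attrib > 0 && attrib < EX_ATTRIB_MAX);
   return ex_attrib_map[attrib];
}
```

Because the sentinel is one past the last token, the last valid index is
EX_ATTRIB_MAX - 1, which is where an off-by-one error is easy to make.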
v2: Fix off by one error (Tomasz)
Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Drop the misleading "will not match the one returned by thread_create"
hunk and provide more clarity as to what/why GetCurrentThread() isn't
the solution we're looking for.
v2: Places brackets after function names (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Introduce _egl_display::Options::Platforms for private storage.
For X11 platforms we can use it for the screen number as set by
EGL_PLATFORM_X11_SCREEN_EXT.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to the spec we get VK_ERROR_OUT_OF_HOST_MEMORY or
VK_ERROR_OUT_OF_DEVICE_MEMORY on vkBindImageMemory failure.
Fixes returned value changed by b546c9d.
Fixes: b546c9d ("anv: anv_gem_mmap() returns MAP_FAILED as mapping error")
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The spec allows memory qualifiers to be used with image variables,
buffer variables and shader storage blocks. This patch also fixes
validate_memory_qualifier_for_type().
Fixes the following ARB_uniform_buffer_object test:
uniform-block-memory-qualifier.frag
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Including structures, interfaces and uniform blocks.
Fixes the following ARB_shader_image_load_store test:
format-layout-with-non-image-type.frag
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It makes more sense to have two separate validate functions,
mainly because memory qualifiers are allowed with members of
shader storage blocks.
validate_memory_qualifier_for_type() will be fixed in a
separate patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It doesn't make sense to prefix them with 'image' because
they are called "Memory Qualifiers" and they can be applied
to members of storage buffer blocks.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Take it into account when checking if the mapping failed.
v2:
- Remove map == NULL and its related comment (Emil)
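For reference, a minimal sketch of the check in plain POSIX terms. The
wrapper name is hypothetical and it maps anonymous memory rather than a GEM
buffer.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper: maps `size` bytes of anonymous memory. Stores the
 * pointer in *out and returns true on success. */
bool
ex_map_anonymous(size_t size, void **out)
{
   void *map = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

   /* mmap() signals failure with MAP_FAILED, not with NULL, so a
    * `map == NULL` test would miss the error. */
   if (map == MAP_FAILED) {
      *out = NULL;
      return false;
   }

   *out = map;
   return true;
}
```

A zero-length request is a convenient way to exercise the failure path,
since mmap() rejects it with EINVAL.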
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 6f3e3c715a ("vk/allocator: Add a BO pool")
Fixes: 9919a2d34d ("anv/image: Memset hiz surfaces to 0 when binding memory")
Cc: "17.0 17.1" <mesa-stable@lists.freedesktop.org>
The matrix used for YCbCr to RGB is listed in:
https://en.wikipedia.org/wiki/YCbCr
There was an error in converting the offsets from integers to unorm
values: 0.0625=16/256 should be 16.0/255, and 0.5=128.0/256 should be
128.0/255. With this fix, the CSC result is bit aligned with Wikipedia's
conversion result and FFmpeg's result.
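The effect of the offset error can be checked with a few lines of
arithmetic. This sketch only covers the offsets, not the full matrix, and
the names are hypothetical.

```c
#include <assert.h>

/* An 8-bit unorm channel value v is read back as v / 255. */
double
ex_unorm8_to_float(unsigned v)
{
   return v / 255.0;
}

/* Offsets expressed on the same unorm scale as the sampled texels. */
#define EX_Y_OFFSET    (16.0 / 255.0)
#define EX_CBCR_OFFSET (128.0 / 255.0)

/* What is left of a sampled channel value after subtracting `offset`. */
double
ex_residual(unsigned value, double offset)
{
   return ex_unorm8_to_float(value) - offset;
}
```

For video black (Y = 16, Cb = Cr = 128) the residual is zero with the /255
offsets, while 0.0625 leaves about 0.000245 in the luma channel and 0.5
leaves about 0.00196 in each chroma channel.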
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100854
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
On this patch, we port:
- brw_polygon_stipple
- brw_polygon_stipple_offset
- brw_line_stipple
- brw_drawing_rect
v2:
- Also emit states for gen4-5 with this code.
v3:
- Style fixes and remove excessive checks (Ken).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some code that was placed in brw_draw_upload.c and exported to be used
by gen8+ was also moved to genX_state_upload, and the respective symbols
are not exported anymore.
v2:
- Remove code from brw_draw_upload too
- Emit vertices for gen4-5 too.
- Use helper to setup brw_address (Kristian)
- Use macros for MOCS values.
- Do not use #ifndef NDEBUG on code that is actually used (Ken)
v3:
- Style and code cleanup (Ken)
- Keep some of the common code inside brw_draw_upload.c (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following states are ported on this patch:
- gen6_gs_push_constants
- gen6_vs_push_constants
- gen6_wm_push_constants
- gen7_tes_push_constants
v2:
- Use helper to setup brw_address (Kristian)
v3:
- Do not use macro for upload_constant_state (Ken)
- Do not re-declare MOCS macro (Ken)
v4: (by Ken)
- Drop more dead code, change brw->gen checks to GEN_GEN, style nits
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit 3DSTATE_SCISSOR_STATE_POINTERS using brw_batch_emit, and pack the
scissor states using GENX(SCISSOR_RECT_pack), generated from genxml.
v3:
- Remove old code (Ken)
- Style fixes (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit 3DSTATE_VS on Gen6+ using brw_batch_emit helper, that uses pack
structs from genxml.
v2:
- Use render_bo helper to setup brw_address (Kristian)
v3:
- Bring back some comments for gen6 and remove _NEW_TRANSFORM blocks
from gen7+.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit 3DSTATE_PS_EXTRA on Gen8+ using brw_batch_emit helper, that uses
pack structs from genxml.
v3:
- Style fixes and moving code around to be cleaner (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit 3DSTATE_WM on Gen6+ using brw_batch_emit helper, that uses pack
structs from genxml.
v2:
- Use render_bo helper to setup brw_address (Kristian)
- Remove TODO and use BRW_PSCDEPTH_OFF.
v3:
- A couple of style fixes (Ken)
- Enable RASTRULE_UPPER_RIGHT on gen6+ instead of gen8+ (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit 3DSTATE_PS on Gen7+ using brw_batch_emit helper, that uses pack
structs from genxml.
v2:
- Use render_bo helper to setup brw_address (Kristian)
v3:
- Style fixes and code cleanup (Ken)
v4:
- More style fixes and code cleanup missed in v3
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit sf state on Gen6+ using brw_batch_emit helper, using pack structs
from genxml.
v3:
- Reorganize code and reduce #if/#endif's (Ken)
- Style fixes (Ken)
- Always set AALINEDISTANCE_TRUE (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That helper function returns the line width as a float, and is then used
by brw_get_line_width to return the fixed point width.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Emit clip state on Gen6+ using brw_batch_emit helper, using pack structs
from genxml.
v3:
- Lots of style fixes (Ken)
- Do not set CullTestEnableBitMask on Gen8+ (Ken)
v4:
- Do not include brw_defines_common.h.
v5 (Ken): s/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This emits 3DSTATE_WM_DEPTH_STENCIL on Gen8+ or DEPTH_STENCIL_STATE
(and the relevant pointer packets) on Gen6-7.5 from a single function.
v3:
- Watch for BRW_NEW_BATCH too on gen < 8 (Ken)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make atoms initialization compile conditionally based on the target
platform.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
The Ironlake documentation is terrible, so it's unclear whether or not
this field exists there. It definitely doesn't exist on Sandybridge
and later. It definitely does exist on G45.
We haven't been setting it for our normal vertex attributes - just
the SGVs (VertexID, InstanceID, BaseVertex, BaseInstance, DrawID).
We should be consistent. My guess is that it isn't necessary and
doesn't exist - this patch drops it from the SGVs elements, making
them follow the behavior of most attributes.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This moves the structs from the data segment to the rodata segment,
which seems like the more correct place for them.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will allow us to create no error versions of functions
denoted by a _no_error suffix. We also need to set a no_error
attribute equal to "true" in the xml.
V3: stop the no_error attribute being overwritten when functions
alias another.
V2: tidy up suggested by Nicolai.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Setting both offset to 0x20 and flat shade results in passthrough
mode instead of the constant.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: f205e19e4f "radv/ac: eliminate unused vertex shader outputs. (v2)"
These macros are defined in brw_defines.h, which contains a lot of
macros that conflict with autogenerated code from genxml. But we need to
use them (the MOCS macros) in some of that same genxml code.
Moving them to brw_context.h solves that problem and we don't have to
include brw_defines.h.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a previous patch some enums were split out from brw_eu_defines.h, so
they could be used by genxml based code. anv can also benefit from this.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These enums live inside struct brw_wm_prog_data, so it makes sense to
keep them in the same header. It also allows using them without
including brw_eu_defines.h.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the PRM, Line Stipple Inverse Repeat Count is on dw2, bits 31:16,
format U1.13.
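As an illustration of that layout, the sketch below encodes 1/repeat_count
as unsigned fixed point with 13 fractional bits and shifts it into bits
31:16. The function name is hypothetical, and truncating instead of
rounding is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Encode the inverse repeat count as U1.13 (1 integer bit, 13 fractional
 * bits) and place it in bits 31:16 of the dword. The float-to-integer
 * conversion truncates. */
uint32_t
ex_pack_inverse_repeat_count(unsigned repeat_count)
{
   assert(repeat_count >= 1);
   float inverse = 1.0f / (float)repeat_count;
   uint32_t fixed = (uint32_t)(inverse * (float)(1 << 13));
   return fixed << 16;
}
```

A repeat count of 1 encodes 1.0 as 0x2000, which lands in the dword as
0x20000000.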
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Name the options to "Pixel Location":
- PIXLOC_CENTER -> CENTER
- PIXLOC_UL_CORNER -> UL_CORNER
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes genxml create the right struct types, and generate the right
batch commands.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both GS and SOL have these fields. Some were ReorderEnable = true,
some were ReorderMode = REORDER_TRAILING, and some were just TRAILING.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Use an alias, so we can set the same value as the #define's.
v3:
- Call it "SO Buffer MOCS" to follow the most common naming scheme.
- Add alias for gen7 and gen75 too (Ken).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are two variants:
- Clip Enable
- CLIP Enable (on gen6)
Rename everything to Clip Enable.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add some more details to Gen4 and Gen45 and add what is needed
in Gen5 XML. This commit overwrites the previous work done on Gen4
and Gen45 as it contains more instructions and fixes some mistakes.
However, comments (dword boundaries) are lost in the process.
v3:
- Set the type of some fields, instead of prefix. Also fix the
SAMPLER_BORDER_COLOR_STATE fields of gen5.xml.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This implementation allocates a 4k BO for each semaphore that can be
exported using OPAQUE_FD and uses the kernel's already-existing
synchronization mechanism on BOs.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This just stubs things out. Real external semaphore support will come
with VK_KHX_external_semaphore_fd.
Reviewed-by: Chad Versace <chadversary@chromium.org>
After successful drmGetDevices2() call, drmFreeDevices() needs to be called.
Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
drmGetDevices2 takes count and not size. Probably hasn't caused problems
yet in practice and was missed as setups with more than 8 DRM devices
are not very common.
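As an illustration only, a minimal C sketch of the count-versus-size mix-up. enumerate_devices() is a made-up stand-in with a count-based contract, not the libdrm function, and ARRAY_SIZE is defined locally for the example.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical enumerator with a count-based contract: it writes at most
 * max_devices entries and returns how many it wrote. */
static int enumerate_devices(void *devices[], int max_devices, int available)
{
   int n = available < max_devices ? available : max_devices;

   for (int i = 0; i < n; i++)
      devices[i] = NULL;
   return n;
}

void demo_count_not_size(void)
{
   void *devices[8];

   /* sizeof() is in bytes, so it overstates the capacity of the array. */
   assert(ARRAY_SIZE(devices) == 8);
   assert(sizeof(devices) == 8 * sizeof(void *));

   /* With the element count, a system exposing more devices than slots
    * is clamped to the array capacity. */
   assert(enumerate_devices(devices, (int)ARRAY_SIZE(devices), 10) == 8);
}
```

Passing sizeof(devices) instead would only misbehave once more devices exist than the array has slots, which matches why the bug went unnoticed.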
Fixes: 743315f2 "radv: do not open random render node(s)"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2 (Jason Ekstrand):
- Take a view_mask rather than a whole subpass
- Build the view mask into the VS shader key
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to insert more lowering code that may insert system values and
we need to gather info after that lowering.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Shader hashing is very closely related to shader compilation. Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing. The only
real change (other than the order of hashing) is that we now hash in the
shader stage.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We always use only a single element.
v2: Change single element arrays to variables
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The regioning parameters are now properly set by convert_to_hw_regs()
and we don't need to fix them in the generator. That latter fix
previously done in the generator was strictly speaking wrong for any
non-identity regions.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On gen7, the swizzles used in DF align16 instructions work for an element
size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that
in the rest of the code and prepare the instructions for this (scalarize_df()),
we need to set it to two again.
However, for DF align1 instructions, a width of 2 is wrong as we are not
reading the data we want. For example, a uniform would have a region of
<0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access
the first 4.
This patch sets the default one to 4 and then modifies the width of
align16 instruction's DF sources when we translate the logical swizzle
to the physical one.
v2:
- Remove conditional (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
From IVB PRM, vol4, part3, "General Restrictions on Regioning
Parameters":
"If ExecSize = Width and HorzStride ≠ 0, VertStride must
be set to Width * HorzStride."
In next patch, we are going to modify the region parameter for
uniforms and vgrf. For uniforms that are the source of
DF align1 instructions, they will have <0, 4, 1> regioning and
the execsize for those instructions will be 4, so they will break
the regioning rule. This will be the same for VGRF sources where
we use the vstride == 0 exploit.
As we know we are not going to cross the GRF boundary with that
execsize and parameters (not even with the exploit), we just fix
the vstride here.
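A rough sketch of the quoted rule in plain C, for illustration. struct region and fix_vstride() are hypothetical names, and the fields hold element counts rather than the hardware encodings used by the real generator code.

```c
#include <assert.h>

/* Hypothetical region description <vstride, width, hstride> in elements. */
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* "If ExecSize = Width and HorzStride != 0, VertStride must be set to
 * Width * HorzStride." */
static void fix_vstride(struct region *r, unsigned exec_size)
{
   if (exec_size == r->width && r->hstride != 0)
      r->vstride = r->width * r->hstride;
}

void demo_fix_vstride(void)
{
   /* A uniform read with <0, 4, 1> and an execsize of 4 breaks the rule. */
   struct region uniform = { .vstride = 0, .width = 4, .hstride = 1 };
   fix_vstride(&uniform, 4);
   assert(uniform.vstride == 4);

   /* A scalar <0, 1, 0> region has hstride == 0 and is left alone. */
   struct region scalar = { .vstride = 0, .width = 1, .hstride = 0 };
   fix_vstride(&scalar, 1);
   assert(scalar.vstride == 0);
}
```

The sketch applies the rule unconditionally; the patch itself only touches DF align1 sources.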
v2:
- Move is_align1_df() (Curro)
- Refactor exec_size == width calculation (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This fixes:
dEQP-VK.glsl.builtin.precision.min.*
dEQP-VK.glsl.builtin.precision.max.*
dEQP-VK.glsl.builtin.precision.clamp.*
The problem is the hw doesn't compare denorms properly,
so we have to flush them. The spec says flushing is
optional, but if you don't flush, the results should
be correct.
The -pro driver changes the shader float mode,
it would be nice if llvm could grow that perhaps.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
SPIR-V defines the f32->f16 operation as flushing denormals to 0,
this compares the class using amd class opcode.
Thanks to Matt Arsenault for figuring it out.
This fix is VI+ only, add a TODO for SI/CIK.
This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Having it in the winsys didn't work when multiple devices use
the same winsys, as we then have multiple contexts per queue,
and each context counts separately.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 7b9963a28f "radv: Enable userspace fence checking."
Loop unroll asserts if it hits a sub, we don't really want
to lower subs as llvm handles these things, but do this for
now, until we can fix loop unroll to work with subs.
Fixes: 14ae0bfa5 (radv: Add NIR loop unrolling)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
All of the dynamic states apply to rasterization & fragment processing,
so we don't need to set them if we don't rasterize.
We don't clear the dirty flags for them though, so we don't miss any
updates for the next pipeline with rasterization.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
This still doesn't give us complete pWaitDstStageMask support,
but it should provide enough to be correct if not as efficient as
possible.
If we have wait semaphores we must flush between submits and
flush the shaders as well.
This fixes the remaining fails in:
dEQP-VK.synchronization.op.single_queue.semaphore.*ssbo*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I don't see any reasons why vector_elements is 1 for images and
0 for samplers. This increases consistency and allows to clean
up some code a bit.
This will also help for ARB_bindless_texture.
No piglit regressions with RadeonSI.
This time the Intel CI system doesn't report any failures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow Raspbian's ARMv6 builds to take advantage of the new NEON
code, and could prevent problems if vc4 ends up getting used on a v7 CPU
without NEON.
v2: Drop dead NEON_SUFFIX (noted by Erik Faye-Lund)
Android.mk was setting the flag across the entire driver, so we didn't
have non-NEON versions getting built. This was going to be a problem with
the next commit, when I start auto-detecting NEON support and use the
non-NEON version when appropriate.
Reviewed-by: Rob Herring <robh@kernel.org>
I wrote this code with reference to pixman, though I've only decided to
cover Linux (what I'm testing) and Android (seems obvious enough). Linux
has getauxval() as a cleaner interface to the /proc entry, but it's more
glibc-specific and I didn't want to add detection for that.
This will be used to enable NEON at runtime on ARMv6 builds of vc4.
v2: Actually initialize the temp vars in the Android path (noticed by
daniels)
v3: Actually pull in the cpufeatures library (change by robher).
Use O_CLOEXEC. Break out of the loop when we find our feature.
v4: Drop VFP code, which was confused about what it was detecting and not
actually used yet.
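For reference, a small sketch of the auxiliary-vector scan this kind of detection relies on. It assumes 32-bit ARM Linux values (AT_HWCAP = 16, HWCAP_NEON = bit 12), redefined locally with an EX_ prefix so the example builds anywhere. The struct and function names are made up, and reading the entries from /proc is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for 32-bit ARM Linux, redefined locally for the example. */
#define EX_AT_NULL    0
#define EX_AT_HWCAP   16
#define EX_HWCAP_NEON (1u << 12)

/* Hypothetical in-memory copy of one auxiliary vector entry. */
struct auxv_entry {
   uint32_t type;
   uint32_t value;
};

/* Returns 1 if the HWCAP entry advertises NEON, 0 otherwise. */
static int auxv_has_neon(const struct auxv_entry *auxv, size_t count)
{
   for (size_t i = 0; i < count && auxv[i].type != EX_AT_NULL; i++) {
      /* Stop scanning as soon as the entry we care about is found. */
      if (auxv[i].type == EX_AT_HWCAP)
         return (auxv[i].value & EX_HWCAP_NEON) != 0;
   }
   return 0;
}

void demo_auxv_has_neon(void)
{
   const struct auxv_entry with_neon[] = {
      { 6, 4096 },                            /* some unrelated entry */
      { EX_AT_HWCAP, EX_HWCAP_NEON | 1u },
      { EX_AT_NULL, 0 },
   };
   const struct auxv_entry without_neon[] = {
      { EX_AT_HWCAP, 1u },
      { EX_AT_NULL, 0 },
   };

   assert(auxv_has_neon(with_neon, 3) == 1);
   assert(auxv_has_neon(without_neon, 2) == 0);
}
```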
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
If we are clearing stencil only, we still need to provide a
valid Z output from the vertex shader, we can't rely
on the depth clear value having any meaning, as we use this
for the position output, and it could get clipped, so we
don't end up clearing anything.
Fixes:
dEQP-VK.renderpass.simple.stencil
since I added S8 support.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The renderonly_scanout holds a reference on its prime pipe resource,
which should be released when it is destroyed. If it was created by
renderonly_create_kms_dumb_buffer_for_resource, the dumb BO also has
to be destroyed.
Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This ports
0fcb92c17d
anv: wsi: report presentation error per image request
This fixes:
dEQP-VK.wsi.xlib.incremental_present.scale_none.*
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We would be storing this info twice per image, no need to,
remove it from the surface struct.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When we were assembling the unsigned 64-bit query return from its
two signed 32-bit component parts, the lower half was getting
sign-extended into the top half. Be more explicit about what we want to
do.
Fixes gbm_bo_get_modifier() returning ((1 << 64) - 1) rather than
((1 << 56) - 1), i.e. DRM_FORMAT_MOD_INVALID.
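To illustrate the problem, a minimal C sketch with made-up function names (this is not the actual loader code). It uses the values from above: an upper half of 0x00ffffff and a lower half of -1.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: the signed low half is widened to 64 bits before the OR,
 * so a negative value sets every bit of the upper half as well. */
static uint64_t combine_sign_extended(int32_t hi, int32_t lo)
{
   return ((uint64_t)hi << 32) | (uint64_t)(int64_t)lo;
}

/* Explicit variant: both halves go through uint32_t first, so each one
 * only contributes its own 32 bits. */
static uint64_t combine_zero_extended(int32_t hi, int32_t lo)
{
   return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

void demo_combine(void)
{
   assert(combine_sign_extended(0x00ffffff, -1) == UINT64_MAX);
   assert(combine_zero_extended(0x00ffffff, -1) == 0x00ffffffffffffffULL);
}
```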
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
NIR now validates that SSA references use the same number of channels as
are in the SSA value.
v2: Reword commit message, since the commit didn't land before the
validation change did.
Fixes: 370d68babc ("nir/validate: Validate that bit sizes and components always match")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Cc: <mesa-stable@lists.freedesktop.org>
Set the bit in the same stage as the timestamp, instead of always at top of pipe.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
The Android native fence in i965 has two fds: _EGLSync::SyncFd and
brw_fence::sync_fd.
The semantics of __DRI2fenceExtensionRec::create_fence_fd are unclear on
whether the DRI driver takes ownership of the incoming fd (which is the
same incoming fd from eglCreateSync). i965 did take ownership, but all
other Mesa drivers do not; instead, they dup the incoming fd. As
a result, _EGLSync::SyncFd and brw_fence::sync_fd were the same fd, and
both egl_dri2 and i965 believed they owned it. On eglDestroySync, that
led to a double-close.
Fix the double-close by making brw_dri_create_fence_fd dup the incoming
fd, just like the other drivers do.
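A minimal sketch of the ownership rule, assuming a POSIX system. struct fence and the fence_* helpers are hypothetical stand-ins for the driver-side object, not the i965 code.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for a driver-side fence object. */
struct fence {
   int sync_fd;
};

/* Keep a private duplicate of the caller's fd instead of adopting it. */
static int fence_init_from_fd(struct fence *f, int fd)
{
   f->sync_fd = dup(fd);
   return f->sync_fd < 0 ? -1 : 0;
}

static void fence_fini(struct fence *f)
{
   if (f->sync_fd >= 0)
      close(f->sync_fd);
   f->sync_fd = -1;
}

void demo_fence_fd_ownership(void)
{
   int fds[2];
   int ret = pipe(fds);
   assert(ret == 0);

   int caller_fd = fds[0];
   struct fence f;
   ret = fence_init_from_fd(&f, caller_fd);
   assert(ret == 0);
   assert(f.sync_fd != caller_fd);

   /* Each owner closes its own fd exactly once; neither close() fails. */
   fence_fini(&f);
   ret = close(caller_fd);
   assert(ret == 0);
   close(fds[1]);
}
```

Had the fence stored the caller's fd directly, the second close() would act on an fd that is already closed, or worse, on an unrelated fd that reused the number.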
Signed-off-by: Randy Xu <randy.xu@intel.com>
Test: Run Vulkan and GLES stress test and no crash.
Fixes: 6403e37651 ("i965/sync: Implement fences based on Linux sync_file")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
[chadv: Polish the commit message]
Cc: mesa-stable@lists.freedesktop.org
NEON is sufficiently different on arm64 that we can't just reuse this
code. Disable it on arm64 for now.
v2: Use PIPE_ARCH_ARM instead, as __ARM_ARCH may be 8 for a 32-bit build
for a v8 CPU.
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: <mesa-stable@lists.freedesktop.org>
Samplers are encoded into the instruction word, so there's no need to
make space in the uniform file.
Previously matrix_columns and vector_elements were set to 0, making this
else case a no-op. Commit 75a31a20af changed that, causing malloc
corruption in thousands of tests on i965.
Fixes: 75a31a20af ("glsl: set vector_elements to 1 for samplers")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100871
Considering we cannot make dummy_thread a constant, we might as well
initialise it with the same function that handles the actual thread info.
This way we don't need to worry about mismatch between the initialiser
and initialising function.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Annotate the array as static const and use C99 initialiser to populate
it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
val_bool and val_int are in a union. val_bool gets the first byte, which
happens to work on LE when setting via the int, but breaks on BE. By
setting the value properly, we are able to use DRI3 on BE architectures.
Tested by running glxgears with a NV34 in a G5 PPC.
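A small sketch of why setting the wrong union member only works on LE. union option_value is a simplified, hypothetical version of the real structure, and the sketch assumes sizeof(int) > 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified version of the union described above. */
union option_value {
   unsigned char val_bool;
   int val_int;
};

/* Runtime check: is the first byte of the integer 1 the low-order byte? */
static int host_is_little_endian(void)
{
   const unsigned one = 1;
   unsigned char first;

   memcpy(&first, &one, 1);
   return first == 1;
}

void demo_union_endianness(void)
{
   union option_value v;

   /* Setting through the int only lands in val_bool on little-endian. */
   memset(&v, 0, sizeof(v));
   v.val_int = 1;
   assert((v.val_bool == 1) == host_is_little_endian());

   /* Setting the member that will be read works on any byte order. */
   memset(&v, 0, sizeof(v));
   v.val_bool = 1;
   assert(v.val_bool == 1);
}
```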
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
[Emil Velikov: squash the vmwgfx hunk]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Add a page that has information which release is expected when and
associated information.
Reference to it from the "Releasing process" and "Release notes" pages.
v2:
- Add Andres for 17.0.5
- Rework table format to include the branch (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The instance should have 2 cores, yet bumping the jobs to 4 should give
us a minor speed improvement.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Split into OpenCL and others, since the former is quite time consuming.
v2:
- explicitly enable/disable components
- build libvdpau 1.1 requirement
- enable st/vdpau
- build libva 1.6.2 (API 0.38) requirement
v3: Drop ubuntu-toolchain-r-test from sources (Andres)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Split the target to allow faster builds for each run.
The overall build time will be more, yet Travis runs multiple builds in
parallel so we're limited by the slowest one.
Things are split roughly as:
- DRI loaders, classic DRI drivers, classic OSMesa, make check
- All Gallium drivers (minus the SWR) alongside st/dri (mesa)
- The Vulkan drivers - ANV and RADV, make check (anv)
v2:
- rework RUN_CHECK to MAKE_CHECK_COMMAND
- explicitly disable DRI loaders
- generate linux/memfd.h locally and enable ANV
- add libedit-dev
v3: Use printf to create the header (Andres).
v4: Really add the libedit + printf hunks.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
The former does not require any LLVM, while the latter uses LLVM 3.3.
This way we'll quickly catch any LLVM 3.3+ functionality that gets
introduced where it shouldn't.
Add the full list of addons for each build permutation.
v2: Keep libedit-dev, rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4:
- Remove llvm-toolchain-trusty-3.3 source (Andres)
- Keep check target as-is (Andres)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
With the next commits we'll add a couple more options.
v2: Rework check target.
v3: Comment the current check target, add -j4 SCONSFLAGS
v4: Keep check target as-is, will rework with later patch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Split the "if test" blocks so that we get more sensible output in case
of a failure.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
We effectively override libdrm-dev and libxcb-dri2-0-dev since we build
and install the package locally.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
According to the manual
"If you are using ccache, use:
language: c # or other C/C++ variants
cache: ccache
to cache $HOME/.ccache and automatically add /usr/lib/ccache to your
$PATH."
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Provides a small, but consistent improvement.
Example numbers of the jobs added later in the series.
"make loaders/classic DRI" - 1s
"scons SWR" - 6s
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
The txc-dxtn library implements the patented S3 Texture Compression
algorithm.
By default it won't be used but we add the possibility of setting the
USE_TXC_DXTN variable to yes in the travis web UI so it will be
installed and used for the scons tests.
Cc: Eric Anholt <eric@anholt.net>
Cc: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov: keep the LIB prefix, drop the LD_LIBRARY_PATH, fold URL]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Trusty's LLVM toolchain repository was whitelisted some time ago. See:
479067c5e7
Signed-off-by: Andres Gomez <agomez@igalia.com>
[Emil Velikov]
- set sudo to false
- reference the Trusty change (Rhys)
- keep libedit-dev
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Some of the libraries may be dlopened, which may not always work due to
the non-standard prefix that we're using.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
We do not need to restrict WGL_BIND_TO_TEXTURE_RGB_ARB to
RGB visuals only. It can be supported with RGBA visuals as well.
This fixes the early exit of cinebench-r15-test trace.
Tested with cinebench-r15, piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the texture is imported and the templ format is sRGB, use the sRGB format
compatible with the imported texture format when creating the surface view.
Tested with MTT piglit, glretrace, viewperf and conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This function returns the compatible svga sRGB format for the
corresponding linear format.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch allows the driver to choose an sRGB-capable FBconfig
if the GLX_FRAMEBUFFER_SRGB_CAPABLE_ARB attribute is 1.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
dri3 is a bit sloppy about its format compatibility requirements, so add
a possibility to import xrgb surfaces as argb textures and vice versa.
At the same time, make the svga_texture_from_handle() function a bit more
readable and fix the error path where we leaked a winsys surface.
v2: Addressed review comments by Brian.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Small fetch performance optimization - use gather instruction
for odd format fetch instead of slow emulated code.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Misplaced #endif preventing depth and stencil hot tile pointers
from incrementing in SIMD16 8x2 configuration of BackendPixelRate.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Frontend - reduce simdvertex/simd16vertex stack usage for VS output in
ProcessDraw, fixes stack overflow in some of the deeper call stacks under
SIMD16.
1. Move the vertex store out of PA_FACTORY, and off the stack
2. Allocate the vertex store out of the aligned heap (pointer is
temporarily stored in TLS, but will be migrated to thread pool
along with other frontend temporary buffers).
3. Grow the vertex store as necessary for the number of verts per
primitive, in chunks of 8/4 simdvertex/simd16vertex
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Disabling buffer overrun warning for Assemble(uint32_t slot,
simdvector *verts) due to what looks like a MSVC compiler bug
when compiling the SIMD16 FE.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ability to allocate space for an arbitrary number (at compile time)
of positions in the vertex layout.
Removes KNOB_NUM_ATTRIBUTES from knobs.h, replaces the VTX slot
number #defines with the SWR_VTX_SLOTS enum (which contains
replacement for NUM_ATTRIBUTES: SWR_VTX_NUM_SLOTS)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We already have BRW_NEW_BATCH, which completely covers all the cases
that BRW_NEW_CONTEXT would handle. Drop it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Gen4-5 and Gen8+ already set this, but Gen6-7.5 did not. We ought to
be consistent - the answer depends on the API, not the hardware generation.
The Sandybridge PRM says about RASTRULE_UPPER_RIGHT:
"To match OpenGL point rasterization rules (round to +infinity, where
this is the upper right direction wrt OpenGL screen origin of lower
left)."
So this is likely the one we should use.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We set this unconditionally on every other platform. Zero (Manhattan)
isn't even listed as an option in the Sandybridge docs - only "true".
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The original Broadwater and Crestline platforms computed antialiased
line distances using "manhattan" distance, aka a + b = c. Eaglelake
and Cantiga added "true" distance, which apparently does something
like max(a, b) + min(a, b) / 4. Not exactly "true", but at least
more accurate.
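As a worked example of the two formulas, a short C sketch. The function names are made up, the inputs are assumed to be non-negative, and the second formula is only the approximation guessed at above, not a documented hardware equation.

```c
#include <assert.h>

/* "manhattan" distance: a + b. */
static float manhattan_distance(float a, float b)
{
   return a + b;
}

/* Approximation of the "true" distance: max(a, b) + min(a, b) / 4. */
static float approx_true_distance(float a, float b)
{
   float hi = a > b ? a : b;
   float lo = a > b ? b : a;

   return hi + lo / 4.0f;
}

void demo_line_distance(void)
{
   /* For a 3-4-5 triangle the Euclidean distance is 5. */
   assert(manhattan_distance(3.0f, 4.0f) == 7.0f);
   assert(approx_true_distance(3.0f, 4.0f) == 4.75f);
}
```

For this input the approximation is off by 0.25 while the manhattan distance is off by 2.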
The G45 documentation indicates that the old manhattan distance setting
is "only for debug purposes" and should never be used. The Ironlake
documentation no longer mentions AALINEDISTANCE_MANHATTAN, though it
does still contain the narrative about the feature.
At any rate, we should use the more accurate mode.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Basically, don't load GRID_SIZE or BLOCK_SIZE if they are unused, determine
whether to load BLOCK_ID for each component separately, and set the number
of THREAD_ID VGPRs to load. Now we should get the maximum CS launch wave
rate in most cases.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
LLVM 5.0 removes s_barrier instructions if the max-work-group-size
attribute is not set. What a surprise.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This removes s_load_dword latency for tess rings.
We need just 1 SGPR for the address if we use 64K alignment. The final asm
for recreating the descriptor is:
// s2 is (address >> 16)
s_mov_b32 s3, 0
s_lshl_b64 s[4:5], s[2:3], 16
s_mov_b32 s6, -1
s_mov_b32 s7, 0x27fac
v2: bitcast the descriptor type from v2i64 to v4i32
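For readers less used to the ISA, the same sequence as a C sketch. The function name is made up and the meaning of the last two dwords is not spelled out here; the constants are simply the ones from the asm above.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the four descriptor dwords from one 32-bit value holding
 * (address >> 16), which is possible because of the 64K alignment. */
static void rebuild_ring_descriptor(uint32_t addr_shifted, uint32_t desc[4])
{
   /* s_mov_b32 s3, 0 ; s_lshl_b64 s[4:5], s[2:3], 16 */
   uint64_t address = (uint64_t)addr_shifted << 16;

   desc[0] = (uint32_t)address;          /* s4 */
   desc[1] = (uint32_t)(address >> 32);  /* s5 */
   desc[2] = 0xffffffffu;                /* s_mov_b32 s6, -1 */
   desc[3] = 0x27fac;                    /* s_mov_b32 s7, 0x27fac */
}

void demo_rebuild_ring_descriptor(void)
{
   uint32_t desc[4];

   rebuild_ring_descriptor(0x12345678u, desc);
   assert(desc[0] == 0x56780000u);
   assert(desc[1] == 0x00001234u);
   assert(desc[2] == 0xffffffffu);
   assert(desc[3] == 0x27facu);
}
```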
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The use of PrimID in the pixel shader is too rare to deserve such
a sizable support code.
The initial idea of the VS epilog was to move the clipping code there and
remove it based on states, but optimized variants are now used to do that
and are easier to support, so the VS epilog has turned out to be not so
useful.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tentatively enable it, expecting the scratch buffer support to be done before
the next Mesa release.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The 2nd shader of merged shaders should take a reference of the 1st shader.
The next commit will do that.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We could also remove index_bounds_valid and use max_index != ~0 instead.
Opinions on that are welcome.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This reverts commit 24011ead71.
This causes lots of ES 3.1 CTS tests to fail to compile a bit of code
like:
    layout(binding = 0) buffer InOut
    {
        highp uint inputValues[384];
        highp uint outputValues[384];
        coherent highp uint groupValues[64]; <-----
    } sb_inout;
error: memory qualifiers may only be applied to images
The VMware driver has a limited set of integer texture formats. We
often have to fall back to 4-component formats when 1- or 2-component
formats are missing.
This fixes about 8 integer texture Piglit tests with the VMware driver
on Linux. We've had this code in-house for a long time but I guess it
was never up-streamed to Mesa master.
This shouldn't regress any other drivers since we're either choosing
an earlier format in the list, or failing anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Return as soon as we find an existing color channel that's enabled for
writing. Typically, this allows us to return true on the first loop
iteration instead of doing four iterations.
No piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Drop it from x11_anv_wsi_image_create and x11_anv_wsi_image_free. The
functions are used by Wayland WSI too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
stfb->iface is always non-NULL for an st_framebuffer. These checks
were incorrect, relying on out-of-bounds memory access in the
surface-less case of EGL_KHR_surfaceless_context.
v2: remove redundant stread check (Marek)
Reviewed-by: Marek Olšák <marek@olsak@amd.com> (v2)
The incomplete framebuffer is set for a surfaceless context. This leads to
the following error in piglit spec@egl_khr_surfaceless_context@viewport:
==26703==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7f6886e43240 at pc 0x7f68854db0fd bp 0x7ffca404b3b0 sp 0x7ffca404b3a0
READ of size 8 at 0x7f6886e43240 thread T0
#0 0x7f68854db0fc in st_viewport ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57
#1 0x556840176cdb in main tests/egl/spec/egl_khr_surfaceless_context/viewport.c:101
#2 0x7f688edcf3f0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x203f0)
#3 0x556840176e19 in _start (/home/nha/amd/piglit/bin/egl-surfaceless-context-viewport+0xe19)
0x7f6886e43240 is located 32 bytes to the left of global variable 'DummyRenderbuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:69:31' (0x7f6886e43260) of size 112
0x7f6886e43240 is located 8 bytes to the right of global variable 'IncompleteFramebuffer' defined in '../../../mesa-src/src/mesa/main/fbobject.c:73:30' (0x7f6886e42de0) of size 1112
SUMMARY: AddressSanitizer: global-buffer-overflow ../../../mesa-src/src/mesa/state_tracker/st_cb_viewport.c:57 in st_viewport
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek@olsak@amd.com>
These operations are currently implemented as IR expressions. However,
they cannot be transformed and moved in the way that other IR
expressions can because they have non-trivial interactions with
control-flow.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes the following ARB_shader_image_load_store tests:
format-layout-with-non-image-type.frag
memory-qualifier-with-non-image-type.frag
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The more I think about it the more this seems like a bad idea.
When we were deleting old cache dirs this wasn't so bad as it
was unlikely we would ever hit the actual limit before things
were cleaned up. Now that we only start cleaning up old cache
items once the limit is reached, a percentage-based max
cache limit is more risky.
For the initial release of shader cache I think it's better to
stick to a more conservative cache limit, at least until we
have some way of cleaning up the cache more aggressively.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This commit just exposes the memory handle type. There's nothing interesting
we need to do here for images. So long as the user doesn't set any crazy
environment variables such as INTEL_DEBUG=nohiz, all of the compression
formats etc. should "just work" at least for opaque handle types.
v2 (chadv):
- Rebase.
- Fix vkGetPhysicalDeviceImageFormatProperties2KHR when
handleType == 0.
- Move handleType-independency comments out of handleType-switch, in
vkGetPhysicalDeviceExternalBufferPropertiesKHX. Reduces diff in
future dma_buf patches.
Co-authored-with: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This cache allows us to easily ensure that we have a unique anv_bo for
each gem handle. We'll need this in order to support multiple-import of
memory objects and semaphores.
v2 (Jason Ekstrand):
- Reject BO imports if the size doesn't match the prime fd size as
reported by lseek().
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is the trivial implementation that just exposes the extension
string but exposes zero external handle types.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is a complete but trivial implementation. It's trivial because we
support no external memory capabilities yet. Most of the real work in
this commit is in reworking the UUIDs advertised by the driver.
v2 (chadv):
- Fix chain traversal in vkGetPhysicalDeviceImageFormatProperties2KHR.
Extract VkPhysicalDeviceExternalImageFormatInfoKHX from the chain of
input structs, not the chain of output structs.
- In vkGetPhysicalDeviceImageFormatProperties2KHR, iterate over the
input chain and the output chain separately. Reduces diff in future
dma_buf patches.
Co-authored-with: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're about to have more UUIDs for different things so this one really
needs to be properly labeled.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The command is really operating on a Queue not a command buffer and the
nearest object to that with an allocator is VkDevice.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: "17.0 17.1" <mesa-dev@lists.freedesktop.org>
I don't see any reasons why vector_elements is 1 for images and
0 for samplers. This increases consistency and allows to clean
up some code a bit.
This will also help for ARB_bindless_texture.
No piglit regressions with RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The majority of cache files are less than 1kb, which resulted in us
greatly miscalculating the amount of disk space used by the cache.
Using the number of blocks allocated to the file is more
conservative and less likely to cause issues.
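A minimal sketch of the two ways of measuring a file, using only struct stat fields. The helper names are made up, and the 512-byte unit for st_blocks is the Linux convention, assumed here rather than guaranteed by POSIX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Size as seen by the reader of the file. */
static uint64_t apparent_size(const struct stat *st)
{
   return (uint64_t)st->st_size;
}

/* Space actually taken on disk, assuming 512-byte st_blocks units. */
static uint64_t allocated_size(const struct stat *st)
{
   return (uint64_t)st->st_blocks * 512;
}

void demo_cache_item_size(void)
{
   struct stat st;

   /* Hand-filled values: a 300 byte cache item occupying one 4k block. */
   memset(&st, 0, sizeof(st));
   st.st_size = 300;
   st.st_blocks = 8;

   assert(apparent_size(&st) == 300);
   assert(allocated_size(&st) == 4096);
}
```

With many small items the gap between the two numbers adds up, which is the miscalculation described above.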
This change will result in cache sizes being miscalculated further
until old items added with the previous calculation have all been
removed. However I don't see any way around that; the previous
patch should help limit that problem.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Modern disks are extremely large and are only going to get bigger.
Usage has shown frequent Mesa upgrades can result in the cache
growing very fast i.e. wasting a lot of disk space unnecessarily.
5% seems like a more reasonable default.
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
This assert wasn't in the original radeonsi code but I added
it without totally understanding the original code. It caused
some regressions in variable-indexing tessellation shaders.
Fixes: e2659176 radeonsi/ac: move vertex export remove to common code.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from radeonsi, and I can see at least one
Talos shader drops an export due to this, and saves some
VGPR usage.
v2: use shared code.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since the host pool changes,
Fixes:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Fixes: 126d5ad "radv: Use host memory pool for non-freeable descriptors."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Varying types have already been validated in
apply_type_qualifier_to_variable() by this point.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Here get_scalar_type() was just being used to remove the array;
after that we converted it back to base_type anyway, so just
use the without_array() helper.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When we ran Viewperf11's Maya-03 test 3 we saw warnings about flushing
the command buffer with mapped buffers. This happened when transitioning
from hardware rendering to a 'draw' fallback path.
The problem is the util_set_vertex_buffers_count() function doesn't do
exactly what we want in svga_hwtnl_vertex_buffers(). In a case such as
dst_count=2, dst={bufA, bufB}, count=1 and src={bufC}, when the function
returns we'll have dst_count=2 and dst={bufC, bufB}. What we really want
is dst_count=1 and dst={bufC, NULL}. As it was, we were telling the svga
device that there were two vertex buffers when in fact we really only
needed one for the subsequent drawing command.
In this particular case, we first did hardware drawing with {bufA, bufB}
then we transitioned to the 'draw' module, consuming vertex data from
bufA and bufB and writing the new vertex data to bufC. bufA and bufB are
mapped for reading when we flush the command buffer but should not be
referenced by the command buffer. The above change fixes that.
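The difference between the two behaviours can be sketched as follows.
This is a simplified model, not the util or svga code: the struct and
both function names are hypothetical, and buffers are represented by
plain string pointers.

```c
#include <stddef.h>

#define MAX_BUFS 4

/* Simplified stand-in for a vertex buffer binding list. */
struct buf_list {
   const char *bufs[MAX_BUFS];
   unsigned count;
};

/* Models the merging behaviour described above: src overwrites the
 * first entries of dst, but bindings past src_count stay in place. */
void
set_buffers_merge(struct buf_list *dst, const char **src, unsigned src_count)
{
   for (unsigned i = 0; i < src_count; i++)
      dst->bufs[i] = src[i];
   if (src_count > dst->count)
      dst->count = src_count;
}

/* Models what the hwtnl path wants: dst ends up holding exactly src,
 * and any stale bindings are cleared. */
void
set_buffers_exact(struct buf_list *dst, const char **src, unsigned src_count)
{
   for (unsigned i = 0; i < MAX_BUFS; i++)
      dst->bufs[i] = i < src_count ? src[i] : NULL;
   dst->count = src_count;
}
```

Starting from {bufA, bufB} and setting {bufC}, the first helper leaves
{bufC, bufB} with a count of 2, while the second leaves {bufC, NULL}
with a count of 1.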
No Piglit regressions. Also tested with Viewperf, Google Earth, Heaven,
etc.
VMware bug 1842059
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We only need to construct the debug message if the mapped_sync flag is set.
This should make the function faster since the flag is usually false.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead of directly sending the InvalidateGBSurface command,
this patch uses the invalidate_surface interface.
Fixes Linux VM piglit failures including
ext_texture_array-gen-mipmap, fbo-generatemipmap-array S3TC_DXT1
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch revises the fix in commit 606f13afa31c9f041a68eb22cc32112ce813f944
to properly translate the surface format for screen target.
Instead of changing the svga format for PIPE_FORMAT_B5G6R5_UNORM
to SVGA3D_R5G6B5 for all texture surfaces, this patch only restricts
SVGA3D_R5G6B5 for screen target surfaces. This avoids rendering
failures when specifying a non-vgpu10 format in a vgpu10 context with
the software renderer.
Fixes piglit failures spec@!opengl 1.1@draw-pixels,
spec@!opengl 1.1@teximage-colors gl_r3_g3_b2
spec@!opengl 1.1@texwrap formats
Tested Xorg with 16-bit depth.
Also tested with MTT piglit, MTT glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
CinebenchR15 not only binds the same texture for rendering and sampling,
it actually changes the framebuffer buffer attachment very often, causing
a lot of backed surface views to be created and a lot of surface copies
to be done. This patch caches the backed surface handle
in the texture resource and allows the backed surface view to
reuse the backed surface handle. With this patch, the number of
backed surface views drops from 1312 to 3. Unfortunately, this
does not eliminate all the surface copies. There are still surface
copies involved when we switch from original to backed surface handle
for rendering.
Tested with CinebenchR15, NobelClinicianViewer, Turbine, Lightsmark2008,
MTT glretrace, MTT piglit.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch adds a timestamp in svga_surface structure to keep track
of when the backing surface was last synced with the original resource.
This helps to avoid unnecessary surface copy from the original
resource to the backing surface if the original resource has not
since been modified.
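One way to picture the timestamp check is the sketch below. It is only a
model under assumed names: the structs, fields, and functions are
hypothetical and do not reflect the actual svga_surface layout.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; the field names are illustrative. */
struct resource {
   unsigned modified_age;   /* bumped each time the resource is written */
};

struct backed_surface {
   unsigned synced_age;     /* resource age at the time of the last copy */
   unsigned copies;         /* number of copies performed so far */
};

void
resource_write(struct resource *res)
{
   res->modified_age++;
}

/* Copy only when the resource changed since the last sync.
 * Returns true if a copy was needed. */
bool
sync_backed_surface(struct backed_surface *bs, const struct resource *res)
{
   if (bs->synced_age == res->modified_age)
      return false;
   bs->copies++;            /* stands in for the real surface copy */
   bs->synced_age = res->modified_age;
   return true;
}
```

Repeated syncs without an intervening write return early, which is the
copy that the timestamp is meant to avoid.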
This reduces the number of surface copies with CinebenchR15.
Tested with CinebenchR15, mtt glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
For VGPU10, we will render to a backed surface view when
the same resource is used for rendering and sampling.
In this case, we will mark the dirty bit for the backed surface view.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch moves the rendertarget view related fields from
svga_hw_draw_state to svga_hw_clear_state where all the hw
framebuffer related state resides.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of setting the rendered_to flags at set time, this patch
moves the setting of the flags to framebuffer emit time.
Reviewed-by: Brian Paul <brianp@vmware.com>
The debug output in svga_create_sampler_state() was controlled by
DEBUG_VIEWS but that's not consistent with the other debug output for
sampler views. Create/use a new debug flag just for this.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Tested by verifying 3D acceleration works with HWv8 but not earlier.
For HWv7 and older we get the GDI Generic renderer.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If for some reason the kernel is not able to create the surface
when no buffer was provided, the function
vmw_svga_winsys_surface_create should return NULL.
This patch fixes the issue where the code was not following the
clean up path in case of error, which used to cause SIGSEGV.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
shader-db results on GK106 (Thanks Karol):
total instructions in shared programs : 3931608 -> 3929463 (-0.05%)
total gprs used in shared programs : 481255 -> 479014 (-0.47%)
total local used in shared programs : 27481 -> 27381 (-0.36%)
total bytes used in shared programs : 36031256 -> 36011120 (-0.06%)
        local   gpr  inst  bytes
helped     14  1471  1309   1309
hurt        1    88   384    384
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
The main goal of this pass is to merge temporary registers in order
to reduce the total number of registers and also to produce
optimal TGSI code.
In fact, compilers seem to be confused when temporary variables
are already merged, maybe because it's done too early in the
process.
Skipping the pass reduces both the register pressure and the code
size, at least for Nouveau and RadeonSI because they have a real
backend compiler.
Found by luck while fixing an issue in the TGSI dead code elimination
pass which affects tex instructions with bindless samplers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because the buffer is new, it can't be referenced by any CS.
This can save a few CPU cycles by skipping the whole
PIPE_TRANSFER_UNSYNCHRONIZED 'if' block in amdgpu_bo_map().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are 2 major hw changes:
- The address must always point to the address of level 0. GFX9 tiling
modes don't allow binding to a non-0 level.
- 3D must always be bound as 3D, because 2D and 3D use entirely different
tiling modes, and the texture target determines which set of modes is
used.
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Shared context support for VAOs was dropped in 0b2750620b.
From the ARB_vertex_array_object spec:
"This extension differs from GL_APPLE_vertex_array_object
in that client memory cannot be accessed through a
non-zero vertex array object. It also differs in that
vertex array objects are explicitly not sharable between
contexts."
Nobody should be using this extension over
ARB_vertex_array_object anymore so just drop it rather than
adding locking back just for VAOs created from these
functions.
For reference the Nvidia blob doesn't expose this extension.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: - Added some error handling.
- memset the buffer to 0.
v3: Added assert for buffer size.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In order to cleanly eliminate exports rewrite the
code first to mirror how radeonsi works for now.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These need to be ordered as per the shader enum ordering. I'll
rewrite this soon, but this is a bug fix.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Due to the lack of pipe_resource wrapping, we can get this call from inside
of driver calls, which would try to lock an already-locked mutex.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If r300g is the only radeon driver built, the Android build fails:
ninja: error:
'out/target/product/linaro_x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_radeon_intermediates/export_includes',
needed by
'out/target/product/linaro_x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/import_includes',
missing and no known rule to make it
This is because the path to build libmesa_pipe_radeon was only getting
added for r600g and radeonsi, but the library dependency was added for
all radeon drivers. As libmesa_pipe_radeon is not needed for r300g, drop
the library dependency.
Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:
"Objects which contain references to other objects include
framebuffer, program pipeline, query, transform feedback,
and vertex array objects. Such objects are called container
objects and are not shared"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Just return earlier in that case. Also set prefix to an empty string, so
we don't get to use it undefined.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit BLEND_STATE, whose size is 1 + 2 * nr_draw_buffers
dwords (on gen8+), but the BLEND_STATE struct length is always 17. By
marking it size 1, which is actually the size of the struct minus the
BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of
entries.
For gen6 and gen7 we set length to 0, since it only contains
BLEND_STATE_ENTRY's, and no other data.
With this change, we also change the code for blorp and anv to emit only
the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on
gen6-7 and 17 dwords on gen8+.
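The sizes quoted above can be checked with a small sketch. The function
name is hypothetical; the two-dwords-per-entry figure and the single
leading dword on gen8+ are taken from the numbers in this message.

```c
/* Dwords needed for a BLEND_STATE with nr_entries BLEND_STATE_ENTRYs.
 * Each entry takes 2 dwords; gen8+ adds one leading dword, while
 * gen6 and gen7 have no data besides the entries. */
unsigned
blend_state_dwords(int gen, unsigned nr_entries)
{
   unsigned header = gen >= 8 ? 1 : 0;
   return header + 2 * nr_entries;
}
```

With 8 entries this gives the fixed 17 dwords (gen8+) and 16 dwords
(gen6-7) that were emitted before; with a single draw buffer it drops to
3 and 2 dwords respectively.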
v2:
- Use designated initializers on blorp and remove 0 from
initialization (Jason)
- Default entries to disabled on Vulkan (Jason)
- Rebase code.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the 'dwords' dict is empty, max(dwords.keys()) throws an exception.
This case could happen when we have an instruction that is only an array
of other structs, with variable length.
v2:
- Add another clause for empty dwords and make it work with python 3
(Dylan)
- Set the length to 0 if dwords is empty, and do not declare dw
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before this commit, when a group with count="0" is found, only one field
is added to the struct representing the instruction. This causes only
one entry to be printed by aubinator, for variable length groups.
With this commit we "detect" that there's a variable length group
(count="0") and store the offset of the last entry added to the struct
when reading the xml. When finally reading the aubdump file, we check
the size of the group and whether we have variable number of elements,
and in that case, reuse the last field to add the remaining elements.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The section of the PRM mentioned in the code comment above this table
says that this format supports the render target write message. Internal
documentation says that this format also supports alpha blending. As a
side effect, this allows CCS_D buffers to be created for images with
this format.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Until now the spilling cost calculation was neglecting the amount of
data read from the register.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.
Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%. In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Curro pointed out that I should not just check for MACH, but use
the reads_accumulator_implicitly() helper, which would also prevent
the same bug with MAC and SADA2 (if we ever decide to use them).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fixes the following build errors due to missing include paths:
external/mesa/src/amd/common/ac_shader_info.c:23:10: fatal error: 'nir/nir.h' file not found
^
external/mesa/src/compiler/nir/nir.h:48:10: fatal error: 'nir_opcodes.h' file not found
^
Fixes: 224cf29 "radv/ac: add initial pre-pass for shader info gathering"
Acked-by: Dave Airlie <Airlied@redhat.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This just updates this to use the same flags as radeonsi
for consistency.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
IVB is running into some spilling issues in piglit with the
loop removed. However, those tests are not really reflective
of a real world use case. Also, fp64 is brand new to IVB,
so we leave the spilling issues to be resolved at a later
time.
Run time for shader-db on my machine goes from ~795 seconds to
~665 seconds.
shader-db results BDW:
total instructions in shared programs: 12969459 -> 12968891 (-0.00%)
instructions in affected programs: 1463154 -> 1462586 (-0.04%)
helped: 3622
HURT: 3326
total cycles in shared programs: 246453572 -> 246504318 (0.02%)
cycles in affected programs: 208842622 -> 208893368 (0.02%)
helped: 24029
HURT: 35407
total loops in shared programs: 2931 -> 2931 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 14560 -> 14498 (-0.43%)
spills in affected programs: 2270 -> 2208 (-2.73%)
helped: 17
HURT: 2
total fills in shared programs: 19671 -> 19632 (-0.20%)
fills in affected programs: 2060 -> 2021 (-1.89%)
helped: 17
HURT: 2
LOST: 17
GAINED: 40
Most of the hurt shaders are 1-2 instructions, with what looks like a max of 7.
I've looked at the worst cycles regressions and as far as I can tell it's just
a scheduling difference.
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If packing doesn't cross locations we can easily make use of
ARB_enhanced_layouts to do packing rather than using the GLSL IR
lowering pass lower_packed_varyings().
Shader-db Broadwell results:
total instructions in shared programs: 12977822 -> 12977819 (-0.00%)
instructions in affected programs: 1871 -> 1868 (-0.16%)
helped: 4
HURT: 3
total cycles in shared programs: 246567288 -> 246567668 (0.00%)
cycles in affected programs: 1370386 -> 1370766 (0.03%)
helped: 592
HURT: 733
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the NIR backends depend on GLSL IR copy propagation to
fix up the interpolateAt* function params after varying packing
changes the shader input to a global. It's possible copy propagation
might not always do what we need it to, and we also shouldn't
depend on optimisations to do this type of thing for us.
I'm not sure if the same is true for TGSI, but the following
commit should re-enable packing for most cases in a safer way,
so we just disable it everywhere.
No change in shader-db for i965 (BDW)
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These should be lowered away in GLSL IR but if we don't get dead
code to clean them up it causes issues in glsl_to_nir.
We want to drop as many GLSL IR opts in future as we can so this
makes glsl_to_nir just ignore the vars if it sees them.
In future we will want to just use the nir lowering pass that
Vulkan currently uses.
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shuffles constants down in the reverse of what the previous
patch does and applies some simplifications that may be made
possible from doing so.
Shader-db results BDW:
total instructions in shared programs: 12980814 -> 12977822 (-0.02%)
instructions in affected programs: 281889 -> 278897 (-1.06%)
helped: 1231
HURT: 128
total cycles in shared programs: 246562852 -> 246567288 (0.00%)
cycles in affected programs: 11271524 -> 11275960 (0.04%)
helped: 1630
HURT: 1378
V2: mark float opts as inexact
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V2: mark float opts as inexact
If one of the inputs to an mul/add is the result of another
mul/add there is a chance that we can reuse the result of that
mul/add in other calls if we do the multiplication in the right
order.
Also by attempting to move all constants to the top we increase
the chance of constant folding.
For example it is a fairly common pattern for shaders to do something
similar to this:
const float a = 0.5;
in vec4 b;
in float c;
...
b.x = b.x * c;
b.y = b.y * c;
...
b.x = b.x * a + a;
b.y = b.y * a + a;
So by simply detecting that constant a is part of the multiplication
in ffma and switching it with previous fmul that updates b we end up
with:
...
c = a * c;
...
b.x = b.x * c + a;
b.y = b.y * c + a;
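The same rewrite can be written out for a single component as the sketch
below. The function names are hypothetical, and the code only models the
arithmetic, not the NIR pass itself.

```c
/* Original form: scale b by c, then multiply by a and add a. */
float
before_reassoc(float b, float c, float a)
{
   float t = b * c;
   return t * a + a;
}

/* Reassociated form: fold the constant into c once, then reuse it. */
float
after_reassoc(float b, float c, float a)
{
   float ca = a * c;      /* shared by every component of b */
   return b * ca + a;
}
```

Both forms agree for inputs such as b = 3, c = 4, a = 0.5, but floating
point multiplication is not associative in general, so the two can differ
in the last bits for other inputs.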
Shader-db results BDW:
total instructions in shared programs: 13011050 -> 12967888 (-0.33%)
instructions in affected programs: 4118366 -> 4075204 (-1.05%)
helped: 17739
HURT: 1343
total cycles in shared programs: 246717952 -> 246410716 (-0.12%)
cycles in affected programs: 166870802 -> 166563566 (-0.18%)
helped: 18493
HURT: 7965
total spills in shared programs: 14937 -> 14560 (-2.52%)
spills in affected programs: 9331 -> 8954 (-4.04%)
helped: 284
HURT: 33
total fills in shared programs: 20211 -> 19671 (-2.67%)
fills in affected programs: 12586 -> 12046 (-4.29%)
helped: 286
HURT: 33
LOST: 39
GAINED: 33
Some of the hurt will go away when we shuffle things back down to the
bottom in the following patch. It's also noteworthy that almost all of the
spill changes are in Deus Ex both hurt and helped.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Didn't turn out as useful as I'd hoped, but it will help a lot more on
i965 by reducing regressions when we drop brw_do_channel_expressions()
and brw_do_vector_splitting().
I'm not sure how much sense 'is_not_used_by_conditional' makes on
platforms other than i965 but since this is a new opt it at least
won't do any harm.
shader-db BDW:
total instructions in shared programs: 13029581 -> 13029415 (-0.00%)
instructions in affected programs: 15268 -> 15102 (-1.09%)
helped: 86
HURT: 0
total cycles in shared programs: 247038346 -> 247036198 (-0.00%)
cycles in affected programs: 692634 -> 690486 (-0.31%)
helped: 183
HURT: 27
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blob won't render to this format, and when sampling from it, it uses the same
fmt value for r8g8b8_snorm and r8g8b8a8_snorm. But this is what
blocks us from jumping from gl30/gles20 to gl31/gles30. So a hack it
is!
Signed-off-by: Rob Clark <robdclark@gmail.com>
Support supertiled textures on hardware that has the appropriate
feature flag SUPERTILED_TEXTURE.
Most of the scaffolding was already in place in etna_layout_multiple:
case ETNA_LAYOUT_SUPER_TILED:
*paddingX = 64;
*paddingY = 64;
*halign = TEXTURE_HALIGN_SUPER_TILED;
So this is just a matter of allowing it.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
It is always the draw ring. Except for a5xx queries like time-elapsed,
where we will eventually want to emit cmds into both binning and draw
rings.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some queries on a4xx and all queries on a5xx can do result accumulation
on CP so we don't need to track per-tile samples. We do still need to
handle pausing/resuming while switching batches (in case the query is
active over multiple draws which are executed out of order).
So introduce new accumulated-query helpers for these sorts of queries,
since it doesn't really fit in cleanly with the original query
infrastructure.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move a bit more of the logic shared by all query types (active tracking,
etc) into common code. This avoids introducing a 3rd copy of that logic
for a5xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For a5xx (and actually some queries on a4xx) we can accumulate results
in the cmdstream, so we don't need this elaborate mechanism of tracking
per-tile query results. So make it into vfuncs so generation specific
backend can use it when it makes sense.
Signed-off-by: Rob Clark <robdclark@gmail.com>
opt_register_coalesce() was optimizing sequences such as:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
mov(8) m4.zw:F, vgrf5.xxxy:F
into:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D
This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy. But the MACH instruction expects
.z to contain attr18.x * attr19.x. Bogus results ensue.
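The mismatch can be modelled with a small consistency check. This is a
sketch under a simplifying assumption: each accumulator channel is treated
as valid for MACH only if MUL and MACH select the same source component
on that channel. The function name is hypothetical.

```c
#include <stdbool.h>

enum { X, Y, Z, W };

/* MUL leaves, in accumulator channel i, the product of the source
 * components selected by mul_swz[i]. MACH consumes that channel together
 * with the components selected by mach_swz[i]. The pair is consistent
 * only if both pick the same component on every channel MACH writes. */
bool
mach_matches_mul(const int mul_swz[4], const int mach_swz[4],
                 unsigned writemask)
{
   for (int i = 0; i < 4; i++) {
      if ((writemask & (1u << i)) && mul_swz[i] != mach_swz[i])
         return false;
   }
   return true;
}
```

For the original sequence (both .xyyy, writing .xy) the check passes. For
the coalesced one (MUL .xyyy, MACH .xxxy, writing .zw) it fails on the
.z channel, matching the description above.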
No change in shader-db on Haswell. Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:
"Objects which contain references to other objects include
framebuffer, program pipeline, query, transform feedback,
and vertex array objects. Such objects are called container
objects and are not shared"
For now we leave locking in place for framebuffer objects because
the EXT fbo extension allowed sharing.
We could maybe just replace the hash with an ordinary hash table
but for now this should remove most of the unnecessary locking.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This pattern was only useful when we used mutex locks, which the previous
commit removed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From Chapter 5 'Shared Objects and Multiple Contexts' of
the OpenGL 4.5 spec:
"Objects which contain references to other objects include
framebuffer, program pipeline, query, transform feedback,
and vertex array objects. Such objects are called container
objects and are not shared"
For now we leave locking in place for framebuffer objects because
the EXT fbo extension allowed sharing.
V2: (Timothy Arceri)
- rebased and dropped changes to framebuffer objects
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We should never get here if this is 0 unless there is a
bug. Replace the check with an assert.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
As mentioned in the manual - comparing pthread_t handles via the C
comparison operator is incorrect and pthread_equal() should be used
instead.
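A minimal sketch of the comparison, using only the POSIX calls named
above. The helper name is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* pthread_t is an opaque type, so handles are compared with
 * pthread_equal() rather than with the == operator. */
bool
is_current_thread(pthread_t thread)
{
   return pthread_equal(thread, pthread_self()) != 0;
}
```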
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: d8d81fbc31 ("mesa: Add infrastructure for a worker thread to process GL commands.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As pointed out by compiler
./llvm/codegen.hpp:52:22: error: ‘<::’ cannot begin a template-argument list [-fpermissive]
./llvm/codegen.hpp:52:22: note: ‘<:’ is an alternate spelling for ‘[’. Insert whitespace between ‘<’ and ‘::’
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
This function is actually a wrapper for component_slots()
and it always returns 1 (or N) for samplers. Since
component_slots() now returns 1 for samplers, it can go.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It looks inconsistent to return 1 for image types and 0 for
sampler types. Especially because component_slots() is mostly
used by values_for_type() which always returns 1 for samplers.
For bindless, this value will be bumped to 2 because the
ARB_bindless_texture spec states that bindless samplers/images
should consume two components.
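The intended slot counts can be summarised in a sketch. The enum and
function are hypothetical and much smaller than the real glsl_type code;
the bindless case reflects the planned value mentioned above, not
something this change implements.

```c
#include <stdbool.h>

/* Hypothetical, reduced type description for illustration only. */
enum opaque_kind { KIND_SAMPLER, KIND_IMAGE };

unsigned
opaque_component_slots(enum opaque_kind kind, bool bindless)
{
   (void)kind;                 /* samplers and images are treated alike */
   return bindless ? 2 : 1;    /* bindless handles take two components */
}
```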
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The OpenGL extension KHR_no_error has been exposed since commit
d42d150ad2 by Timothy Arceri. Therefore it
should be marked as "started" in features.txt.
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This reverts commit 4d4558411d.
This was the wrong call: while it fixed an issue with 3DMark, it
actually introduced a regression elsewhere.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
This enables support on GM200+ for:
- GL_AMD_vertex_shader_layer
- GL_AMD_vertex_shader_layer_viewport_index
- GL_ARB_shader_viewport_layer_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[lyude: add relnotes/TES cap]
Signed-off-by: Lyude <lyude@redhat.com>
[imirkin: move relnotes to right place, add features.txt]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
EMIT only applies to geometry shaders. For everything else, we want to
export the viewport normally.
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This breaks the guts of MI_MATH (the instruction part) out into its own
structure with proper named values.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Looking at some Talos shaders vs radeonsi, I noticed they use
tex_lz in a few places, so we should be able to as well.
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is nicer on caches, and the next commit will need to access
the structure from a different place.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
min/max_index are typically hints for the u_vbuf module, not the driver.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Most drivers don't need it and shouldn't need it because it can't be used
in some cases (indirect draws, primitive restart, count from streamout).
Reviewed-by: Brian Paul <brianp@vmware.com>
It helps to find the envvar option you are looking for.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Show the commit hash and the title in a way that it is easier to copy
and paste in the bin/.cherry-ignore-extra file if we want to ignore
those commits for the future.
v2:
- Use printf instead of echo (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Neither script uses a file listing the commits to ignore. So if we
have handled one of the suggested commits and decided we won't pick it,
the scripts will continue suggesting it.
v2:
- Mark the candidates in bin/get-extra-pick-list.sh (Juan A. Suarez)
- Use bin/.cherry-ignore to store rejected patches (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gives me approximately a 2% perf increase in both dota2 & talos.
Having descriptors (both sets and vertex buffers) prefetched
didn't help so I didn't include that.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
With this we don't have any operations on a pool with non-freeable
descriptors left that have O(#descriptors) complexity.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
v2: Handle out of pool memory error.
v3: Actually use VK_ERROR_OUT_OF_POOL_MEMORY_KHR for the error condition.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Bas Nieuwenhuizen <basni@google.com>
If VDPAU is installed in the non-default location, we'll fail to find
the headers and error at build time.
../../src/gallium/include/state_tracker/vdpau_dmabuf.h:37:25: fatal error: vdpau/vdpau.h: No such file or directory
#include <vdpau/vdpau.h>
^
Fixes: faba96bc60 ("st/vdpau: add new interop interface")
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Resolves build issues like the following:
src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:31: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
^
src/gallium/winsys/sw/dri/dri_sw_winsys.c:203:62: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith]
data = dri_sw_dt->data + (dri_sw_dt->stride * box->y) + box->x * blsize;
^
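A sketch of the usual fix for this class of error: cast the void pointer
to a byte pointer before doing the offset arithmetic. The function name
and parameters are hypothetical and only mirror the expression in the
error message.

```c
#include <stddef.h>

/* Arithmetic on void * is a GNU extension; casting to char * first
 * keeps the byte-offset computation valid ISO C. */
void *
texel_address(void *base, size_t stride, size_t x, size_t y, size_t blsize)
{
   return (char *)base + stride * y + x * blsize;
}
```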
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The macro is introduced with pkgconfig v0.28 which isn't universally
available. Thus it will error at configure stage.
Reported-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
These checks do not generate any errors. Move them so we can add
KHR_no_error support and still make sure we do these checks.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We also move _mesa_update_array_format() into the caller.
This gets these functions ready for KHR_no_error support.
V2: Updated function comment as suggested by Brian.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When any count[i] is negative, we must skip all draws.
Moving to vbo makes the subsequent change easier.
v2:
- provide the function in all contexts, including GLES
- adjust validation accordingly to include the xfb check
v3:
- fix mix-up of pre- and post-xfb prim count (Nils Wallménius)
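As a rough illustration of the rule that one negative count rejects the whole batch, the check can be sketched as below. multi_draw_counts_valid() is a hypothetical name, not the function added by this change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true only if every draw in the batch may be issued. */
static bool
multi_draw_counts_valid(const int *count, size_t primcount)
{
   assert(count != NULL || primcount == 0);
   for (size_t i = 0; i < primcount; i++) {
      if (count[i] < 0)
         return false;   /* one negative count skips all draws */
   }
   return true;
}
```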
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the build after:
commit 399ebd2a84
Author: Dave Airlie <airlied@redhat.com>
Date: Wed Apr 19 06:18:23 2017 +1000
radv/meta: add common shader vertex generation function
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the build after:
commit 224cf2906a
Author: Dave Airlie <airlied@redhat.com>
Date: Mon Apr 17 13:01:52 2017 +1000
radv/ac: add initial pre-pass for shader info gathering
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The vs vertex generate and fs noop shaders are used in a few places,
so refactor them out.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For depth clears we have to pass the depth in the 2nd
component, we can use push constants for some of this
later to drop the vertex buffer completely
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This removes the vertex buffer, and just generates the values
in the shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of passing in the same 1.0, -1.0 combinations via
vertex buffers, we can just use vertex id to have the vertex
shader build them. This function introduces the generator
code needed, later patches will use this.
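The idea can be modelled on the CPU as below. This is only a sketch: the three-vertex layout and the corner order are assumptions for illustration, and gen_rect_vertex() is a hypothetical name rather than the generator added here.

```c
#include <assert.h>

struct vec2 { float x, y; };

/* Hypothetical CPU-side model: derive a rectangle corner from the
 * vertex index alone, so no vertex buffer is needed. */
static struct vec2
gen_rect_vertex(unsigned vertex_id)
{
   struct vec2 pos;
   assert(vertex_id < 3);
   pos.x = (vertex_id == 2) ? 1.0f : -1.0f;  /* only vertex 2 has x = +1 */
   pos.y = (vertex_id == 1) ? 1.0f : -1.0f;  /* only vertex 1 has y = +1 */
   return pos;
}
```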
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some of the shaders could just generate the vertex data in the
shader, so add helpers to allow us to move to doing that.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This bumps the limit to the number of sets to 32, now that
we have proper support for it. It also uses 1u in a few places
to make things a bit safer.
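For reference, a small sketch of why the unsigned literal matters once set index 31 is reachable. MAX_SETS and set_mask_bit() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SETS 32  /* the new limit on the number of sets */

static uint32_t
set_mask_bit(unsigned set_index)
{
   assert(set_index < MAX_SETS);
   /* 1 << 31 overflows a signed int, which is undefined behaviour;
    * 1u << 31 is a well-defined unsigned shift. */
   return 1u << set_index;
}
```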
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to expose more descriptor sets to the applications,
but currently we have a 1:1 mapping between shader descriptor
sets and 2 user sgprs, limiting us to 4 per stage. This commit
checks whether we have enough user sgprs for the number of bound
sets for this shader; if not, we ask for them to be indirected.
Two sgprs are then used to point to a buffer of 64-bit pointers
to the number of allocated descriptor sets. All shaders point
to the same buffer.
We can use some user sgprs to inline one or two descriptor sets
in future, but until we have a workload that needs this I don't
think we should spend too much time on it.
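The decision described above can be sketched roughly as follows. The struct, the function name and the way the available SGPR budget is passed in are all assumptions for illustration; the real allocator tracks more state than this.

```c
#include <assert.h>
#include <stdbool.h>

#define SGPRS_PER_SET 2   /* one 64-bit pointer per descriptor set */

struct sgpr_plan {
   bool indirect_sets;       /* true: sets reached through a pointer table */
   unsigned sgprs_for_sets;  /* user SGPRs consumed by descriptor sets */
};

static struct sgpr_plan
plan_descriptor_sgprs(unsigned num_bound_sets, unsigned available_sgprs)
{
   struct sgpr_plan plan;
   assert(available_sgprs >= SGPRS_PER_SET);
   if (num_bound_sets * SGPRS_PER_SET > available_sgprs) {
      /* Not enough room: spend two SGPRs on a pointer to a buffer of
       * 64-bit set pointers instead. */
      plan.indirect_sets = true;
      plan.sgprs_for_sets = SGPRS_PER_SET;
   } else {
      plan.indirect_sets = false;
      plan.sgprs_for_sets = num_bound_sets * SGPRS_PER_SET;
   }
   return plan;
}
```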
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds an initial implementation to allocate the user
sgprs and make sure we don't run out if we try to bind
a bunch of descriptor sets.
This can be enhanced further in the future if we add
support for inlining push constants.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
mostly documenting things, since with modern llvm we always have the
spill enabled.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In practice this will probably just drop draw id in a few places.
v2: just do draw_id for now. (Bas)
it might be possible to do something more if we need it in the
future. (nha)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There is some radv specific info we need to gather from shaders
before we get into converting nir->llvm, so we can make
better decisions especially around user sgpr allocation.
This is just an initial placeholder to gather if sample positions
are required in the frag shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In particular, move per-shader-stage info out to a separate array of
enums indexed by shader stage. This will make it easier to add more
shader stages as well as new per-stage state (like SSBOs).
Signed-off-by: Rob Clark <robdclark@gmail.com>
a3xx/a4xx use the generic u_blitter path, which will make state dirty
bits be set appropriately thanks to the automagic of generic code
setting generic state in the driver. And a5xx has a blit/dma engine
(actually, two) so it doesn't need these extra dirty bits set.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This makes it easier to deal with adding additional stages which have
their own driver-params. The duplicated code this introduces can be
refactored out after a later patch moves to per-shader-stage dirty
flags.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Note that this involves juggling around a bit when we emit and clear
texture state. So split out from the patch that adds the helper to set
all state dirty.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Make this an array indexed by shader stage, as is done elsewhere for
other per-shader-stage state. This will simplify things as more shader
stages are eventually added.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Each of the ir3 users has *basically* the same logic for comparing the
previous and current shader key, to see which, if any, shader state
needs to be marked dirty due to shader variant change.
The difference between gens was just that some lowering flags never get
set on certain generations. But it doesn't really hurt to include the
extra checks (because both keys would have false).
Signed-off-by: Rob Clark <robdclark@gmail.com>
So atexit() is horrible and 4aea8fe7 is probably not a good idea. But
add an extra layer of duct-tape to the problem. Otherwise we hit a
situation where an app using an atexit() handler that runs later than
ours hangs when trying to tear down a context.
(gdb) bt
#0 util_queue_killall_and_wait (queue=queue@entry=0x52bc80) at ../../../src/util/u_queue.c:264
#1 0x0000007fb6c380c0 in atexit_handler () at ../../../src/util/u_queue.c:51
#2 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
#3 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
#4 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
#5 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
#6 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
#7 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
#8 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
#9 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
(gdb) c
Continuing.
[Thread 0x7fb67580c0 (LWP 3471) exited]
^C
Thread 1 "drawbuffer-mode" received signal SIGINT, Interrupt.
0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x0000007fb72dda34 in pthread_cond_wait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1 0x0000007fb6c38304 in cnd_wait (mtx=0x5bdc90, cond=0x5bdcc0) at ../../../include/c11/threads_posix.h:159
#2 util_queue_fence_wait (fence=0x5bdc90) at ../../../src/util/u_queue.c:106
#3 0x0000007fb6daac70 in fd_batch_sync (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:233
#4 batch_reset (batch=batch@entry=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:183
#5 0x0000007fb6daa5e0 in batch_flush (batch=0x5bdc70) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:290
#6 fd_batch_flush (batch=0x5bdc70, sync=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch.c:308
#7 0x0000007fb6daba2c in fd_bc_flush (cache=0x461220, ctx=0x52b920) at ../../../../../src/gallium/drivers/freedreno/freedreno_batch_cache.c:141
#8 0x0000007fb6dac954 in fd_context_flush (pctx=0x52b920, fence=0x0, flags=<optimized out>) at ../../../../../src/gallium/drivers/freedreno/freedreno_context.c:54
#9 0x0000007fb6b43294 in st_glFlush (ctx=<optimized out>) at ../../../src/mesa/state_tracker/st_cb_flush.c:121
#10 0x0000007fb69a84e8 in _mesa_make_current (newCtx=newCtx@entry=0x0, drawBuffer=drawBuffer@entry=0x0, readBuffer=readBuffer@entry=0x0) at ../../../src/mesa/main/context.c:1654
#11 0x0000007fb6b7ca58 in st_api_make_current (stapi=<optimized out>, stctxi=0x0, stdrawi=0x0, streadi=0x0) at ../../../src/mesa/state_tracker/st_manager.c:827
#12 0x0000007fb6cc87e8 in dri_unbind_context (cPriv=<optimized out>) at ../../../../../src/gallium/state_trackers/dri/dri_context.c:217
#13 0x0000007fb6cc80b0 in driUnbindContext (pcp=0x5271e0) at ../../../../../../src/mesa/drivers/dri/common/dri_util.c:591
#14 0x0000007fb7d1da08 in MakeContextCurrent (dpy=0x433380, draw=0, read=0, gc_user=0x0) at ../../../src/glx/glxcurrent.c:214
#15 0x0000007fb7a8d5e0 in glx_platform_make_current () from /lib64/libwaffle-1.so.0
#16 0x0000007fb7a894e4 in waffle_make_current () from /lib64/libwaffle-1.so.0
#17 0x0000007fb7ef8c60 in piglit_wfl_framework_teardown (wfl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_wfl_framework.c:628
#18 0x0000007fb7ef939c in piglit_winsys_framework_teardown (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:238
#19 0x0000007fb7ef9c30 in destroy (gl_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:212
#20 0x0000007fb7edb7c4 in destroy () at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:184
#21 0x0000007fb7730e2c in __run_exit_handlers () from /lib64/libc.so.6
#22 0x0000007fb7730e5c in exit () from /lib64/libc.so.6
#23 0x0000007fb7ce17dc in piglit_report_result (result=PIGLIT_PASS) at /home/robclark/src/piglit/tests/util/piglit-util.c:267
#24 0x0000007fb7ef99f8 in process_next_event (x11_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:139
#25 0x0000007fb7ef9a90 in enter_event_loop (winsys_fw=0x432c20) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_x11_framework.c:153
#26 0x0000007fb7ef8e50 in run_test (gl_fw=0x432c20, argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/util/piglit-framework-gl/piglit_winsys_framework.c:88
#27 0x0000007fb7edb890 in piglit_gl_test_run (argc=1, argv=0x7ffffff588, config=0x7ffffff400) at /home/robclark/src/piglit/tests/util/piglit-framework-gl.c:203
#28 0x0000000000401224 in main (argc=1, argv=0x7ffffff588) at /home/robclark/src/piglit/tests/bugs/drawbuffer-modes.c:46
(gdb) r
Fixes: 4aea8fe7 ("gallium/u_queue: fix random crashes when the app calls exit()")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Similar to st_convert_image(), will be useful for bindless. While
we are at it, rename convert_sampler() to convert_sampler_from_unit()
and make 'st' a const argument.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The description under RENDER_SURFACE_STATE::RedClearColor says,
For Sampling Engine Multisampled Surfaces and Render Targets:
Specifies the clear value for the red channel.
For Other Surfaces:
This field is ignored.
This means that the sampler on BDW doesn't support CCS.
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Commit bfee9866 "radv: Use RELEASE_MEM packet for MEC timestamp query."
added WriteTimestamp handling for compute queues but forgot to flip
the flag.
Tested with DOOM (by me) and CTS (by Bas), but without verification
that these tests actually use timestamps on compute queues.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For compute shaders, we need to be able to allocate some "high"
registers (r48.x to r55.w). (Possibly these are global to all threads
in a warp?) Add a new register class to handle this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The layout of CP_LOAD_STATE packet is slightly different on a4xx+.
Switch to the a4xx+ specific CP_LOAD_STATE4 to get the correct encoding.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The warning should be printed only when one explicitly uses the
deprecated configure toggle.
Fixes: 7748c3f5eb ("configure.ac: deprecate --with-egl-platforms over --with-platforms")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Currently the former controls more than just EGL. With follow-up commits
we'll unwind and fix things so that one can build the different drivers
with said platform support.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The configure option is used by more than just EGL and with next commit
we'll rename it accordingly. Thus having the check will be (and is atm)
incorrect.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We are not using either of these. The respective xcb packages are used
instead.
v2: Rebase, reword commit message.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new interface mostly just sits on top of the existing library.
The only change to the existing EGL code is to split the client
extension string into platform extensions and everything else. On
non-glvnd builds, eglQueryString will just concatenate the two strings.
The EGL dispatch stubs are all generated. The script is based on the one
used to generate entrypoints in libglvnd itself.
v2: [Kyle]
- Rebased against master.
- Reworked the EGL makefile to use separate libraries
- Made the EGL code generation scripts work with Python 2 and 3.
- Change gen_egl_dispatch.py to use argparse for the command line arguments.
- Assorted formatting and style cleanup in the Python scripts.
v3: [Emil Velikov]
- Rebase
- Remove separate glvnd glx/egl configure toggles
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We should check the presence in order to determine if we should
[implicitly] set the CFLAGS/LIBS
v2: Drop spurious OMX hunk (Eric)
Cc: Eric Anholt <eric@anholt.net>
Reported-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Earlier commit bumped the requirement for the SWR driver.
v2: Fold the note with the LLVM 3.9 one (Tim)
Fixes: 3c52a7316a ("swr: [configure.ac/scons] require c++14")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
I accidentally moved the bo->bufmgr dereference above the NULL check
when cleaning up this code.
While passing NULL to free() is a common pattern...passing NULL to
unmap seems pretty bad. You really ought to know whether you have
a buffer or not. We don't want to paper over bugs like that. So,
just drop the NULL check altogether.
CID: 1405006
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Starting positions >= 32 are not part of the header, rather than >.
Caught by Coverity, which found that "bits <<= field->start" may shift
by 32, which has undefined behavior.
CID: 1404968
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If ret is 0, we return. If ret is not 0, we return. This is dead.
CID: 1405013 (Structurally dead code (UNREACHABLE))
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This hides the overhead of everything in the driver after the CS flush and
before returning from pipe_context::flush.
Only microbenchmarks will benefit.
+2% FPS for glxgears.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Not possible with GL and it will make future gallium rework easier.
(also it's something I wouldn't like to support)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This moves the hashing of shader source for the cache lookup to before
the preprocessor. In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.
Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.
Also fixes the leaking of state.
V2: fix indentation
v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.
Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.
To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.
Improves cold cache start-up times on Deus Ex by ~20 seconds.
When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.
V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This prevents a user from using a cache created on one hardware
generation on a different one. Of course, with Intel hardware, this
requires moving their drive from one machine to another but it's still
possible and we should prevent it.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
This adds native fence fd support to etnaviv, similarly to commit
0b98e84e9b ("freedreno: native fence fd"), enabled for kernel
driver version 1.1 or later.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
v2 (Andreas Boll):
- Mark GL 4.1 as supported by i965/gen7+
- Mark GL_ARB_shader_precision as supported by i965/gen7+
- Update release notes
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This patch adds support for the SINGLE_BUFFER feature on GC3000
GPUs, which allows rendering to a single buffer using multiple pixel
pipes.
This feature is always used when it is available, which means that
multi-tiled formats are no longer being used in that case, and all
buffers will be normal (super)tiled. This mimics the behavior of the
blob on GC3000.
- Because the same format can be used to render to and texture from,
this avoids an extra resolve pass when rendering to texture.
- i.MX6qp includes a PRE which can scan-out directly from tiled formats,
avoiding untiling overhead.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update to etna_viv commit 8486a97.
austriancoder: changed patch to include isa redefinition fix.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Request chipMinorFeatures bitfields 4 and 5 from the
drm driver.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When passing render buffers from EGL clients to a wayland compositor,
the resource tile status must be resolved because otherwise the tile
status is lost in the transfer and cleared parts of the buffer will
contain old contents.
The same applies when sampling directly from a renderable resource.
lst: Add seqno tracking, to skip flush when not needed.
Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Before resolving a resource into its scanout prime buffer, check that
the prime resource is actually older. If it is not, the resolve is an
expensive no-op, and we better skip it.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Add polygon stipple functionality to the fragment shader.
Explicitly turn off polygon stipple for lines and points, since we
do them using tris.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
In commit c35fa7a, we changed the "width" of DF source registers to 2,
which is conceptually fine. Unfortunately a VertStride of 2 is not
allowed by align16 instructions on IVB/BYT, and the regular VertStride
of 4 works fine in any case.
See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test
for example:
cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N };
ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed
v2:
- Add spec quote (Curro).
- Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is required for correctness in presence of multiple 4-wide flag
writes (e.g. 4-wide instructions with a conditional mod set) which
update a different portion of the same 8-bit flag subregister.
Right now we keep track of flag dataflow with 8-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 8 channels wide,
which can cause live flag register updates to be dead
code-eliminated incorrectly.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
horiz_offset() shouldn't be doing anything for scalar registers,
because all channels of any SIMD instructions will end up reading or
writing the same component of the register, so shifting the register
offset would be wrong.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Re-implement in terms of is_uniform() for
simplicity. Pass argument by const reference. Clarify commit
message. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Otherwise for a pack_double_2x32_split opcode, we emit:
vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134
mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted };
mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted };
mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q };
ERROR: When the destination spans two registers, the source must span two registers
(exceptions for scalar source and packed-word to packed-dword expansion)
mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N };
ERROR: The offset from the two source registers must be the same
The intention was to emit mov(4)s for the instructions that have ERROR
annotations.
See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test
for example.
v2 (Samuel):
- Instead of setting the exec size to a fixed value, don't double it
(Curro).
- Add PICK_{HIGH,LOW}_32BIT to the condition.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Trivial rebase changes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to
static stand-alone function. Don't write unused components in the
result. Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Take into account offset values less than a full register (32 bytes)
when getting the var from register.
This is required when dealing with an operation that writes half of the
register (like one d2x in IVB/BYT, which uses exec_size == 4).
v2:
- Take into account this offset < 32 in liveness analysis too (Curro)
v3:
- Change formula in var_from_reg() (Curro)
- Remove useless changes (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4
== 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When doing a 64-bit to a smaller data type size conversion, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.
Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitly in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just does the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.
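The two steps can be modelled on plain arrays as below. This is a CPU-side illustration only: the function names are hypothetical, and writing 0 to the upper dword is an arbitrary choice for the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1 (models the conversion): each 64-bit source yields a 32-bit
 * result stored in the low dword of a 64-bit-aligned slot. */
static void
convert_from_double(const double *src, uint32_t *slots, size_t n)
{
   assert(src != NULL && slots != NULL);
   for (size_t i = 0; i < n; i++) {
      slots[2 * i] = (uint32_t)src[i];  /* low dword: converted value */
      slots[2 * i + 1] = 0;             /* high dword: not meaningful */
   }
}

/* Step 2 (models the gather): pack the low dwords contiguously. */
static void
pick_low_32bit(const uint32_t *slots, uint32_t *dst, size_t n)
{
   assert(slots != NULL && dst != NULL);
   for (size_t i = 0; i < n; i++)
      dst[i] = slots[2 * i];
}
```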
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.
v2:
- Use stride and don't need to fix dst (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Keep the original type when dealing with null registers. Especially
because we do not want to introduce an implicit conversion between
types that could affect the conditional flags.
This matters especially when the original type is DF, and we are working
on Ivybridge/Baytrail.
v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).
v2:
- Rename helper and make it static inline function (Matt).
- Fix indentation and add braces (Matt).
v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)
v4:
- Add get_exec_type() function and use it to calculate the execution
size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take
destination type as execution type where there is no valid source.
Assert-fail if the deduced execution type is byte. Clarify comment
in get_lowered_simd_width(). Move SIMD width workaround outside of
'if (...inst->size_written > REG_SIZE)' conditional block, since the
problem should be independent of whether the amount of data written
by the instruction is greater or lower than a GRF. Drop redundant
is_ivb_df definition. Drop bogus inst->exec_size < 8 check.
Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the IVB and HSW PRMs:
"2.When the destination requires two registers and the sources are
indirect, the sources must use 1x1 regioning mode."
So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.
This patch limits the SIMD width to 4 in this case.
v2:
- Fix typo (Matt).
- Fix condition (Curro)
v3:
- Add spec quote (Curro)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When doing DF to 32-bit conversions, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.
But in IVB/BYT, this is not necessary, as each DF conversion already
writes 2 dwords: the first one the real value, and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.
v2:
- Fix typo (Matt)
v3:
- Fix stride in the destination's brw_reg, don't modify IR (Curro)
v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit 7dccd38b40.
d2x pass fixes SEL instructions when there is a type conversion
by doing a SEL without type conversion and then convert the result.
This pass also takes into account the non-uniform control flow.
Then, 7dccd38b40 is not needed anymore.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Generalize it to lower any unsupported narrower conversion.
v2 (Curro):
- Add supports_type_conversion()
- Reuse existing instruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.
v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
when fixing the conversion.
- Add get_exec_type() function.
v4 (Curro):
- Use get_exec_type() function to get sources' type.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.
v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).
v3:
- Added comment explaining the reason (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB and BYT, both regioning parameters and execution sizes are measured in
units of 32-bit elements.
So when we have something like:
mov(8) g2<1>DF g3<4,4,1>DF
We are not actually moving 8 doubles (our intention), but 4 doubles.
We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.
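A rough sketch of the adjustment described above. This is not the actual
brw_reg_from_fs_reg() code: the struct and function names are hypothetical,
and the fields hold plain element counts rather than hardware encodings.

```c
#include <assert.h>

/* Hypothetical, simplified region description. The fields are plain
 * element counts, not the encoded hardware values. */
struct df_region {
   unsigned exec_size;
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* On IVB/BYT everything is counted in 32-bit units, so a 64-bit (DF)
 * region needs twice the execution size, vertical stride and width.
 * The horizontal stride is left alone because doubling it makes no
 * difference there; this sketch only accepts the packed case. */
struct df_region
ivb_adjust_df_region(struct df_region r)
{
   assert(r.hstride == 1);
   r.exec_size *= 2;
   r.vstride *= 2;
   r.width *= 2;
   return r;
}
```

Under these assumptions, the mov(8) with a <4,4,1> source shown above would
be emitted with an execution size of 16 and an <8,8,1> region.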
v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)
v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).
v4:
- Fix comment (Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The execution data size is the biggest type size of any instruction
operand.
We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.
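A minimal sketch of that definition, assuming the operand type sizes are
already available as byte counts. The function names are hypothetical and
do not match the helpers added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the execution data size in bytes: the largest type size
 * among the given operands. */
unsigned
exec_data_size(const unsigned *type_sizes, size_t count)
{
   assert(type_sizes != NULL || count == 0);

   unsigned size = 0;
   for (size_t i = 0; i < count; i++) {
      if (type_sizes[i] > size)
         size = type_sizes[i];
   }
   return size;
}

/* An instruction "deals with DF" when its execution data size is 8 bytes. */
bool
instruction_uses_df(const unsigned *type_sizes, size_t count)
{
   return exec_data_size(type_sizes, count) == 8;
}
```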
v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).
v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
calculate its size.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Fix deduced
execution type for integer vector types. Take destination type as
execution type where there is no valid source. Assert-fail if the
deduced execution type is byte. Move into brw_ir_fs.h header for
consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.
v2 (Sam):
- Add comments.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
4-wide DF operations where NibCtrl applies require an execsize of 8
in IvyBridge/BayTrail.
v2:
- Refactor NibCtrl printing (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fix the accounting for memory usage of userptr buffers, which has been wrong
forever (or at least for a long time).
Also initialize flags. Without this initialization, the sparse buffer flag
might end up being set, which leads to staging buffers being used unnecessarily
(and incorrectly) in transfers to or from userptr buffers.
This works around VM faults that occur with the radeon kernel module when
running piglit ./bin/amd_pinned_memory decrement-offset map-buffer -auto
Fixes: e077c5fe65 ("gallium/radeon: transfers and invalidation for sparse buffers")
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use push descriptors instead of temp descriptor sets.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows meta to use push descriptors without disturbing user
push descriptors.
radv_meta_push_descriptor_set differs from vkCmdPushDescriptorSetKHR
in that partial updates are not supported; all descriptors used in
subsequent draw commands must be pushed at the same time.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan driver was originally written under the assumption that
VK_ATTACHMENT_UNUSED was basically just for depth-stencil attachments.
However, the way things fell together, VK_ATTACHMENT_UNUSED can be used
anywhere in the subpass description. The blorp-based clear and resolve
code has a bunch of places where we walk lists of attachments and we
weren't handling VK_ATTACHMENT_UNUSED everywhere. This commit should
fix all of them.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
We're about to start requiring it in yet another case and calculating
exactly when one is needed is starting to get prohibitively expensive.
A single surface state doesn't take up that much space so we may as well
create one all the time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Depending on pipe caps they can be writable in all vertex processing
stages, but only the output of the last stage counts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Enable code sanitizers by adding -fsanitize=$foo flags for the compiler
and linker.
In addition, this also disables checking for undefined symbols: running
the address sanitizer requires additional symbols which should be provided
by a preloaded libasan.so (preloaded for hooking into malloc & friends
globally), and the undefined symbols check gets tripped up by that.
Running the tests works normally via `make check`, but shows additional
failures with the address sanitizer due to memory leaks that seem to be
mostly leaks in the tests themselves. I believe those failures should
really be fixed. In the mean-time, you can set
export ASAN_OPTIONS=detect_leaks=0
to only check for more serious error types.
v2:
- fail reasonably when an unsupported sanitize flag is given (Eric Engestrom)
Reviewed-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This makes it much easier to throw together a bit of dynamic state. It
also automatically handles flushing so you don't accidentally forget.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This patch enables multisample antialiasing in the OpenSWR software renderer.
MSAA is a proof-of-concept/work-in-progress with bug fixes and performance
on the way. We wanted to get the changes out now to allow several customers
to begin experimenting with MSAA in a software renderer. So as not to
impact current customers, MSAA is turned off by default - previous
functionality and performance remain intact. It is easily enabled via
environment variables, as described below.
It has only been tested with the glx-lib winsys. The intention is to
enable other state-trackers, both Windows and Linux and more fully support
FBOs.
There are 2 environment variables that affect behavior:
* SWR_MSAA_FORCE_ENABLE - force MSAA on, for apps that are not designed
for MSAA... Beware, results will vary. This is mainly for testing.
* SWR_MSAA_MAX_SAMPLE_COUNT - sets maximum supported number of
samples (1,2,4,8,16), or 0 to disable MSAA altogether.
(The default is currently 0.)
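As an illustration of how the second variable could be interpreted, here is
a small sketch. The helper names are hypothetical, and treating an
unparsable or unsupported value as 0 is an assumption, not something the
patch is known to do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: interprets the string value of
 * SWR_MSAA_MAX_SAMPLE_COUNT. Unset, malformed or unsupported values
 * fall back to 0 (MSAA disabled); that fallback is an assumption. */
unsigned
parse_max_sample_count(const char *value)
{
   if (value == NULL)
      return 0;

   char *end = NULL;
   const long n = strtol(value, &end, 10);
   if (end == value || *end != '\0')
      return 0;

   switch (n) {
   case 1: case 2: case 4: case 8: case 16:
      return (unsigned)n;
   default:
      return 0;
   }
}

/* Reads the variable from the environment. */
unsigned
max_sample_count_from_env(void)
{
   const unsigned n = parse_max_sample_count(getenv("SWR_MSAA_MAX_SAMPLE_COUNT"));
   assert(n <= 16);
   return n;
}
```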
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Removed unnecessary and probably wrong PIPE_BIND_SCANOUT and PIPE_BIND_SHARED
flags in favor of check on single PIPE_BIND_DISPLAY_TARGET flag.
Reference llvmpipe change <bee4c7718a3bd57e3d99f0913d9081cd13fe5fd>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The context now contains SIMD vectors which must be aligned (specifically
samplePositions in the rastState in the derived state). Failure to align
can result in segv crash on unaligned memory access in vector
instructions.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The kernel returns the frequency in kHz, so to convert to the nanosecond
interval that Vulkan uses, the dividend should be 1000000.0 and not
100000.0.
This fixes the GPU graph in DOOM and matches the amdgpu-pro blob.
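The arithmetic, spelled out as a small sketch (the function name is
hypothetical):

```c
#include <assert.h>

/* freq_khz is the clock frequency in kHz, as reported by the kernel.
 * Returns the length of one tick in nanoseconds.
 *
 *   ticks per second = freq_khz * 1e3
 *   ns per tick      = 1e9 / (freq_khz * 1e3) = 1e6 / freq_khz
 */
double
timestamp_period_ns(unsigned freq_khz)
{
   assert(freq_khz != 0);
   return 1000000.0 / (double)freq_khz;
}
```

For example, a 100 MHz clock (100000 kHz) gives 10 ns per tick, whereas the
old dividend of 100000.0 would have reported 1 ns.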
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Probably should have flipped the switch a long time ago, since it
doesn't seem to cause any problems and is a nice perf boost in a number
of cases.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Want to move one of these under ir3_block, so that gives a reason to
migrate the remaining malloc/realloc to ralloc.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When publishing this spec on the OpenGL ES registry, Jon Leech noticed
that it didn't actually mention what the ES dependencies and
interactions were. I looked at extensions_table.h and noted that we
expose it in ES 3.0 contexts, and he added the obvious spec texts.
The updated copy also contains our official extension number.
https://github.com/KhronosGroup/OpenGL-Registry/issues/3
Acked-by: Matt Turner <mattst88@gmail.com>
Needed if we want to allow them taking more than 64 KiB. The calculations
of these already used 32 bits.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This bumps it to the same level as amdgpu-pro. It also
moves a bunch of dEQP-VK.geometry.instanced.* from
NotSupported to Pass.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
gl_BaseVertex is supposed to be 0 in non-indexed draws. Unfortunately, the
way they're implemented, the VGT always generates indices starting at 0,
and the VS prolog adds the start index.
There's a VGT_INDX_OFFSET register which causes the VGT to start at a
driver-defined index. However, this register cannot be written from
indirect draws.
So fix this unlikely case by setting a bit to tell the VS whether the
draw is indexed or not, so that gl_BaseVertex can be adjusted accordingly
when used.
Fixes a bug in
KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Especially with subsequent changes, this makes it easier to see the
sequence of state emits at the higher level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Most trace points happen after an operation, so add a trace point
at the start of the command buffer.
Furthermore, add one after a CmdUpdateBuffer using CP_DMA as that
didn't emit one yet.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
timestamp and pipeline_statistics only do something on begin & end,
so they don't need any action.
Occlusion queries only do something to enable/disable and that
register is set nowhere else so that doesn't need extra support either.
(We technically should fix it to update the reg with the number of
samples, but that hasn't happened yet, so we only change it to
enable/disable counting)
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is only relevant with 0 attachments. In that case we do nothing
on subpass switch already, and the pipeline is the authoritative
source of the number of samples, so this shouldn't change anything.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Flush the HUD value streams to the dump files after every newline.
v2: check that fopen succeeded (Julien)
Reviewed-and-Tested-by: Julien Isorce <jisorce@oblong.com>
The addrlib import meant we'd return after we attempted
to setup the no stencil bits for an S8_UINT, now we break
and use the stencil level info when creating stencil DB
info.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from radeonsi, and avoids the bug in the
addrlib code. This should probably be something addrlib
does for us, but for now this fixes the regression without
changing addrlib and aligns us with radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
state_tracker/st_atom_framebuffer.c:208:27: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
if (framebuffer->width == UINT_MAX)
~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~
state_tracker/st_atom_framebuffer.c:210:28: warning: comparison of constant 4294967295 with expression of type 'uint16_t' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare]
if (framebuffer->height == UINT_MAX)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~
2 warnings generated.
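Why the comparison can never be true, as a small sketch. Comparing against
the 16-bit maximum is shown here only as one plausible way to express the
check; it is an assumption and not necessarily what the fix does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* A uint16_t tops out at 0xffff, so it can never equal UINT_MAX. */
static_assert(UINT16_MAX < UINT_MAX, "a uint16_t can never hold UINT_MAX");

/* Hypothetical helper: storing UINT_MAX in a 16-bit field truncates it
 * to 0xffff, so that is the value a sentinel check has to look for. */
bool
dimension_is_unset(uint16_t dim)
{
   return dim == UINT16_MAX;
}
```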
Fixes: eb0fd0e5f8 ("gallium: decrease the size of pipe_framebuffer_state - 96 -> 80 bytes")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
In file included from radeon_debug.c:32:
./radeon_common_context.h:500:19: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
extern const char const *radeonVendorString;
v2: - do not remove the duplicate 'const' qualifier, fix it
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fixes the following Clang warning.
vmw_screen_dri.c:130:1: warning: unused function 'vmw_dri1_intersect_src_bbox' [-Wunused-function]
vmw_dri1_intersect_src_bbox(struct drm_clip_rect *dst,
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warnings.
lp_setup_tri.c:55:1: warning: unused function 'subpixel_snap' [-Wunused-function]
subpixel_snap(float a)
^
lp_setup_tri.c:61:1: warning: unused function 'fixed_to_float' [-Wunused-function]
fixed_to_float(int a)
^
v2: - do not remove subpixel_snap() (use !PIPE_ARCH_SSE instead)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fixes the following Clang warning.
sp_fs_exec.c:56:1: warning: unused function 'sp_exec_fragment_shader' [-Wunused-function]
sp_exec_fragment_shader(const struct sp_fragment_shader_variant *var)
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
sp_quad_fs.c:60:1: warning: unused function 'quad_shade_stage' [-Wunused-function]
quad_shade_stage(struct quad_stage *qs)
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
sp_tex_sample.c:802:1: warning: unused function 'get_texel_quad_2d' [-Wunused-function]
get_texel_quad_2d(const struct sp_sampler_view *sp_sview,
^
CC sp_tile_cache.lo
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warnings.
main/pack.c:470:1: warning: unused function 'clamp_float_to_uint' [-Wunused-function]
clamp_float_to_uint(GLfloat f)
^
main/pack.c:477:1: warning: unused function 'clamp_half_to_uint' [-Wunused-function]
clamp_half_to_uint(GLhalfARB h)
^
2 warnings generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes the following Clang warning.
virgl_screen.c:60:12: warning: enumeration value 'PIPE_CAP_DOUBLES' not handled in switch [-Wswitch]
switch (param) {
^
1 warning generated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This removes one level of indentation and will improve readability
for bindless images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
readInvocationARB() and readFirstInvocationARB() need SHFL.IDX
instruction which is introduced in Kepler.
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Check if each channel is masked in TGSI_OPCODE_BALLOT (Ilia Mirkin)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Implementation of readFirstInvocationARB() on nvidia hardware needs a
ballotARB(true) used to decide the first active thread. This is expressed
in gm107 asm as (supposing output is $r0):
vote any $r0 0x1 0x1
To model the always true input, which corresponds to the second 0x1
above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
"not 0x1" in the src field respectively.
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
v3: (Ilia Mirkin)
Make the handling more symmetric with predicate version in gm107
Use i->getSrc(s)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
v3: Check the range of immediate in OP_SHFL (Ilia Mirkin)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: (Samuel Pitoiset)
Add an assertion to check if the target is Kepler
Make sure that asImm() is not NULL
v3: (Ilia Mirkin)
Check the range of immediate value of OP_SHFL
Use the new setPDSTL API
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
GF100's ISA encoding has a weird form of predicate destination where its
3 bits are split across the whole instruction. Use a dedicated setPDSTL
function instead of original defId which is incorrect in this case.
v2: (Ilia Mirkin)
Change API of setPDSTL() to handle cases of no output
Fix setting of the highest bit in setPDSTL()
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's
lowering (Samuel Pitoiset)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
clang::LangAS::Offset is gone, the behaviour is as if it was 0.
v2: Introduce and use clover::llvm::compat::lang_as_offset (Francisco
Jerez)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
A few functions related to FBOs/renderbuffers should only be used with
window-system buffers, not user-created FBOs. Assert for that.
Add additional comments. No piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We only do on-demand renderbuffer allocation for window-system FBOs,
not user-created FBOs. So put the loop inside a conditional.
Plus, add some comments. No piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
According to the Vulkan spec, VkPipelineInputAssemblyStateCreateInfo's
primitiveRestartEnable flag should only apply to indexed draws, however
it was being enabled regardless of the type of draw. This could cause
problems for non-indexed draws with >=65535 vertices if the previous
indexed draw used 16-bit indices.
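The rule can be sketched as follows. The function names are hypothetical
and this is not the driver's state-emission code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The restart index is the all-ones value of the index type. */
uint32_t
restart_index(unsigned index_bytes)
{
   assert(index_bytes == 2 || index_bytes == 4);
   return index_bytes == 2 ? 0xffffu : 0xffffffffu;
}

/* Primitive restart only takes effect when the pipeline enables it
 * and the draw is indexed. */
bool
primitive_restart_active(bool pipeline_restart_enable, bool draw_is_indexed)
{
   return pipeline_restart_enable && draw_is_indexed;
}
```

With restart left on and a stale 16-bit restart index of 0xffff, a large
non-indexed draw can reach a vertex number that the hardware mistakes for
the restart marker.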
Fixes corruption of the credits text in Mad Max.
v2: Reset primitive restart state after executing a secondary command
buffer.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Makes more sense when we hash the layout for the pipeline cache.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Context: _mesa_add_parameter is sometimes[0] called with a
NULL name as a means of denoting an unnamed parameter.
Allowing a NULL pointer as a name means that it must be NULL-checked
on each access. So far that isn't always[1] the case.
The parameter name is only used for debug purposes (printf) and
to look up the index/location of the program by the application.
Conclusion: there is no valid reason to use a NULL pointer instead of
an empty string. So it was decided to use an empty string, which avoids all
issues related to NULL pointers.
[0]: texture gather offsets glsl opcode and st_init_atifs_prog
[1]: at least shader cache, st_nir_lookup_parameter_index and some printfs
Issue found by piglit 'texturegatheroffsets' tests on Nouveau
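The idea in miniature, with hypothetical helper names that are not part of
the _mesa_add_parameter API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Normalize the name once, when the parameter is added. */
const char *
parameter_name_or_empty(const char *name)
{
   return name ? name : "";
}

/* Consumers can then rely on the name never being NULL and call
 * string functions on it without a check. */
bool
parameter_name_matches(const char *stored_name, const char *wanted)
{
   assert(stored_name != NULL);
   return strcmp(stored_name, wanted) == 0;
}
```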
v4: new patch based on Nicolai/Timothy/ilia discussion
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
unsigned long is a terrible type for a bitfield - if you need fewer
than 32 bits, it wastes 4 bytes. If you need more, things break on
32-bit builds. Just use unsigned.
Even that's a bit ridiculous as we only have one flag today.
Still, it's at least somewhat better.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The drm_i915_gem_create ioctl structure uses a __u64 for the size,
so we should probably use uint64_t to match. In theory, we could
probably have a BO larger than 4GB, using a 48-bit PPGTT - it just
wouldn't be mappable in the CPU's 32-bit address space.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Theoretically, with a 48-bit address space, we could have buffers
with an alignment of >= 4GB. It's a bit silly, but the exec_object
structs (drm_i915_gem_exec_object2) use a __u64 for this, so we may
as well use the same type as the kernel API.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
struct drm_i915_gem_set_tiling's stride field is a __u32.
intel_mipmap_tree::stride is a uint32_t. Using unsigned long just
doesn't make sense. Switching also lets us drop many pointless
locals that only existed to deal with the type mismatch.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The ioctl structs contain __u64 offset and size fields, so make them
uint64_t rather than unsigned long.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
For some reason we passed tiling by pointer, through several layers,
even though the functions only read the initial value, and never
actually change it. We even had a do-while loop that executed until
the tiling mode matched - except it always did, so it only ran once.
We then had bogus error handling in case it changed the tiling mode
to something nonsensical...which it never did.
Drop all this nonsense.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
These calls look like leftovers from fallback texture support first
being added to the st in 8f6d9e12be and then later being added
to core mesa in 00e203fe17.
The piglit test fp-incomplete-tex continues to work with this
change.
Reviewed-by: Brian Paul <brianp@vmware.com>
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block. Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.
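A sketch of the weighting, assuming a factor of 0.5 per level of
conditional nesting. The factor and the function names are illustrative
assumptions; the text above only says "some factor in between".

```c
#include <assert.h>

/* Assumed per-branch factor, somewhere between 0 and 1. */
#define BRANCH_WEIGHT 0.5f

/* Weight of a block relative to the top level, given how many
 * if/else constructs enclose it. */
float
block_spill_weight(unsigned conditional_depth)
{
   assert(conditional_depth < 32);

   float weight = 1.0f;
   for (unsigned i = 0; i < conditional_depth; i++)
      weight *= BRANCH_WEIGHT;
   return weight;
}

/* Approximate cost of spilling a register that is accessed `uses`
 * times inside a block at the given conditional depth. */
float
spill_cost(unsigned uses, unsigned conditional_depth)
{
   return (float)uses * block_spill_weight(conditional_depth);
}
```

A register used twice inside a branch then costs as much as one used once
at the top level, so the branch-local register is no longer penalized as if
it always executed.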
Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e. Should have a comparable effect on other platforms. No
significant regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We check these bitfields when computing the Haswell max GL version.
We need to set them ahead of time, or they won't exist, and all our
checks will fail. That sets the max core profile GL version to 4.2.
This introduces the bizarre situation where asking for a GL context
with version 4.3+ fails, but asking for a GL core profile context
with version <= 4.2 actually promotes you to a 4.5 context.
GLX_MESA_query_renderer also reported the bogus 4.2 value.
Now it shows 4.5.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-and-tested-by: Rafael Ristovski <rafael.ristovski@gmail.com>
Autodisable seems to cause missed rendering in some cases, but
otherwise TS seems to work properly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Fixes a performance issue with imported winsys buffers as those are
marked with binding sampler view.
This might require a TS flush on single pipe chips that directly
sample from the rendered buffer, but otherwise seems to work fine.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The TS surface gets cleared by a tiled RS fill. If the chip has
more than 1 pixel pipe the size of the TS surface needs to be
aligned so that each pipe address matches a tile start, otherwise
the RS will hang.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The TS is only valid after it has been initialized by a fast
clear, so it should not be taken into account when blitting
resources that haven't been cleared. Also the blit itself
invalidates the destination TS, as it's not updated and will
retain data from the previous rendering after the blit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The devil is in the shader again, otherwise this is
fairly straightforward.
The CTS contains no pipeline statistics copy to buffer
testcases, so I did a basic smoketest.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For using them with both occlusion and pipeline statistics queries.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The buffer sizes are specified just a few lines earlier, so don't
repeat ourselves.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use the new occlusion query copy shader.
We don't use the shader for the waiting as a polling loop interacts badly
with having caching enabled. I noticed on my GPU (Tonga) that the values
are written out in order, so I just use a WAIT_REG_MEM on the last value.
If it turns out other chips don't do that we may need to look a bit more
into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal.
This also restricts the availability word in the pool to timestamp queries
only, as occlusion queries don't use it, and pipeline statistic queries
likely won't either.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Adds a shader for writing occlusion query results to a buffer, as the
CP packet isn't supported on SI or secondary buffers, and doesn't handle
the availability bit (or partial results) nor truncation to 32-bit.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If the buffer is being used, we should wait for those uses to be
complete before returning the map.
Fixes: GL45-CTS.direct_state_access.buffers_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
We currently don't pass the low byte of the address via the surface
info, so in order to work with images, these have to implicitly be
aligned to 256. The proprietary driver also doesn't go out of its way to
provide lower alignment.
Fixes GL45-CTS.texture_buffer.texture_buffer_texture_buffer_range
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This restores the performance warnings removed in:
i965: Drop brw_bo_map[_gtt] wrappers which issue perf warnings.
but adds them for nearly all BO mapping, and also for wait_rendering.
Because we add this to the core bufmgr, we automatically get stall
warnings in all callers, unlike before where only a few callsites used
the wrappers that gave stall warnings.
We also do it a bit differently: we simply measure how long set_domain
takes (the part that stalls), and complain if it's more than 0.01 ms.
We don't bother calling brw_bo_busy(), and we don't measure the mmap
time (which doesn't stall). This should be more accurate.
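The check itself is simple. A sketch, assuming the caller has taken
nanosecond timestamps immediately before and after the set_domain call
(the names below are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the description above: 0.01 ms. */
#define STALL_THRESHOLD_MS 0.01

/* Converts a pair of nanosecond timestamps into elapsed milliseconds. */
double
stall_ms(uint64_t start_ns, uint64_t end_ns)
{
   assert(end_ns >= start_ns);
   return (double)(end_ns - start_ns) / 1000000.0;
}

/* True when the measured stall is long enough to complain about. */
bool
stall_is_worth_warning(double ms)
{
   return ms > STALL_THRESHOLD_MS;
}
```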
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In theory gcc is free to re-load them, and if a concurrent
execbuf races and updates bo->offset64 then we have a problem:
execbuffer api requires that the ->presumed_offset and the one
we used for the reloc matches. It does not require that the value
is sensible, which means no locks needed, just a consistent load.
Ken said his next series will nuke this, so just hand-roll the
kernel's READ_ONCE idea inline.
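For reference, the idea looks roughly like this. The macro relies on the
GCC/Clang __typeof__ extension, and the structs and function are
hypothetical stand-ins rather than the driver's types.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style READ_ONCE: the volatile access keeps the compiler from
 * re-loading the value later. */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for the buffer object and relocation entry. */
struct fake_bo {
   uint64_t offset64;
};

struct fake_reloc {
   uint64_t presumed_offset;
   uint64_t written_value;
};

/* Load the offset once and use that same value both for the presumed
 * offset and for the address written into the batch. */
struct fake_reloc
make_reloc(struct fake_bo *target, uint32_t delta)
{
   const uint64_t offset = READ_ONCE(target->offset64);
   struct fake_reloc r = {
      .presumed_offset = offset,
      .written_value = offset + delta,
   };
   assert(r.written_value - delta == r.presumed_offset);
   return r;
}
```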
FIXME: Most callers of brw_emit_reloc recompute the relocation
themselves, which means this doesn't really fix the race. But the long
term plan is to move to per-context relocation handling, which will
fix this all properly. So leave this for now as just a reminder.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was done because the kernel has 1 global address space, shared
with all render clients, for gtt mmap offsets, and that address space
was only 32bit on 32bit kernels.
This was fixed in
commit 440fd5283a87345cdd4237bdf45fb01130ea0056
Author: Thierry Reding <treding@nvidia.com>
Date: Fri Jan 23 09:05:06 2015 +0100
drm/mm: Support 4 GiB and larger ranges
which shipped in 4.0. Of course you still want to limit the bo cache
to a reasonable size on 32bit apps to avoid ENOMEM, but that's better
solved by tuning the cache a bit. On 64bit, this was never an issue.
On top, mesa never set this, so it's all dead code. Collect and trash it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
is_reusable was needed by uxa because it couldn't keep track of its
scanout buffers and used this as a proxy. Disabling reuse is a silly
idea, we set this once at start. Remove both.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iirc this was used by uxa for persistent mmaps of the frontbuffer. For
mesa all the set_domain stuff needed before a synchronized mmap is handled
within the bufmgr, so no reason ever to call this.
Inline the implementation into its only internal user.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Entirely unused, and really shouldn't be used. The alloc functions already
take care of this. And even in a future where we're not going to
h/v-align tiled buffers in the bufmgr, but only in isl, I think we
still want to adjust the tiling mode in the bufmgr, since that ties in
closely to mmaps and stuff like that.
get_tiling is still needed for the import paths (until we have modifiers
everywhere).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
indent -i3 -nut -br -brs -npcs -ce --no-tabs -Tuint32_t -Tuint64_t
plus some manual fixes because those aren't quite the right settings.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The bacon is all gone.
This renames both the class and the related functions. We're about to
run indent on the bufmgr code, so no need to worry about fixing bad
indentation.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The stupid reason for eliminating these functions is that I'm about
to rename drm_bacon_bo_map() to brw_bo_map(), which makes the real
function have the short name, rather than the wrapper.
I'm also planning on reworking our mapping code soon, so we use WC
mappings and proper unsynchronized mappings on non-LLC platforms.
It will be easier to do that without thinking about the stall
warnings and wrappers.
My eventual hope is to put the performance warnings in the BO map
function itself, so all callers gain the warning.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
drm_bacon_context is a malloc'd struct containing a uint32_t context ID
and a pointer back to the bufmgr. The bufmgr pointer is pretty useless,
as everybody already has brw->bufmgr. At that point...we may as well
just use the ctx_id handle directly. A number of places already had to
call drm_bacon_gem_context_get_id() to extract the ID anyway. Now they
just have it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We're going to get rid of drm_bacon_context shortly, so we'd have to
change the interface slightly. It's basically just an ioctl wrapper
that isn't terribly bufmgr-related, so we may as well just combine it
with the code in brw_reset.c that actually uses it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The only difference is that it takes an explicit bufmgr rather than
using bo->bufmgr, but there is only one bufmgr per screen so they
should be identical anyway.
Chris says this was added primarily to avoid bo/bo_gem casting,
which was inconvenient.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The separate class gives us a bit of extra encapsulation, but I don't
know that it's really worth the boilerplate. I think we can reasonably
expect the rest of the driver to be responsible.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
These fields are the same value. In the bad old days, bo->handle could
have been an identifier from the pre-GEM fake bufmgr, but that's long
gone. Keep the "gem_handle" name for clarity.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The execbuf2 kernel API requires us to construct two kinds of lists.
First is a "validation list" (struct drm_i915_gem_exec_object2[])
containing each BO referenced by the batch. (The batch buffer itself
must be the last entry in this list.) Each validation list entry
contains a pointer to the second kind of list: a relocation list.
The relocation list contains information about pointers to BOs that
the kernel may need to patch up if it relocates objects within the VMA.
This is a very general mechanism, allowing every BO to contain pointers
to other BOs. libdrm_intel models this by giving each drm_intel_bo a
list of relocations to other BOs. Together, these form "reloc trees".
Processing relocations involves a depth-first-search of the relocation
trees, starting from the batch buffer. Care has to be taken not to
double-visit buffers. Creating the validation list has to be deferred
until the last minute, after all relocations are emitted, so we have the
full tree present. Calculating the amount of aperture space required to
pin those BOs also involves tree walking, which is expensive, so libdrm
has hacks to try and perform less expensive estimates.
For some reason, it also stored the validation list in the global
(per-screen) bufmgr structure, rather than as a local variable in the
execbuffer function, requiring locking for no good reason.
It also assumed that the batch would probably contain a relocation
every 2 DWords - which is absurdly high - and simply aborted if there
were more relocations than the max. This meant the first relocation
from a BO would allocate 180kB of data structures!
This is way too complicated for our needs. i965 only emits relocations
from the batchbuffer - all GPU commands and state such as SURFACE_STATE
live in the batch BO. No other buffer uses relocations. This means we
can have a single relocation list for the batchbuffer. We can add a BO
to the validation list (set) the first time we emit a relocation to it.
We can easily keep a running tally of the aperture space required for
that list by adding the BO size when we add it to the validation list.
This patch overhauls the relocation system to do exactly that. There
are many nice benefits:
- We have a flat relocation list instead of trees.
- We can produce the validation list up front.
- We can allocate smaller arrays and dynamically grow them.
- Aperture space checks are now (a + b <= c) instead of a tree walk.
- brw_batch_references() is a trivial validation list walk.
It should be straightforward to make it O(1) in the future.
- We don't need to bloat each drm_bacon_bo with 32B of reloc data.
- We don't need to lock in execbuffer, as the data structures are
context-local, and not per-screen.
- Significantly less code and a better match for what we're doing.
- The simpler system should make it easier to take advantage of
I915_EXEC_NO_RELOC in a future patch.
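As a rough illustration of the flat scheme described above (not the actual
i965 code), the validation list and the running aperture tally could look
something like the sketch below. All sketch_* names and struct layouts are
hypothetical and heavily simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real i965 structures. */
struct sketch_bo {
   uint64_t size;
};

struct sketch_batch {
   struct sketch_bo **exec_bos;   /* flat validation list */
   int exec_count;
   int exec_array_size;
   uint64_t aperture_space;       /* running tally of listed BO sizes */
};

/* "Does the batch reference this BO?" is a plain list walk. */
static bool
sketch_batch_references(const struct sketch_batch *batch,
                        const struct sketch_bo *bo)
{
   for (int i = 0; i < batch->exec_count; i++) {
      if (batch->exec_bos[i] == bo)
         return true;
   }
   return false;
}

/* Add a BO the first time a relocation targets it; grow the array lazily. */
static bool
sketch_add_exec_bo(struct sketch_batch *batch, struct sketch_bo *bo)
{
   assert(batch != NULL && bo != NULL);

   if (sketch_batch_references(batch, bo))
      return true;

   if (batch->exec_count == batch->exec_array_size) {
      int new_size = batch->exec_array_size ? batch->exec_array_size * 2 : 4;
      struct sketch_bo **grown =
         realloc(batch->exec_bos, new_size * sizeof(*grown));
      if (grown == NULL)
         return false;
      batch->exec_bos = grown;
      batch->exec_array_size = new_size;
   }

   batch->exec_bos[batch->exec_count++] = bo;
   batch->aperture_space += bo->size;
   return true;
}

/* The aperture check is a single comparison: a + b <= c. */
static bool
sketch_check_aperture(const struct sketch_batch *batch,
                      uint64_t extra, uint64_t threshold)
{
   return batch->aperture_space + extra <= threshold;
}
```

Adding the same BO twice leaves the list and the tally unchanged, which is
what lets the space check stay a simple addition.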
Improves performance in Synmark 7.0's OglBatch7:
- Skylake GT4e: 12.1499% +/- 2.29531% (n=130)
- Apollolake: 3.89245% +/- 0.598945% (n=35)
Improves performance in GFXBench4's gl_driver2 test:
- Skylake GT4e: 3.18616% +/- 0.867791% (n=229)
- Apollolake: 4.1776% +/- 0.240847% (n=120)
v2: Feedback from Chris Wilson:
- Omit explicit zero initializers for garbage execbuf fields.
- Use .rsvd1 = ctx_id rather than i915_execbuffer2_set_context_id
- Drop unnecessary fencing assertions.
- Only use _WR variant of execbuf ioctl when necessary.
- Shrink the arrays to be smaller by default.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I'm about to rewrite how relocation handling works, at which point
drm_bacon_bo_emit_reloc() and drm_bacon_bo_mrb_exec() won't exist
anymore. This code is already largely not using the batchbuffer
infrastructure, so just go all the way and handle relocations, the
validation list, and execbuffer ourselves. That way, we don't have
to think about the weird case where we only have a screen, and no context,
when redesigning the relocation handling.
v2: Write reloc.presumed_offset + reloc.delta into the batch, rather
than duplicating the comment, so it's obvious that they match
(suggested by Chris). Also add a comment about why we don't do
any error checking.
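A minimal sketch of the v2 idea, assuming a simplified relocation entry (the
struct below is a hypothetical stand-in, not the kernel's definition, and
sketch_emit_reloc is not a real function): the value written into the batch
is computed from the entry's own fields, so the two cannot drift apart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a relocation entry. */
struct sketch_reloc {
   uint32_t target_handle;
   uint32_t delta;
   uint64_t offset;            /* byte offset in the batch to patch */
   uint64_t presumed_offset;   /* where we believe the target BO lives */
};

static void
sketch_emit_reloc(uint8_t *batch_map, size_t batch_size,
                  struct sketch_reloc *reloc,
                  uint32_t target_handle, uint64_t batch_offset,
                  uint64_t presumed_offset, uint32_t delta)
{
   assert(batch_offset + sizeof(uint64_t) <= batch_size);

   reloc->target_handle = target_handle;
   reloc->delta = delta;
   reloc->offset = batch_offset;
   reloc->presumed_offset = presumed_offset;

   /* Derive the batch contents from the entry itself. */
   uint64_t address = reloc->presumed_offset + reloc->delta;
   memcpy(batch_map + batch_offset, &address, sizeof(address));
}
```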
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is the threshold after which drm_intel_bufmgr_check_aperture_space
returns -ENOSPC, signalling that it thinks an execbuf is likely to fail
and we need to roll back and flush the batch.
We'll need this when we rewrite aperture space checking, shortly.
In the meantime, we can also use it in GLX_MESA_query_renderer.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
I'm about to make brw_emit_reloc do actual work, so everybody needs
to start using it and not the raw drm_bacon function.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This renames intel_batchbuffer_reloc to brw_emit_reloc and changes the
parameter naming and ordering to match drm_intel_bo_emit_reloc().
For now, it's a trivial wrapper that accesses batch->bo. When we
rework relocations, it will start doing actual work.
target_offset should be expanded to a uint64_t to match the kernel,
but for now we leave it as its original 32-bit type.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is only useful when doing an incoherent CPU mapping of the current
scanout buffer. That's a terrible plan, so we never do it. We always
use an uncached GTT map.
So, this is useless. Drop the code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This functionality was added by libdrm commit
743af59669386cb6e063fa4bd85f0a0b2da86295 (intel: make bufmgr_gem
shareable from different API) in an attempt to solve libva/mesa buffer
sharing problems. Specifically, this was working around an issue hit
by Chromium, which used the same drm_fd for multiple APIs, and shared
buffers between them.
This code attempted to work around that issue by using the same bufmgr
for both libva and Mesa. It worked because libdrm_intel was loaded by
both libraries. However, now that Mesa has forked, we don't have a
common library, and this code cannot work.
The correct solution is to have each API open its own file descriptor
(and get a corresponding buffer manager), and then use PRIME export
and import to share BOs across those APIs. Then the kernel can manage
those shared resources. According to Chris, the kernel will pass back
the same handle for a prime FD if the lookup is from the same device FD.
We believe Chromium has since moved to this model.
In Mesa, there is already only one screen per FD, and so there will
only be one bufmgr per FD. We don't need any of this code.
v2: Add a big warning comment written by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
No performance data has been gathered about this choice. I just don't
want that many hash tables. Chris points out that this is not
performance critical - we should not be recreating that many handles
from scratch. In the past we used a linear list, which became
unreasonable in stress tests that used hundreds of thousands of BOs.
In real usage, it shouldn't matter that much.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Mesa doesn't use this yet. We'll almost certainly want to, but we can
add the functionality back after we clean up the messy drm code.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We may want this eventually, but simplify for now. We can add it back
later when we actually intend to use it.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We'll want userptr for GL_AMD_pinned_memory support someday,
and possibly some other upload optimizations. Chris says "not in this
form" though. Drop it and simplify for now - we can add it back later
when we're ready to hook it up fully.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is basically handholding to prevent a bogus caller from trying to
execbuffer on a bogus engine. i965 already does this correctly.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the PCI ID detection to intel_screen.c and makes
drm_bacon_bufmgr_gem_init() take a devinfo pointer.
We also drop the HAS_LLC query stuff - devinfo has that info already,
without kernel queries, and it makes no sense to have two has_llc flags
set by different mechanisms.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The wait-ioctl was introduced in kernel v3.6 (20120930) and that is our
current minimum requirement for screen creation.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
The distinction was required when the bufmgr was virtualised. Now that
there is only one class, we no longer need the distraction of pretending
it is a subclass.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This moves us one step closer to killing off intel_bufmgr_priv.h.
We might want to nuke it altogether, since it's basically just a
uint32_t handle, but for now, let's focus on removing files.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
libdrm_bacon used to have a GEM-based bufmgr and a legacy fake bufmgr,
but that's long since dead (and we never imported it to i965). So,
drop the extra layer of function pointers.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Eliminates some API around this, and more importantly, the last
field in one bufmgr class.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Replace the duplicated macros imported from libdrm:
ARRAY_SIZE, MAX2, ALIGN, STATIC_ASSERT
and remove unused ROUND_UP_TO and ROUND_UP_TO_MB.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
ROUND_UP_TO handles a NPOT alignment, but all the alignments we use
are power of two anyway, so there's no need.
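For reference, the difference between the two kinds of rounding can be
sketched as follows. The function names are made up for illustration and are
not the Mesa macros themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two only: a mask is enough. */
static uint64_t
sketch_align_pot(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Any non-zero multiple, including NPOT: needs a divide. */
static uint64_t
sketch_round_up_to(uint64_t value, uint64_t multiple)
{
   assert(multiple != 0);
   return ((value + multiple - 1) / multiple) * multiple;
}
```

For power-of-two alignments both produce the same result, so the general
version buys nothing.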
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Since gen4, we do not use fence registers for any GPU access and so
never have to account for the fence during batch construction. All the
related fence functions are unused.
Based on Kristian Høgsberg's patch; commit message by Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Mesa doesn't use these functions or macros, so we can delete them,
and save work refactoring and cleaning them up. We'll delete a lot
more later, too.
Based on a patch by Kristian Høgsberg.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Drop xf86atomic.h in favor of Mesa's util/u_atomic.h. We replace the
atomic_t wrapper struct with a bare integer, switch to the 'p_atomic'
naming conventions, and move over the one extra helper.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Using drm_intel_* as a prefix is hazardous - we don't want to conflict
with the actual libdrm_intel symbols. In particular, I think we could
get into trouble during the final megadrivers linking.
So, rename everything to a different yet arbitrary prefix. bacon and
intel are the same number of characters, so we don't have to reindent
the world. It's also an homage to Ian's "Bacon Trail" platform.
I was going to use "drm_relic" to poke fun at libdrm being ancient,
and so we could explain the name with a "historical reasons" pun,
but it sounds too much like ralloc.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
i965 doesn't use drm_intel_get_aperture_sizes(), so we can delete
support for it. This avoids a build dependency on libpciaccess.
Chris also notes:
"There's a really old bug that hopefully has been closed already
(although as far as I can tell, it has never been fixed) about
how using libpciaccess from libdrm_intel breaks the world (since
libpciaccess uses a singleton that is torn down at the first request
rather than upon the last user)."
This bug should go away in two commits when we switch over to our
internal copy of libdrm_intel.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84325
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
typeof doesn't seem to exist, so this won't compile (but we don't yet
try). Define it to __typeof__. This code is going to die soon anyway.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We never imported any of this code, so drop the prototypes, unused
enums, and defines.
Based on patches by Emil Velikov.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This imports commit 19c4cfc54918d361f2535aec16650e9f0be667cd of
libdrm/intel/*.[ch], minus a few files that we're never going to use
(and would immediately delete), plus a few necessary dependencies.
We rename intel_bufmgr.h to brw_bufmgr.h to avoid #include conflicts.
We also fix UTF-8 symbol problems in intel_bufmgr_gem.c comments
because vim keeps trying to fix that every time I edit the file,
and we may as well fix it right away.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Using an incoherent CPU map on the active scanout buffer is really
sketchy - we may need extra flushing via GEM_SW_FINISH, or using
drmModeDirtyFB() and kernel commit a6a7cc4b7db6d (4.10+).
Chris suggests "never ever do that", which seems like a wise plan!
intel_miptree_map_raw() uses CPU maps on linear buffers.
Having a linear scanout buffer should be really rare, and mapping the
front buffer should be similarly rare. Together, it should basically
never happen. But, in case it does somehow...make sure that mapping
the scanout buffer always goes through an uncached GTT map.
v2: Add a giant comment written by Chris Wilson.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
On modern systems with 4GB apertures, the size in bytes is 4294967296,
or (1ull << 32). The kernel gives us the aperture size as a __u64,
which works out great.
Unfortunately, libdrm "helpfully" returns the data as a size_t, which
on 32-bit systems means it truncates the aperture size to 0 bytes.
We've happily reported this value as 0 MB of video memory via
GLX_MESA_query_renderer since it was originally exposed.
This patch bypasses libdrm and calls the ioctl ourselves so we can
use a proper uint64_t, avoiding the 32-bit integer truncation. We now
report a proper video memory size on 32-bit systems.
Chris points out that the aperture size (CPU mappable size limit)
isn't really the right thing to be checking. But libdrm_intel uses
it to fail execbuffer, so it is an actual limit for now. Once that's
fixed we can probably move to something else. In the meantime, fix
the obvious typecasting bug.
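The arithmetic behind the bug can be sketched as below. The helper is
hypothetical and only emulates what a 32-bit size_t does to the value; it is
not the libdrm or Mesa code.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate reporting the aperture through a size_t of the given width. */
static uint64_t
sketch_reported_mb(uint64_t aperture_bytes, unsigned size_t_bits)
{
   assert(size_t_bits == 32 || size_t_bits == 64);

   /* A 32-bit size_t keeps only the low 32 bits of the __u64 value. */
   uint64_t seen = size_t_bits == 32 ? (uint32_t)aperture_bytes
                                     : aperture_bytes;
   return seen / (1024 * 1024);
}
```

With a 4GB aperture (1ull << 32), the 32-bit path yields 0 MB while the
64-bit path yields 4096 MB.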
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only the Radeon kernel driver exposed the GPU temperature and
the shader/memory clocks, this implements the same functionality
for the AMDGPU kernel driver.
These queries will return 0 if the DRM version is less than 3.10.
I don't explicitly check the version here because the query
codepath is already a bit messy.
v2: - rebase on top of master
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The idea is taken from radeonsi. The code was mostly already checking for a null
pixel shader, so few checks had to be added.
Interestingly, according to testing with GTAⅣ, binding of a null shader happens
a lot at the start (then just stops), but draw_vbo() never actually sees a null
ps.
v2: added a check I missed because of a macro using a prefix to choose
a shader.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Taken from radeonsi, required to remove dummy pixel shader in the next patch
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The idea is taken from radeonsi. The code lacks some checks for null vs,
and I'm unsure about some changes against that, so I left it in place.
Some statistics for GTAⅣ:
Average tessellation shader bind skips per frame: ≈350
Average geometry shader bind skips per frame: ≈260
Skips of vertex shader binds occur rarely enough to not show up in the per-frame
counter at all, so I'm just going to say: it happens.
v2: I accidentally removed an empty line; don't do this.
v3: return a check for null tes and gs back, while I haven't figured out
the way to move stride assignment to r600_update_derived_state() (as it
is in radeonsi).
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We're about to replace blorp's emit code with ISL and it emits them in
the other order. This makes diffing the aubs easier.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Since the inclusion in 7f160efcde
the header used x_biased, while the implementation used y_biased.
This changes the header to match the implementation since the
uses of the function seem to expect y_biased.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This avoids locking in the reference calls and fixes a leak after the
RefCount initialisation was changed from 0 to 1.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will be used to take ownership of freshly created renderbuffers,
avoiding the need to call the reference function which requires
locking.
v2: dereference any existing fb attachments and actually attach the
new rb.
v3: split out validation and attachment type/complete setting into
a shared static function.
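To illustrate the ownership idea (a simplified sketch with hypothetical
sketch_* names, not the Mesa implementation): a new renderbuffer starts with
a reference count of 1, so the framebuffer can simply adopt that reference
instead of taking a second one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_rb { int ref_count; };
struct sketch_fb { struct sketch_rb *attachment; };

static struct sketch_rb *
sketch_new_rb(void)
{
   struct sketch_rb *rb = calloc(1, sizeof(*rb));
   if (rb != NULL)
      rb->ref_count = 1;   /* the creator's reference */
   return rb;
}

static void
sketch_unref(struct sketch_rb *rb)
{
   if (rb != NULL && --rb->ref_count == 0)
      free(rb);
}

/* Shares the renderbuffer: takes an extra reference. In the real driver
 * this is the path that needs locking. */
static void
sketch_attach_shared(struct sketch_fb *fb, struct sketch_rb *rb)
{
   rb->ref_count++;
   sketch_unref(fb->attachment);
   fb->attachment = rb;
}

/* Adopts the creator's reference: no increment needed. */
static void
sketch_attach_owned(struct sketch_fb *fb, struct sketch_rb *rb)
{
   assert(rb != NULL && rb->ref_count == 1);
   sketch_unref(fb->attachment);   /* drop any existing attachment */
   fb->attachment = rb;
}
```

With the shared path, a freshly created renderbuffer ends up at a count of 2,
and it leaks unless the creator remembers to drop its own reference.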
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Bartosz Tomczyk <bartosz.tomczyk86@gmail.com>
The nv50 ir is scalar. Perhaps this was from some early attempts to
integrate the simd aspects of nv30. However at this point it's entirely
unused.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The API/entry point in mesa already checks the correct behavior,
however, it's possible to be handled by another implementation and those
implementations should not be able to abuse a weird combination of count
and pointer.
This fixes CID 1403193
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Along the way, add missing GL_ONE source support and drop non-existing
GL_ZERO and GL_ONE operand support.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Instead of computing it once again using _mesa_tex_target_to_index.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Change it into filter_fp_input_mask transform function that instead of
returning a mask, transforms input.
Also, simplify the case of vertex program handling by assuming that
fp_inputs is always a combination of VARYING_BIT_COL* and VARYING_BIT_TEX*.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Its only usage is easily replaced by nr_enabled_units. As for cache key
part, unit[i].enabled should be enough.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Since fixed-function shaders are restricted to MAX_TEXTURE_COORD_UNITS
texture units, use this constant instead of MAX_TEXTURE_UNITS. This
reduces the array size from 32 to 8.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This gets rid of one piece of ugliness with the way ISL handles
emitting surface states. I've never liked that hand-rolled table but it
was the best we had at the time.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The helper block is extremely general. It takes a string property name
and an object that supports three methods: has_prop, iter_prop, and
get_prop. This way we can easily generalize it to emit more different
types of getter functions.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We tend to try to reduce the number of allocation calls the Vulkan
driver uses by doing a single allocation whenever possible for a data
structure. While this has certain downsides (usually code complexity),
it does mean error handling and cleanup is much easier. This commit
adds a nice little helper struct for getting rid of some of that
complexity.
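A rough sketch of what such a helper might look like is below. The names,
the fixed part limit, and the use of malloc are assumptions made for
illustration; this is not the helper added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_PARTS 8

/* Hypothetical helper: collects sub-allocations, then allocates once. */
struct sketch_multialloc {
   size_t size;                        /* total bytes requested so far */
   unsigned count;
   void **ptrs[SKETCH_MAX_PARTS];      /* where to store each pointer */
   size_t offsets[SKETCH_MAX_PARTS];   /* offset of each part */
};

static void
sketch_multialloc_add(struct sketch_multialloc *ma, void **ptr,
                      size_t size, size_t align)
{
   assert(ma->count < SKETCH_MAX_PARTS);
   /* Power-of-two alignment, no larger than what malloc guarantees. */
   assert(align != 0 && (align & (align - 1)) == 0);

   size_t offset = (ma->size + align - 1) & ~(align - 1);
   ma->ptrs[ma->count] = ptr;
   ma->offsets[ma->count] = offset;
   ma->count++;
   ma->size = offset + size;
}

/* One allocation for everything; freeing the returned base frees all parts. */
static void *
sketch_multialloc_alloc(struct sketch_multialloc *ma)
{
   void *base = malloc(ma->size);
   if (base == NULL)
      return NULL;

   for (unsigned i = 0; i < ma->count; i++)
      *ma->ptrs[i] = (char *)base + ma->offsets[i];
   return base;
}
```

There is a single failure point and a single free(), which is where the
simpler error handling and cleanup comes from.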
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We only need to update it if something changes. Also
_mesa_bind_vertex_buffer() will update the mask when binding to a
NULL or default buffer so no need to do that update here.
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
radeonsi added stricter checking for correct swizzles in debug builds.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 4cf2942777 ("radeonsi: support 64-bit system values")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In preparation for enabling MSAA in OpenSWR, the state trackers need to
be aware of multisample pixel formats for software renderers. This patch
allows glx-xlib to query the renderer for support of pixel
formats with multisample, and create multisample resources.
This change is benign to softpipe and llvmpipe, as is_format_supported
returns FALSE for any sample_count > 1. OpenSWR does the same at the
moment, but that will change soon.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With this patch, we will specify the current context
when we invalidate the surface before the surface is
put back to the recycled surface pool. This allows the
winsys layer to use the specified context to do the
invalidation rather than using the last context that
referenced the surface. This prevents a race condition if
the last referenced context is now made current in another thread.
Tested with MTT glretrace, NobelClinicianViewer.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
If two contexts wanted to access the same buffer at the same time, it would
end up on two validation lists simultaneously, which might cause a
PIPE_ERROR_RETRY when trying to validate it from one context while the other
context already had it validated but not yet fenced.
In that situation we could spin until the error goes away, or apply various
more or less expensive locking schemes to save cpu.
Here we use a scheme that briefly locks after fencing but avoids locking on
validation in the non-contended case.
v2:
Make sure we broadcast not only on releasing buffers after fencing, but also
after releasing buffers in the pb_validate_validate error path.
v3:
Don't broadcast on PIPE_ERROR_RETRY because that would increase the chance
of starvation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
3D wasn't officially supported before virtual HW version 8 so we can
remove this old code.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Currently, surface propagation for a colliding render target resource is
done at framebuffer emit time for vgpu10. This patch
adds the surface propagation for non-vgpu10 path to emit_fb_vgpu9()
and removes the redundant surface copy at set time.
Tested with MTT glretrace, piglit, NobelClinicianViewer, Turbine, Cinebench.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
The zslice index to svga_texture_copy_handle_resource() is not adjusted
and should be a signed integer.
This patch fixes piglit tests for non-vgpu10 including
spec@arb_framebuffer_object@fbo-generatemipmap-3d
spec@glsl-1.20@execution@tex-miplevel-selection gl2:texture* 3d
Tested with MTT piglit and glretrace
This implementation is based on querying the time just before swap/present
and doing a Sleep() if needed. There is no sync to vblank or actual
coordination with the GPU. This isn't perfect, but basically works.
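The timing logic can be sketched roughly as below. The function name is
hypothetical, and deriving the minimum frame period from a refresh rate is an
assumption made for this sketch; the caller would pass the result to its
sleep function just before the swap.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many microseconds to sleep before presenting, or 0. */
static int64_t
sketch_swap_delay_usec(int64_t now_usec, int64_t last_swap_usec,
                       int swap_interval, int refresh_hz)
{
   assert(swap_interval >= 0 && refresh_hz > 0);

   /* Minimum time between swaps for the requested interval. */
   int64_t min_period = (int64_t)swap_interval * 1000000 / refresh_hz;
   int64_t elapsed = now_usec - last_swap_usec;

   return elapsed < min_period ? min_period - elapsed : 0;
}
```

Since nothing here is tied to the actual vblank, this only limits the swap
rate; it does not prevent tearing.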
We've had some requests for this functionality, and it sounds like there
are some Windows GL apps that refuse to start if the driver doesn't
advertise this extension.
Note: NVIDIA's Windows OpenGL driver advertises the WGL_EXT_swap_control
string both with wglGetExtensionsStringEXT() and with
glGetString(GL_EXTENSIONS). We're only advertising it with the former at
this time.
Tested with asst. Mesa demos, Google Earth, Lightsmark, etc.
VMware bug 1591534.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When a backing surface is reused, it is possible that
the original surface has been changed. So before the backing surface
is bound again, we need to sync up the surface.
This patch creates a new helper function svga_texture_copy_handle_resource()
to sync up the backing surface resource.
This patch, together with the backing surface dirty bit fix, fixes
the rendering corruption in NobelClinicianViewer when rotating the model.
Also tested with MTT glretrace, piglit, Cinebench, Turbine.
Reviewed-by: Brian Paul <brianp@vmware.com>
The reset flag specifies if the dirty bit needs to be reset
after the surface is propagated to the texture. This is used
to make sure that the dirty bit is not reset and stays set
before the surface is unbound.
Reviewed-by: Brian Paul <brianp@vmware.com>
The new has_backed_views flag specifies if any of the render target
views or depth stencil view is a backing surface view.
The flag is used in svga_propagate_rendertargets() so it can return early
if there is no surface to propagate.
Reviewed-by: Brian Paul <brianp@vmware.com>
A texture can be destroyed from a context other than the one in which it
was created, but destroying the render target view from a different context
will cause svga device errors. Similar to shader resource view,
this patch skips destroying render target view or depth stencil view
from a non-parent context.
Fixes driver errors running NobelClinician Viewer application.
Tested with NobelClinician Viewer, MTT piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, rasterization will be disabled if the
rasterizer_discard flag is set or the fragment shader
is undefined due to missing position output from the
vertex/geometry shader.
Tested with piglit test glsl-1.50-geometry-primitive-id-restart.
Also tested with full MTT glretrace and piglit.
v2: As suggested by Roland, to properly disable rasterization, besides
setting FS to NULL, we will also need to disable depth and stencil test.
v3: As suggested by Brian, set SVGA_NEW_DEPTH_STENCIL_ALPHA dirty bit
in svga_bind_rasterizer_state() if the rasterizer_discard flag is
changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Emulating wide points in geometry shader when doing transform feedback
is problematic. This patch disables the emulation.
Tested with piglit test ext_transform_feedback-points.
Also tested with MTT glretrace, mesa demos pointblast and spriteblast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit b2c97bc789 which made us start
using a busy-wait for individual query results also messed up cache
flushing on !LLC platforms. For one thing, I forgot the mfence after
the clflush so memory access wasn't properly getting fenced. More
importantly, however, was that we were clflushing the whole query range
and then waiting for individual queries and then trying to read the
results without clflushing again. Getting the clflushing both correct
and efficient is very subtle and painful. Instead, let's side-step the
problem by just snooping.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
To work with addr2line.sh we also need the relative offset within the
DSO. And addr2line.sh gets confused by the leading stackframe number.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We don't need to call _mesa_reference_renderbuffer() for the first
assignment as refCount starts at 1. For swrast we work around the
fact we will indirectly call _mesa_reference_renderbuffer() by
resetting refCount to 0.
Fixes: 32141e53d1 (mesa: tidy up renderbuffer RefCount initialisation)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
On Broadwell we still need to do a resolve between the subpass
that writes and the subpass that reads when there is a
self-dependency because the HW cannot see fast-clears and works
on the render cache as if it were a regular non-fast-clear surface.
Fixes 16 tests on BDW:
dEQP-VK.renderpass.formats.*.input.clear.store.self_dep*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This appears to be a leftover from an earlier version of this function.
Nothing is emitted into the CS.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
All offsets and strides are precomputed by
radv_CreateDescriptorUpdateTemplateKHR and stored in the template.
v2: Move the new struct declarations from radv_descriptor_set.h
to radv_private.h (Bas)
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Replace the !binding_layout->immutable_samplers assertion in
radv_update_descriptor_sets with a conditional.
The Vulkan specification does not say that it is illegal to update
a sampler descriptor when it is immutable; only that pImageInfo is
ignored.
This change is also needed for push descriptors, because valid
descriptors must be pushed for all bindings accessed by shaders,
including immutable sampler descriptors.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Move the implementation into a separate function that takes a
cmd_buffer and a dstSetOverride parameter.
When cmd_buffer is not NULL, radv_update_descriptor_sets calls
cs_add_buffer directly instead of updating the buffer list.
This will be used to implement VK_KHR_push_descriptor.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Commit f938354362 recently increased the
alignment on vertex buffer data from 32 to 64. This caused us to
consume a bit more batch than we were before and we now go over the
estimate by a small amount on certain blits on gen8+. This commit bumps
the gen8 batch estimate by a bit to compensate. Haswell and older
still seem to be well within the limit.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100582
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Decoding with aubinator encountered a command of 0xffffffff. With the
previous code, it caused aubinator to jump 255 + 2 dwords to start
decoding again.
Instead we can attempt to detect the known instruction formats. If the
format is not recognized, then we can advance just 1 dword.
v2:
* Update aubinator_error_decode
* Actually convert the length variable returned into a *signed* integer
in aubinator.c, intel_batchbuffer.c and aubinator_error_decode.c.
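The approach described above can be sketched as follows. This is a minimal illustration, not the aubinator code: the header layout (command type in the top three bits, length minus two in the low eight bits) and the names `command_length` and `next_offset` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command types, for illustration only. */
#define CMD_TYPE_MI     0u
#define CMD_TYPE_RENDER 3u

/* Return the command length in dwords, or -1 when the header does not
 * match any format this sketch recognizes. The result is signed so that
 * "unknown" can be told apart from a real length. */
static int command_length(uint32_t header)
{
    switch (header >> 29) {
    case CMD_TYPE_MI:
    case CMD_TYPE_RENDER:
        /* Assumed encoding: low 8 bits hold (length - 2). */
        return (int)(header & 0xff) + 2;
    default:
        return -1;
    }
}

/* Advance past one command; an unrecognized header skips a single dword
 * instead of trusting its length field. */
static size_t next_offset(const uint32_t *buf, size_t offset)
{
    int len;

    assert(buf != NULL);
    len = command_length(buf[offset]);
    return offset + (len > 0 ? (size_t)len : 1);
}
```

With this shape, a stray 0xffffffff dword costs the decoder one dword rather than a jump of 257.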
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The system value only has an X component, and radeonsi started
checking that in debug builds.
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 4cf2942777 ("radeonsi: support 64-bit system values")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO. Some clients, such as the Serious engine,
use a single query pool for hundreds of individual query results where
the writes for those queries may be split across several command
buffers. In this scenario, the individual query we're looking for may
become available long before the BO is idle so waiting on the query pool
BO to be finished is wasteful. This commit makes us instead busy-loop on
each query until it's available.
This significantly reduces pipeline bubbles and improves performance of
The Talos Principle on medium settings (where the GPU isn't overloaded
with drawing) by around 20% on my SkyLake gt4.
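A minimal sketch of the per-query busy-loop, assuming a hypothetical slot layout with an availability word followed by the result. The struct, the function name and the spin bound are illustrative and not taken from the driver. Cache-coherency handling on !LLC platforms is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one query slot in the pool's buffer. */
struct query_slot {
    volatile uint64_t available; /* becomes non-zero once the GPU wrote it */
    volatile uint64_t result;
};

/* Spin on one slot instead of waiting for the whole pool BO to go idle.
 * max_spins bounds the loop so the caller can check for a lost device. */
static bool wait_for_query(const struct query_slot *slot, uint64_t max_spins,
                           uint64_t *result_out)
{
    assert(slot != NULL && result_out != NULL);

    for (uint64_t i = 0; i < max_spins; i++) {
        if (slot->available) {
            *result_out = slot->result;
            return true;
        }
    }
    return false;
}
```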
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
mako is already a mesa build requirement, extra copy not needed.
Tested building against mesa build baseline (mako-0.8.0).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This avoids validation and looking up the buffer target for a second time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows internal users to pass buffer objects directly and
allows for KHR_no_error support to be more easily added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Possibly more efficient, either way it makes the code easier to
follow.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
42aaa548 changed the renderbuffer initialisation of RefCount from
1 to 0.
This is inconsistent with how we use RefCount elsewhere. Also every
driver implementation of NewRenderbuffer() calls
_mesa_init_renderbuffer() so it's safe to set it there.
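The convention can be illustrated with a simplified, single-threaded sketch. The struct and function names below are hypothetical stand-ins, not Mesa's own. Creation hands the caller the first reference, so the count starts at 1 and no extra reference call is needed for the first assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct renderbuffer {
    int ref_count;
};

/* The creator owns the first reference, so the count starts at 1. */
static struct renderbuffer *renderbuffer_create(void)
{
    struct renderbuffer *rb = calloc(1, sizeof(*rb));

    if (rb)
        rb->ref_count = 1;
    return rb;
}

/* Point *ptr at rb: drop the old reference, then take a new one. */
static void renderbuffer_reference(struct renderbuffer **ptr,
                                   struct renderbuffer *rb)
{
    assert(ptr != NULL);

    if (*ptr == rb)
        return;
    if (*ptr && --(*ptr)->ref_count == 0)
        free(*ptr);
    *ptr = rb;
    if (rb)
        rb->ref_count++;
}
```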
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 658568941d.
With the help of shader variants we can render to rb-swapped
formats now. Fixes about 60 piglits.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If we render to an rb-swapped format, we create a shader variant that
does the required swizzling in the pixel shader.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
For shader-db runs, create a standard variant immediately
(as otherwise nothing will trigger the shader to be
actually compiled).
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
In the long run the compiler needs to know the specific variant
'key' in order to compile appropriate assembly. With this commit
the variant knows its shader and we are able to pass the preallocated
variant into etna_compile_shader(..). This saves us from passing
extra ptrs everywhere.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
This commit adds some basic infrastructure to handle shader
variants. We are still creating exactly one shader variant
for each shader.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Introduce stubs to anv_gem_stub.c that match the anv_gem.c ones.
Otherwise we may get link-time errors, when building the tests.
v2: Introduce all the missing stubs at once.
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100574
Fixes: c964f0e485 ("anv: Query the kernel for reset status")
Fixes: 651ec926fc ("anv: Add support for 48-bit addresses")
Fixes: 060a6434ec ("anv: Advertise larger heap sizes")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
---
I've intentionally kept the order identical to anv_gem.c.
This way we can easily grep & diff in the future ;-)
Require LLVM 5.0 or later because LLVM 4.0 is easily fooled into
putting the lane select of llvm.amdgcn.readlane into a VGPR and then
fails to continue to compile.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Notably, llvm.amdgcn.readfirstlane and llvm.amdgcn.icmp may be hoisted
out of loops or if/else branches in cases like
    if (cond) {
        v = readFirstInvocationARB(x);
        ... use v ...
    } else {
        v = readFirstInvocationARB(x);
        ... use v ...
    }
    ===>
    v = readFirstInvocationARB(x);
    if (cond) {
        ... use v ...
    } else {
        ... use v ...
    }
The optimization barrier is a heavy hammer to stop that until LLVM
is taught the semantics of the intrinsic properly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LLVM will lift inline assembly out of if-else-blocks if both paths have
the same inline assembly. Prevent this by adding an irrelevant unique
text to the assembly.
This requires the LLVM assembly parser to be initialized.
Furthermore, allow forcing subsequent computations to happen after the
optimization barrier by defining a data dependency.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For simplicity, always store system values as 32-bit values or arrays
of 32-bit values. 64-bit values are unpacked and packed accordingly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2 (Nicolai):
- BALLOT isn't per-channel
- expand the documentation (also for VOTE_*)
v3:
- only BALLOT returns a 64-bit lanemask (Boyan)
- relax the requirement on READ_INVOC: the invocation number to read
from must be uniform within a sub-group. This matches the
GL_ARB_shader_ballot spec (and the v_readlane instruction of AMD
GCN)
v4:
- hopefully really fix the doc of VOTE_* returns (Ilia)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
VM faults cannot be disabled for SDMA on <= VI.
We could still use SDMA by asking the winsys about which parts of the
buffers are committed. This is left as a potential future improvement.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We never add fences to backing buffers during submit. When we free a
backing buffer, it must inherit the sparse buffer's fences, so that it
doesn't get re-used prematurely via the cache.
v2:
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
... and implement the corresponding fence handling.
v2:
- add missing bit in amdgpu_bo_is_referenced_by_cs_with_usage
- remove pipe_mutex_*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the bulk of the buffer allocation logic. It is fairly simple and
stupid. We'll probably want to use e.g. interval trees at some point to
keep track of commitments, but Mesa doesn't have an implementation of those
yet.
v2:
- remove pipe_mutex_*
- fix total_backing_pages accounting
- simplify by using the new VA_OP_CLEAR/REPLACE kernel interface
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This probably has only minor performance effects, but it simplifies some
subsequent code slightly.
Ideally, it could also be used to simplify the handling of slab buffers
in the same way, but unfortunately that's not possible as long as we need
indices for relocations.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of just advertising the aperture size, we do something more
intelligent. On systems with a full 48-bit PPGTT, we can address 100%
of the available system RAM from the GPU. In order to keep clients from
burning 100% of your available RAM for graphics resources, we have a
nice little heuristic (which has received exactly zero tuning) to keep
things under a reasonable level of control.
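A heuristic of this kind could look like the sketch below. The policy shown (half of RAM at 4 GiB or less, three quarters above that) and the function name are assumptions for illustration, since the message does not state the thresholds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed policy, for illustration only: expose half of system RAM on
 * machines with 4 GiB or less, and three quarters otherwise. */
static uint64_t advertised_heap_size(uint64_t total_ram)
{
    const uint64_t four_gib = 4ull << 30;

    assert(total_ram > 0);

    if (total_ram <= four_gib)
        return total_ram / 2;
    return total_ram / 4 * 3;
}
```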
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This commit adds support for using the full 48-bit address space on
Broadwell and newer hardware. Due to certain limitations, not all
objects can be placed above the 32-bit boundary. In particular, general
and state base address need to live within 32 bits. (See also
Wa32bitGeneralStateOffset and Wa32bitInstructionBaseOffset.) In order
to handle this, we add a supports_48bit_address field to anv_bo and only
set EXEC_OBJECT_SUPPORTS_48B_ADDRESS if that bit is set. We set the bit
for all client-allocated memory objects but leave it false for
driver-allocated objects. While this is more conservative than needed,
all driver allocations should easily fit in the first 32 bits of address
space and keeps things simple because we don't have to think about
whether or not any given one of our allocation data structures will be
used in a 48-bit-unsafe way.
Reviewed-by: Kristian H. Kristensen <krh@bitplanet.net>
This fixes issues seen when adding support for full 48-bit addresses.
The 48-bit addresses themselves have nothing to do with it other than
that it caused the kernel to place buffers slightly differently so they
interacted differently with the caches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
When a client causes a GPU hang (or experiences issues due to a hang in
another client) we want to let it know as soon as possible. In
particular, if it submits work with a fence and calls vkWaitForFences or
vkQueueWaitIdle and it returns VK_SUCCESS, then the client should be
able to trust the results of that rendering. In order to provide this
guarantee, we have to ask the kernel for context status in a few key
locations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's possible that the device could have been lost while we were
waiting. We should let the user know if this has happened.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the shader does not set one of these values, they are supposed to
get a default value of 0. We have hardware bits in 3DSTATE_CLIP for
this but haven't been setting them. This fixes the intermittent failure
of dEQP-VK.geometry.layered.3d.render_to_default_layer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages. However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations. This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
This patch makes glCopyImageSubData require mipmap completeness when the
texture object's built-in sampler object has a mipmapping MinFilter.
Fixes (on i965):
dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fix linking error.
CXXLD libGL.la
../../../../src/gallium/auxiliary/.libs/libgallium.a(u_debug_stack.o): In function `debug_backtrace_capture':
src/gallium/auxiliary/util/u_debug_stack.c:59: undefined reference to `_Ux86_64_getcontext'
src/gallium/auxiliary/util/u_debug_stack.c:60: undefined reference to `_ULx86_64_init_local'
src/gallium/auxiliary/util/u_debug_stack.c:62: undefined reference to `_ULx86_64_step'
src/gallium/auxiliary/util/u_debug_stack.c:71: undefined reference to `_ULx86_64_get_proc_info'
src/gallium/auxiliary/util/u_debug_stack.c:73: undefined reference to `_ULx86_64_get_proc_name'
src/gallium/auxiliary/util/u_debug_stack.c:65: undefined reference to `_ULx86_64_step'
Fixes: 70c272004f ("gallium/util: libunwind support")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100562
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Adding the actual table from the docs makes it clearer exactly what the
restrictions are. In particular, it becomes clear that compressed
textures ignore the alignment parameters in RENDER_SURFACE_STATE.
Reviewed-by: Chad Versace <chadversary@chromium.org>
This fixes the stripes of garbage rendered on the floor of the vehicle
assembly building among other rendering issues. The reason for the
misrendering seems to be that some of the GLSL shaders used by the
application use variables before initializing them, incorrectly
assuming that they will be implicitly set to zero by the
implementation.
Acked-by: Matt Turner <mattst88@gmail.com>
This is pretty much the same tool as what i-g-t has, only with a more
fancy decoding of the instructions/registers. It also doesn't support
anything before gen4.
v2 (from Matt): Drop authors
Remove undefined automake variable
v3: Fix incorrect offsets for dword > 1 (Jordan)
v4: Fix decompression error with large blobs (Jordan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Specifically, non-line primitives are skipped, and the default is to
reset on each packet.
Skipping non-line primitives saves ≈110 resets of the
PA_SC_LINE_STIPPLE register per frame in Kane&Lynch2.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Also change gs_output_prim type: unsigned → pipe_prim_type. The idea of
the code is mostly taken from radeonsi. The new code operating on
prev/curr rast_primitives saves ≈15 reloads of PA_SC_LINE_STIPPLE per
frame in Kane&Lynch2
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Note: si_shader.h has also "type" variable that should be changed to
"enum pipe_prim_type", however it triggers a bunch of warnings about
unhandled switches, so, not knowing the correct way to handle them, I
decided to leave it as is.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
udivmod64 appears in src/compiler/glsl/builtin_int64.h and src/compiler/glsl/udivmod.h
The second file seems unused.
Fix commit 6b03b345eb
This change doesn't affect shader-db.
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Highlights:
- Display needs tiled pitch alignment to be at least 32 pixels
- Implement Addr2ComputeDccAddrFromCoord().
- Macro-pixel packed formats don't support Z swizzle modes
- Pad pitch and base alignment of PRT + TEX1D to 64KB.
- Fix support for multimedia formats
- Fix a case where "PRT" entries are not selected on SI.
- Fix wrong upper bits in equations for 3D resource.
- We can't support 2d array slice rotation in gfx8 swizzle pattern
- Set base alignment for PRT + non-xor swizzle mode resource to 64KB.
- Bug workaround for Z16 4x/8x and Z32 2x/4x/8x MSAA depth texture
- Add stereo support
- Optimize swizzle mode selection
- Report pitch and height in pixels for each mip
- Adjust bpp/expandX for format ADDR_FMT_GB_GR/ADDR_FMT_BG_RG
- Correct tcCompatible flag output for mipmap surface
- Other fixes and cleanups
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Recent changes in Makefile.sources merged the aubinator files in
a unique list of generated files and genxml/genX_xml.h is now needed
to avoid the following build error:
ninja: error: '.../genxml/genX_xml.h', needed by '.../genxml/genX_xml.h',
missing and no known rule to make it
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
Fixes: 0f83c05 "intel: genxml: compress all gen files into one"
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We supported more generally. Decreased the dynamic buffers though, as
we only support 16 for uniform+storage.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Address sanitizer reports a lot of misaligned accesses:
SUMMARY: AddressSanitizer: undefined-behavior main/marshal.c:276:31 in
main/marshal.c:276:31: runtime error: load of misaligned address 0x631000104866 for type
'const GLuint' (aka 'const unsigned int'), which requires 4 byte alignment
0x631000104866: note: pointer points here
92 88 00 00 00 00 00 00 4a 03 0c 00 93 88 00 00 00 00 00 00 02 01 0c 00 40 8d 00 00 00 00 00 00
^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28725:12 in
main/marshal_generated.c:28726:12: runtime error: member access within misaligned address 0x6310003fc874 for type
'struct marshal_cmd_VertexAttribPointer', which requires 8 byte alignment
0x6310003fc874: note: pointer points here
01 00 00 00 7a 02 20 00 00 00 00 00 be be be be be be be be be be be be be be be be be be be be
^
SUMMARY: AddressSanitizer: undefined-behavior main/marshal_generated.c:28726:12 in
main/marshal_generated.c:28726:12: runtime error: store to misaligned address 0x6310003fc87c for type
'GLint' (aka 'int'), which requires 8 byte alignment
0x6310003fc87c: note: pointer points here
00 00 00 00 be be be be be be be be be be be be be be be be be be be be be be be be be be be be
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We want the guardband_x/y to be the largest scalars such that each
viewport scaled by that amount is still a subrange of [-32767, 32767].
The old code has a couple of issues:
1) It used scissor instead of viewport_scissor, potentially taking into
account a viewport that is too small and therefore selecting a scale
that is too large.
2) Merging the viewports isn't ideal, as for example viewports with
boundaries [0,1] and [1000, 1001] would allow a guardband scale of ~30k,
while their union [0, 1001] only allows a scale of ~32.
The new code just determines the guardband per viewport and takes the minimum.
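The per-viewport minimum can be sketched in one dimension as follows. The struct and function names are hypothetical, and the viewport is described by its centre and half-extent. The absolute numbers depend on how the scale is normalized, but the gap between separate viewports and their union shows up regardless.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical 1D viewport: centre plus half-extent. */
struct viewport1d {
    float translate; /* centre of the viewport */
    float scale;     /* half of its width, must be > 0 */
};

/* Largest factor s such that every viewport, scaled by s about its own
 * centre, still fits inside [-32767, 32767]. Computed per viewport and
 * then minimised, rather than computed once for the merged range. */
static float guardband_scale(const struct viewport1d *vps, size_t count)
{
    const float max_range = 32767.0f;
    float guardband = FLT_MAX;

    assert(vps != NULL && count > 0);

    for (size_t i = 0; i < count; i++) {
        float t = vps[i].translate < 0 ? -vps[i].translate : vps[i].translate;
        float s = (max_range - t) / vps[i].scale;

        if (s < guardband)
            guardband = s;
    }
    return guardband;
}
```

Feeding it the two small viewports [0,1] and [1000,1001] separately yields a scale in the tens of thousands, while their union [0,1001] only allows a scale in the tens.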
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Just enabling the driver-independent implementation that Jason did.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Interp at sample needs to use the center, since the sample
positions it retrieves are relative to the center.
This fixes a bunch of CTS tests with multisample_interpolation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current code was broken, and I decided to redesign it instead.
This puts the sample positions for all samples into the queue
constant descriptor buffer after all the spill/ring descriptors.
It then uses a single offset register to point how far into the
samples the samples for num_samples are. This saves one user sgpr
and means we only generate the sample position data in the rare
single case where we need it currently.
This doesn't fix the failing CTS tests without the followup
fix.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some packets like 3DSTATE_VF_STATISTICS, 3DSTATE_DRAWING_RECTANGLE,
3DPRIMITIVE, PIPELINE_SELECT, etc... have configurable fields in
dword0, we probably want to print those.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The ordering NIR gives us is correct for the hw, this fixes:
dEQP-VK.glsl.texture_functions.texturegrad.* (mainly triggered
on isampler/usampler 3d textures.).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
dEQP-VK.glsl.texture_functions.texture.samplercubearray*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To silence
C:\Users\Brian\projects\mesa\src\util/u_vector.h(41) : warning C4146: unary
minus operator applied to unsigned type, result still unsigned
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Otherwise, we were getting the definition for 'inline' by chance from
some other preceding #include.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
There are still some distributions trying to support unfortunate people
with old or exotic CPUs that don't have 64bit atomic operations. When
compiling for such a machine, gcc conveniently inserts a library call to
a helper, but its implementation is missing and we get a linker error.
This allows us to provide our own implementation, which is marked weak
to prefer a better implementation, should one exist.
v2: changed copyright, some style adjustments
v3: [mattst88] Print results with AC_MSG_CHECKING/AC_MSG_RESULT
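A weak fallback of this kind could look roughly like the sketch below. The function name is hypothetical (a real fallback would have to use the helper symbol the compiler actually calls), the `weak` attribute and `__sync` builtins assume GCC or Clang, and the sketch assumes 32-bit atomic operations are available for the lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One global lock guarding all emulated 64-bit atomics. */
static volatile int fallback_lock;

/* Weak, so that a better implementation provided elsewhere wins at link
 * time. Hypothetical name, used for illustration only. */
__attribute__((weak))
uint64_t fallback_add_and_fetch_8(volatile uint64_t *ptr, uint64_t val)
{
    uint64_t result;

    assert(ptr != NULL);

    /* Spin on a 32-bit flag; only 64-bit atomics are assumed missing. */
    while (__sync_lock_test_and_set(&fallback_lock, 1))
        ;
    *ptr += val;
    result = *ptr;
    __sync_lock_release(&fallback_lock);

    return result;
}
```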
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's kinda sad that (a) we don't have debug_backtrace support on !X86
and that (b) we re-invent our own crude backtrace support in the first
place. If available, use libunwind instead. The backtrace format is
based on what xserver and weston use, since it is nice not to have to
figure out a different format.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Prep work for next patch.
Ideally 'struct debug_stack_frame' would be opaque, but it is embedded
in a bunch of places. But at least we can treat it opaquely.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Should be used by the state tracker when glGetImageHandleARB()
is called in order to create a pipe_image_view template.
v3: - move the comment to *.c
v2: - make 'st' const
- describe the function
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Call it directly when batch queue is empty. This avoids costly thread
synchronisation. This commit improves performance of games that have
previously regressed with mesa_glthread=true.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We could re-enable it also but I haven't tested that yet, and I'm
not sure we care much anyway.
V2: don't disable it from within the call itself. We need a custom
marshalling function or we get stuck waiting for thread to
finish.
V3: tidy up redundant code copied from generated version.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the -Wunused-but-set-variable ones.
Found a way to do it with a oneliner.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
si_state.c: In function ‘si_make_texture_descriptor’:
si_state.c:3240:25: warning: ‘num_format’ may be used uninitialized
si_state.c:3240:12: warning: ‘data_format’ may be used uninitialized
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
1-st is obvious because of assert, 2-nd stolen from si_draw_vbo(),
and 3-rd is just a small refactoring.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It removes the need to copy the whole struct on every call for no reason.
Comparing objdump -d output for the original and this patch, compiled with
-O2, shows that the function shrinks by 16 bytes.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Needed to get rid of a separate struct allocation in the next patch, because
the one passed as an argument is const and doesn't allow changing its fields.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Clears can happen before a rast is set, which can in turn cause scissors
and fragprog to be validated. Make sure that we handle this case.
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
y is vert, x is horiz.
Noticed in visual inspection compared to radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Will be more convenient for bindless because the 64bit handle is
actually the base_ptr of the descriptor (ie. 'list' will be fetched
from TGSI_FILE_CONSTANT/TGSI_FILE_TEMPORARY instead).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
lp_build_emit_fetch() is useful when the source type can be
inferred from the instruction opcode.
However, for bindless samplers/images we can't do that easily
because tgsi_opcode_infer_src_type() returns TGSI_TYPE_FLOAT for
TEX instructions, while we need TGSI_TYPE_UNSIGNED64 if the
resource register is bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Since 63684a9a ("glsl: Combine many instruction lowering passes
into one.", Thu Nov 18 2010), we no longer have anything called
ir_explog_to_explog2. So it's only confusing to have those
references there.
Update with the appropriate method, so people can grep for it in
the current tree if they encounter it.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
geom was removed in e968975 ("gallium: remove the geom_flags param
from is_format_supported", Tue Mar 8 00:01:58 2011 +0100), but the
documentation of it was left over. Let's bring the documentation up
to date.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
st_finalize_texture always accesses image at face 0, but it may not be
set if we are working with cubemap that had other face set.
This fixes crash in piglit
same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL_ATTACHMENT.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Helps Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3934925 -> 3934327 (-0.02%)
total gprs used in shared programs : 481563 -> 481563 (0.00%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36061888 -> 36056504 (-0.01%)
                local        gpr       inst      bytes
    helped          0          0        228        228
      hurt          0          0          0          0
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: renamed commit
reordered modifiers
add assert(dst == src2)
v3: reordered modifiers again
v5: no rounding bit for limms
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=60000 /width=1024 /height=640:
score: 1026 -> 1045
changes for shader-db:
total instructions in shared programs : 3943335 -> 3934925 (-0.21%)
total gprs used in shared programs : 481563 -> 481563 (0.00%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36139384 -> 36061888 (-0.21%)
                local        gpr       inst      bytes
    helped          0          0       3587       3587
      hurt          0          0          0          0
v2: removed TODO
reordered to show changes without RA modification
removed stale debugging print() call
v3: remove predicate checks
enable only for gf100 ISA
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
we might want to add more folding passes here, so make it a bit more generic
v2: leave the comment and reword commit message
v4: rename it to PostRaLoadPropagation
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Helps mainly Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3941587 -> 3940749 (-0.02%)
total gprs used in shared programs : 481511 -> 481460 (-0.01%)
total local used in shared programs : 27469 -> 27481 (0.04%)
total bytes used in shared programs : 36123344 -> 36115776 (-0.02%)
                local        gpr       inst      bytes
    helped          2         48        243        243
      hurt          2          3         32         32
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I hit an assert in the emitter while toying around with optimizations, because
ConstantFolding immediated a big int into a mad.
There is special handling for FMA/MAD in insnCanLoad, which is broken. With
this patch the special path should not be hit anymore. Anyway, the constraints
for the LIMMS can't be guaranteed in SSA form and I have patches pending to
use it via a post-SSA optimization pass.
As a result, immediates get immediated for int mad/fmas as well.
changes in shader-db:
total instructions in shared programs : 3943335 -> 3941587 (-0.04%)
total gprs used in shared programs : 481563 -> 481511 (-0.01%)
total local used in shared programs : 27469 -> 27469 (0.00%)
total bytes used in shared programs : 36139384 -> 36123344 (-0.04%)
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: remove extra bit from insnCanLoad as well]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This enables support for the GL_NV_fill_rectangle extension on the
GM200+ for Desktop OpenGL.
Signed-off-by: Lyude <lyude@redhat.com>
Changes since v1:
- Fix commit message
- Add note to reldocs
Changes since v2:
- Remove unnecessary parens in nvc0_screen_get_param()
- Fix sorting in release notes
- Don't execute FILL_RECTANGLE method on pre-GM200+ GPUs
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Changes since v1:
- Add pipe caps for etnaviv, freedreno, swr and virgl
Signed-off-by: Lyude <lyude@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since we don't have the bits required to support this in OpenGLES yet,
this only enables support for Desktop OpenGL
Signed-off-by: Lyude <lyude@redhat.com>
Changes since v1:
- Simplify _mesa_PolygonMode() a little bit
- Fix formatting in OpenGL spec excerpts
- Move polygon mode checking into _mesa_valid_to_render()
Changes since v3:
- Improve error message for invalid drawings with GL_FILL_RECTANGLE_NV
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This enables tessellation shaders and sets some values for
the maximums.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems to have gotten lost in the rebases; restoring it should fix
the tessellation demos, which crash in llvm.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This emits the tessellation shaders and state to the command stream.
It contains the logic to emit the LS/HS shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So tess shaders have some circular dependencies,
TCS needs the TES primitive mode
TES needs the TCS vertices out
This builds the nir for each shader first to get the
info, executes a tes specific nir pass, then builds
the LLVM shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the code from radeonsi to build the if/endif,
and ports the tess factor emission code. This code has
an optimisation TODO that we can deal with later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for the tessellation inputs/outputs to the
shader compiler; this is one of the main pieces of the patch.
It is very similar to the radeonsi code (post merge we should
consider if there are better sharing opportunities). The main
difference from radeonsi is that we can have "compact" varyings
for clip/cull/tess factors, and we have to add special handling
for these.
This consists of treating the const index from the deref differently
depending on the compactness.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for the nir intrinsics that tessellation uses.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This hooks up the tessellation shader info to the nir values
and ctx generated ones.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This calculates the pipeline state for tessellation.
It moves the gs ring calculation down to below
where the tessellation shaders will be compiled,
as it needs the info from those shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for tessellation patch inputs to the code
that finds the unique parameter index.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the VGT_VERTEX_REUSE register settings
for Polaris GPUs from radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just configures all the register inputs for the tessellation
related stages.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just sets up the necessary pointers on the compiler
side for the rings needed for tessellation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch adds support for the offchip rings for storing
tessellation factors and attribute data.
It includes the register setup for the TF ring
v2: always do tess ring size calcs (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the tess pieces for shader keys and shader info,
it adds the necessary bits to the vertex key/info as well.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for tess to the shader stage conversion
and emits the per-stage descriptors/constants for tess stages.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some versions of MinGW-w64 such as 5.3.1 and 6.2.0 produce bad code
with -O2 or -O3 causing a random driver crash when running programs
that use GLSL. Most Mesa demos in the glsl/ directory trigger the
bug, but not the fragcoord.c test.
Use a #pragma to force -O1 for this file for later MinGW versions.
Luckily, this is basically one-time setup code. I suspect the bug
is related to the sheer size of this file.
This should let us move to newer versions of MinGW-w64 for Mesa.
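A minimal sketch of the kind of guard described above, assuming the
cutoff is GCC 5 or later on MinGW (the message does not state the exact
version check) and using a hypothetical setup_tables_example() as a
stand-in for the one-time setup code in the affected file:

```c
#include <assert.h>

/* Assumption: only MinGW builds with GCC >= 5 are affected; the real
 * version check used by the change is not given in the message. */
#if defined(__MINGW32__) && defined(__GNUC__) && (__GNUC__ >= 5)
#pragma GCC optimize ("O1")
#endif

/* Hypothetical stand-in for the large, run-once setup code. */
static int
setup_tables_example(void)
{
   int sum = 0;
   for (int i = 0; i < 16; i++)
      sum += i;
   return sum;
}
```

Because the pragma sits at file scope it applies to every function
defined after it in the file, which is acceptable here since the code
runs only once.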
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
To avoid dereferencing a null pointer in case wglMakeCurrent() wasn't
called. Found while debugging SWKOTOR game.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
for threaded gallium, which can't use pipe_context in create_surface
v2: don't add a new decompress helper function
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fix a bug that was caused by a type mismatch in the shift count between
GLSL and TGSI. I briefly considered adjusting the TGSI semantics, but
since both LLVM and AMD GCN require both arguments to be of the same type,
it makes more sense to keep TGSI as-is -- it reflects the underlying
implementation better.
I'm also sending out piglit tests that expose this error.
v2: use the right number of components for the temporary register
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The commit mentioned below required the __DRI2FlushExtension to have
version 4 or above, for GBM functionality. That broke GBM with some
classic dri drivers. Relax that requirement so that we only flush
after unmap if we have version 4 or above. Drivers that require the flush
for correct functionality should implement the desired version.
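The relaxed check can be sketched roughly as follows. The struct and
function names here are hypothetical simplifications, not the real
__DRI2FlushExtension layout:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the DRI flush extension. */
struct flush_ext_example {
   int version;
   void (*flush_with_flags)(void *ctx);
};

static int flush_count_example;

static void
count_flush_example(void *ctx)
{
   (void)ctx;
   flush_count_example++;
}

/* Flush after unmap only when the extension exists and is new enough;
 * older drivers keep working, they just do not get the flush. */
static void
unmap_then_maybe_flush_example(const struct flush_ext_example *flush,
                               void *ctx)
{
   /* ... the actual unmap would happen here ... */
   if (flush != NULL && flush->version >= 4 &&
       flush->flush_with_flags != NULL)
      flush->flush_with_flags(ctx);
}
```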
Fixes: ba8df228 ("gbm/dri: Flush after unmap")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Dylan Baker <dylan@pnwbakers.com>
This allows us to run 32bit Vulkan apps on Android, ftruncate
call would fail on 2GB (max size being 2GB - 1).
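To illustrate the limit, assuming a signed 32-bit off_t (as on a 32-bit
build without large-file support), here is a small sketch. The helper
name is hypothetical and this is not the code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Largest size representable in a signed 32-bit off_t: 2 GiB - 1. */
#define MAX_32BIT_OFF_T_EXAMPLE ((uint64_t)INT32_MAX)

/* Returns nonzero if 'size' could be passed to ftruncate() when off_t
 * is a signed 32-bit type (an assumption about the target). */
static int
fits_in_32bit_off_t_example(uint64_t size)
{
   return size <= MAX_32BIT_OFF_T_EXAMPLE;
}
```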
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is to fix following compile error with libmesa_isl:
mesa/src/intel/isl/isl.c:28:10: fatal error: 'genxml/genX_bits.h' file not found
Fixes: f0eaf38 ("genxml: New generated header genX_bits.h (v6)")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is confusing because it only applies to GL_ARB_vertex/fragment_program,
and because of that it's also not very useful.
If someone requires this for debugging they can just make an ad-hoc
code change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Combining all the files into a single string didn't make any
difference in the size of the aubinator binary.
With this change we now also embed gen4/4.5/5 descriptions, which
increases the aubinator size by ~16Kb.
v2 (Lionel): rebase makefiles
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Vulkan Clipping is defined in terms of vertices, the scissor based
clipping happens on pixels. There is a difference with points and
lines, as a vertex can be outside the viewport while some pixels are in.
On Vulkan those pixels shouldn't be drawn, while they would be with
the guardband.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For opcodes such as the nir_op_pack_64_2x32 for which all sources and
destinations have explicit sizes, the bit_size parameter to the evaluate
function is pointless and *should* do nothing. Previously, we were
always switching on the bit_size and asserting if it isn't one of the
sizes in the list. This generates way more code than needed and is a
bit cruel because it doesn't let us have a bit_size of zero on an ALU op
which shouldn't need a bit_size.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Eric renamed these from dri_bufmgr_* and intel_bufmgr_* to drm_intel_*
in libdrm commit 4b9826408f65976a1a13387beda748b65e03ec52, circa 2008,
but we've been using the legacy names this whole time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only common/decoder.[ch] requires it [for intel_aub.h].
v2: The code was moved from intel/tools to intel/common,
update accordingly.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only part which requires libdrm_intel, tools/aubinator, is not built
on Android.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
/home/marek/dev/mesa-main/src/gallium/drivers/softpipe/sp_compute.c:178:
warning: 'grid_size' may be used uninitialized in this function
[-Wmaybe-uninitialized]
/home/marek/dev/mesa-main/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c:3598:
warning: 'level' may be used uninitialized in this function [-Wmaybe-uninitialized]
out1 = lp_build_cmp(&leveli_bld, PIPE_FUNC_GREATER, level, last_level);
^
All tests pass on Fiji now. This prevents DCC disablement due to
incompatible DCC formats due to the fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Create new function to get correct alignment based on Asics, and change
the corresponding decode message buffer and dpb buffer size calculations
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The border color swizzle logic was copied from Vulkan. It doesn't make any
sense to me, but it passes all piglits except the stencil ones.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Both GFX6 and GFX9 fields are printed next to each other in parsed IBs.
The Python script parses both headers like one stream and tries to merge
all definitions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The DATA_FORMAT and NUM_FORMAT fields are the same, but some of the enums
differ, thus add GFX6 and GFX9 suffixes, so that the IB parser can show
enums for both.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add _GFX6 and _GFX9 suffixes to conflicting definitions.
sid.h and gfx9d.h can now be included in the same file.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This resolves trivial conflicts with gfx9d.h caused by different formatting.
Some fields are also renamed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The usage should be: the client first calls AddrComputeSurfaceInfo() on
the depth surface with the flag "matchStencilTilecfg". AddrLib will use
the 2DThin1 tile index for depth as much as possible and will not
downgrade unless the alignment requirement cannot be met.
1. If there is a matched 2DThin1 tile index for stencil which make
sure they will share same tile config parameters, then return the
stencil 2DThin1 tile index as well.
2. If using 2DThin1 tile mode cannot make sure such thing happen, and
TcCompatible flag was set, then ignore this flag then try 2DThin1 tile
mode for depth and stencil again.
3. If 2DThin1 tile mode cannot make sure depth and stencil to have
same tile config parameters, then down grade depth surface tile mode
to 1DThin1.
4. If depth surface's tile mode was 1DThin1, then return 1DThin1 tile
index for stencil.
5. If depth surface's tile mode is PRT, then return invalid tile index
to stencil since their tile config parameters will never be met.
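The five rules above can be sketched as a small decision function. This
is only an illustration: the types, field names and the two "match"
inputs are hypothetical, and the real AddrLib derives them from its
tile tables rather than taking them as parameters.

```c
#include <assert.h>

enum tile_mode_example { MODE_1D_THIN1_EX, MODE_2D_THIN1_EX, MODE_PRT_EX };

#define INVALID_TILE_INDEX_EX (-1)

/* Hypothetical inputs standing in for AddrLib's internal lookups. */
struct depth_query_example {
   enum tile_mode_example depth_mode; /* in/out: may be downgraded */
   int tc_compatible;                 /* in/out: may be cleared */
   int match_with_tc;     /* 2DThin1 configs match with the flag kept */
   int match_without_tc;  /* 2DThin1 configs match with it ignored */
   int stencil_2d_index;
   int stencil_1d_index;
};

static int
pick_stencil_tile_index_example(struct depth_query_example *q)
{
   if (q->depth_mode == MODE_PRT_EX)
      return INVALID_TILE_INDEX_EX;            /* rule 5 */
   if (q->depth_mode == MODE_1D_THIN1_EX)
      return q->stencil_1d_index;              /* rule 4 */

   /* Depth is 2DThin1 from here on. */
   if (q->tc_compatible ? q->match_with_tc : q->match_without_tc)
      return q->stencil_2d_index;              /* rule 1 */

   if (q->tc_compatible) {                     /* rule 2 */
      q->tc_compatible = 0;
      if (q->match_without_tc)
         return q->stencil_2d_index;
   }

   q->depth_mode = MODE_1D_THIN1_EX;           /* rule 3 */
   return q->stencil_1d_index;
}
```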
Client driver then check the returned tile index of stencil -- if it
is not invalid tile index, then call AddrComputeSurfaceInfo() on
stencil surface with the returned stencil tile index to get full
output information. Please note, client needs to set flag
"useTileIndex" when AddrLib get created.
1) minimizePadding - Use 1D tile mode if padded size of 2D is bigger
than 1D
2) maxBaseAlign - Force PRT tile mode if macro block size is bigger than
requested alignment.
Also, related changes to tile mode optimization for needEquation.
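A rough sketch of the two flag checks, with hypothetical helper names.
How the two interact, and whether a maxBaseAlign of 0 means "no limit",
are assumptions not stated in the message:

```c
#include <assert.h>
#include <stdint.h>

/* minimizePadding: prefer 1D tiling when 2D pads to a larger size. */
static int
prefer_1d_example(int minimize_padding,
                  uint64_t padded_size_2d, uint64_t padded_size_1d)
{
   return minimize_padding && padded_size_2d > padded_size_1d;
}

/* maxBaseAlign: force a PRT tile mode when the macro block is larger
 * than the requested alignment.
 * Assumption: 0 means the client did not request a limit. */
static int
force_prt_example(uint32_t max_base_align, uint32_t macro_block_size)
{
   return max_base_align != 0 && macro_block_size > max_base_align;
}
```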
1. Add new surface flags needEquation for client driver use to force
the surface tile setting equation compatible. Override 2D/3D macro
tile mode to PRT_* tile mode if this flag is TRUE and num slice > 1.
2. Add numEquations and pEquationTable in ADDR_CREATE_OUTPUT structure
to return number of equations and the equation table to client driver
3. Add equationIndex in ADDR_COMPUTE_SURFACE_INFO_OUTPUT structure to
return the equation index to client driver
Please note the use of address equation has following restrictions:
1) The surface can't be splittable
2) The surface can't have non zero tile swizzle value
3) Surface with > 1 slices must have PRT tile mode, which disable
slice rotation
Sometimes client driver passes valid tile info into address library,
in this case, the tile index is computed in function
HwlPostCheckTileIndex instead of CiAddrLib::HwlSetupTileCfg.
We need to call HwlPostCheckTileIndex to calculate the correct tile
index to get tile split bytes for this case.
When clients query tile info from the tile index and expect accurate
tileSplit info, bits per pixel info is required to be provided since
this is necessary for computing tileSplitBytes; otherwise Addrlib will
return value of "tileBytes" instead if bpp is 0 - which is also
current logic. If clients don't need tileSplit info, it's OK to pass
bpp with value 0.
Kaveri (2-pipe) macro tiling mode table was initially set to all
4-aspect-ratio so the swizzling path did not work for it and then we
chose to pad the offset. We now discover the root cause is that if
ratio > 2, the swizzling path does not work. So we can safely use the
same path for Kaveri.
Even if surface info input flag "tcCompatible" is enabled, tc
compatible may be not supported if tile split happens for depth
surfaces. Add a new flag in output structure to notify client to
disable tc compatible in this case.
Carrizo row size is 1K, while tileSplitBytes is 2K for a 4xAA 32bpp
depth surface. Remove the sanity check that tileSplitBytes must be
greater than row size. There could be performance loss but may be
covered by non-split depth which enables tc-compatible read.
Change the logic to compute tc compatible stencil info via depth's
tileIndex instead of using depth's tileInfo. So the clients can get
the stencil's tileInfo computed from macroModeTable. If the stencil
tileInfo is same as depth tileInfo, then stencil is tc compatible;
otherwise, stencil is not tc compatible. The current suggestion is to
create another stencil buffer with the tc compatible tileInfo, use
depth-to-color copy to decompress and tile convert the rendered
stencil to tc compatible stencil (And use the new stencil buffer to
program TC).
Debian, Ubuntu set default build flag: -Werror=format-security
CC state_tracker/st_cb_texturebarrier.lo
state_tracker/st_cb_eglimage.c: In function ‘st_egl_image_get_surface’:
state_tracker/st_cb_eglimage.c:64:7: error: format not a string literal and no format arguments [-Werror=format-security]
_mesa_error(ctx, GL_INVALID_VALUE, error);
^~~~~~~~~~~
state_tracker/st_cb_eglimage.c:71:7: error: format not a string literal and no format arguments [-Werror=format-security]
_mesa_error(ctx, GL_INVALID_OPERATION, error);
^~~~~~~~~~~
Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Fixes: 83e9de25f3 ("st/mesa: EGLImageTarget* error handling")
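One common way to satisfy -Werror=format-security is to pass the
message as an argument to a literal "%s" format. The sketch below uses
a hypothetical printf-style reporter in place of _mesa_error() and is
not necessarily the exact change made here:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_error_example[128];

/* Hypothetical printf-style reporter standing in for _mesa_error(). */
static void
report_error_example(const char *fmt, ...)
{
   va_list args;
   va_start(args, fmt);
   vsnprintf(last_error_example, sizeof(last_error_example), fmt, args);
   va_end(args);
}

static void
report_message_safely_example(const char *message)
{
   /* The format is a string literal, so any '%' inside 'message' is
    * copied verbatim instead of being parsed as a conversion. */
   report_error_example("%s", message);
}
```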
These two functions do the exact same thing. One returns a uint64_t,
and the other takes the same uint64_t and truncates it to a uint32_t.
We only need the uint64_t variant - the caller can truncate if it wants.
This patch gives us one function, intel_batchbuffer_reloc, that does
the 64-bit thing.
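As an illustration of why one 64-bit variant is enough, here is a
sketch with a hypothetical reloc64_example() helper (not the real
intel_batchbuffer_reloc signature); a caller that wants 32 bits simply
casts the result:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: returns the full 64-bit presumed address. */
static uint64_t
reloc64_example(uint64_t presumed_offset, uint32_t delta)
{
   return presumed_offset + delta;
}
```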
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Doing this before tessellation makes doing some bits of
tessellation a bit cleaner. It also cleans up a bit of the
llvm generator code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix codegen build break that was introduced earlier
v2: update rules for gen_knobs.cpp and gen_knobs.h
v3: Introduce bldroot and revert generator file changes, making patch simpler.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
The old code would sync and then throw a cryptic error message.
There is no need for a custom error, we can just fallback to
the real function and have it do proper validation.
Fixes piglit test:
glsl-uniform-out-of-bounds
Which was returning the wrong error code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Stop trying to specify texture or renderbuffer objects for unsupported
EGL images. Generate the error codes specified in the OES_EGL_image
extension.
EGLImageTargetTexture2D and EGLImageTargetRenderbuffer would call
the pipe driver's create_surface callback without ever checking that
the given EGL image is actually compatible with the chosen target
texture or renderbuffer. This patch adds a call to the pipe driver's
is_format_supported callback and generates an INVALID_OPERATION error
for unsupported EGL images. If the EGL image handle does not describe
a valid EGL image, an INVALID_VALUE error is generated.
v2: fixed get_surface to actually use the usage and error parameters
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The only callers are here, and we will add generation of GL errors in
the following patch. Rename the function to st_egl_image_get_surface,
pass the gl_context instead of st_context, and move the cast from
GLeglImageOES to void* into st_egl_image_get_surface.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Technically those hw operations are only available on gen7, as gen8+
support the conversion on the MOV. But, when using the builder to
implement nir operations (example: nir_op_fquantize2f16), it is not
needed to do the gen check. This check is done later, on the final
emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the
specific operation accordingly.
So in the middle, during optimization phases those hw operations can
be around for gen8+ too.
Without this patch, several (at least 95) vulkan-cts quantize tests
crash when using INTEL_DEBUG=optimizer. For example:
dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert
v2: simplify the code using GEN_GE (Ilia Mirkin)
v3: tweak brw_instruction_name instead of changing opcode_descs
table, that is used for validation (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
No performance testing has been done, because it makes sense to make this
change regardless of that. Also, _NEW_TEXTURE is still used in many places,
but the obvious occurrences are replaced here.
It's now possible to split _NEW_TEXTURE_OBJECT further.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
MSVC has been including a xtime definition in thr/xtimec.h ever since
MSVC 2013 (which is the minimum we require for building Mesa), and
including it prevents duplicate definitions when it gets included by
LLVM.
In fact, it looks that MSVC has been including a partial C11 threads
implementation too for some time, which we should consider migrating to
once we eliminate the use of _MTX_INITIALIZER_NP in our tree.
Thanks to the anonymous helper from
https://bugs.freedesktop.org/show_bug.cgi?id=100201#c4 for spotting
this.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100201
CC: "17.0" <mesa-stable@lists.freedesktop.org>
Drivers may queue dma operations on the context at unmap time so we need
to flush to make sure the data gets to the bo. Ideally the application
would take care of this, but since there appears to be no exported gbm
flush functionality we need to explicitly flush at unmap time.
This fixes a problem where kmscube on vmwgfx in rgba textured mode would
render using an uninitialized texture rather than the intended
rgba pattern.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Per the Vulkan spec, memory objects may be deleted before the buffers
and images using them are deleted, although those resources then
cannot be used except for deletion themselves.
For the virtual buffers, we need to access them on resource destruction
to unmap the regions, so this results in a use-after-free. Implement
reference counting to avoid this.
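A minimal sketch of the idea, with hypothetical names and a plain
(non-atomic) counter; a real driver would need atomic operations and
its own object types:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical memory object; 'destroyed' is only for observation. */
struct mem_example {
   int refcount;
   int *destroyed;   /* set to 1 when the object is really freed */
};

static struct mem_example *
mem_create_example(int *destroyed)
{
   struct mem_example *mem = calloc(1, sizeof(*mem));
   if (!mem)
      return NULL;
   mem->refcount = 1;            /* reference held by the application */
   mem->destroyed = destroyed;
   return mem;
}

/* A virtual buffer bound to the memory takes its own reference. */
static void
mem_ref_example(struct mem_example *mem)
{
   mem->refcount++;
}

/* The object is freed only when the last holder lets go. */
static void
mem_unref_example(struct mem_example *mem)
{
   if (--mem->refcount == 0) {
      *mem->destroyed = 1;
      free(mem);
   }
}
```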
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
v2: - Added comments.
- Fixed a double unmap bug.
- Actually unmap the non-edge old ranges.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
While having the _3d and _gpgpu versions is nice, there's no reason why
we need to have duplicated logic for tracking the current pipeline.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The programming note that says we need to do this still exists in the
SkyLake PRM and, from looking at the bspec, seems like it may apply to
all hardware generations SNB+. Unfortunately, this isn't particularly
clear cut since there is also language in the bspec that says you can
skip the flushing and stall to get better throughput. Experimentation
with the "Car Chase" benchmark in GL seems to indicate that some form of
flushing is still needed. This commit makes us do the full set of
flushes regardless of hardware generation. We can always reduce the
flushing later.
Reported-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
A bunch of code was indented in such a way that it looked like it went
with the if statement above but it definitely didn't.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
We're not using anything in it, and we don't want to inherit struct
definitions from some other package anyway.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Like done in si_state_draw.c::si_draw_vbo
u_upload_alloc can fail, i.e. set output param *ptr to NULL, for 2 reasons:
alloc fails or map fails. For both there is already a fprintf/stderr in
radeon_create_bo and radeon_bo_do_map.
In src/gallium/drivers/ it is common practice to simply avoid crashing by doing
a silent check, and to leave the fprintf to where the error originates: the libdrm calls.
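The pattern looks roughly like this. upload_alloc_example() is a
hypothetical stand-in for u_upload_alloc() that mimics only the
"output pointer is NULL on failure" behaviour:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator: leaves *ptr NULL when allocation or mapping
 * fails, as described above. */
static void
upload_alloc_example(size_t size, int simulate_failure, void **ptr)
{
   *ptr = simulate_failure ? NULL : malloc(size);
}

/* Returns 1 if the draw was issued, 0 if it was silently skipped. */
static int
draw_example(int simulate_failure)
{
   void *ptr = NULL;

   upload_alloc_example(64, simulate_failure, &ptr);
   if (!ptr)
      return 0;   /* no message here; the failing layer already logged */

   /* ... fill the upload and emit the draw ... */
   free(ptr);
   return 1;
}
```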
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All callers of isl_surf_init() that set 'min_row_pitch' wanted to
request an *exact* row pitch, as evidenced by nearby asserts, but isl
lacked API for doing so. Now that isl has an API for that, update the
code to use it.
v2: Assert that isl_surf_init() succeeds because the callers assume
it. [for jekstrand]
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
The caller does so by setting the new field
isl_surf_init_info::row_pitch.
v2: Validate the requested row_pitch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Validate that isl_surf::row_pitch fits in the below bitfields,
if applicable based on isl_surf::usage.
RENDER_SURFACE_STATE::SurfacePitch
RENDER_SURFACE_STATE::AuxiliarySurfacePitch
3DSTATE_DEPTH_BUFFER::SurfacePitch
3DSTATE_HIER_DEPTH_BUFFER::SurfacePitch
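A sketch of such a range check, under the assumption that the hardware
field stores (pitch - 1), which is not stated in this message. The
helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if 'row_pitch' can be encoded in a field that is
 * 'field_bits' wide.
 * Assumption: the field holds (row_pitch - 1). */
static int
row_pitch_fits_example(uint32_t row_pitch, unsigned field_bits)
{
   if (row_pitch == 0 || field_bits == 0)
      return 0;
   if (field_bits >= 32)
      return 1;
   return (row_pitch - 1) < ((uint32_t)1 << field_bits);
}
```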
v2:
-Add a Makefile dependency on generated header genX_bits.h.
v3:
- Test ISL_SURF_USAGE_STORAGE_BIT too. [for jekstrand]
- Drop explicit dependency on generated header. [for emil]
v4:
- Rebase for new gen_bits_header.py script.
- Replace gen_10x with gen_device_info*.
v5:
- Drop FINISHME for validation of GEN9 1D row pitch. [for jekstrand]
- Reformat bit tests. [for jekstrand]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v4)
genX_bits.h contains the sizes of bitfields in genxml instructions,
structures, and registers. It also defines some functions to query those
sizes.
isl_surf_init() will use the new header to validate that requested
pitches fit in their destination bitfields.
What's currently in genX_bits.h:
- Each CONTAINER::Field from gen*.xml that has a bitsize has a macro
in genX_bits.h:
#define GEN{N}_CONTAINER_Field_bits {bitsize}
- For each set of macros whose name, after stripping the GEN prefix,
is the same, genX_bits.h contains a query function:
static inline uint32_t __attribute__((pure))
CONTAINER_Field_bits(const struct gen_device_info *devinfo);
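For illustration, the generated pair might look roughly like the sketch
below. The container name, the bit sizes and the device-info struct are
all made up for this example and are not taken from the real genxml
output:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct gen_device_info. */
struct gen_device_info_example {
   int gen;
};

/* Invented sizes, listed newest gen first. */
#define GEN9_EXAMPLE_STATE_Pitch_bits 18
#define GEN8_EXAMPLE_STATE_Pitch_bits 18
#define GEN7_EXAMPLE_STATE_Pitch_bits 17

/* 'pure' rather than 'const' because the function reads through a
 * pointer parameter. */
static inline uint32_t __attribute__((pure))
EXAMPLE_STATE_Pitch_bits(const struct gen_device_info_example *devinfo)
{
   switch (devinfo->gen) {
   case 9: return GEN9_EXAMPLE_STATE_Pitch_bits;
   case 8: return GEN8_EXAMPLE_STATE_Pitch_bits;
   case 7: return GEN7_EXAMPLE_STATE_Pitch_bits;
   default: return 0;   /* field not present on this gen */
   }
}
```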
v2 (Chad Versace):
- Parse the XML instead of scraping the generated gen*_pack.h headers.
v3 (Dylan Baker):
- Port to Mako.
v4 (Jason Ekstrand):
- Make the _bits functions take a gen_device_info.
v5 (Chad Versace):
- Fix autotools out-of-tree build.
- Fix Android build. Tested with git://github.com/android-ia/manifest.
- Fix macro names. They were all missing the "_bits" suffix.
- Fix macros names more. Remove all double-underscores.
- Unindent all generated code. (It was floating in a sea of whitespace).
- Reformat header to appear human-written not machine-generated.
- Sort gens from high to low. Newest gens should come first because,
when we read code, we likely want to read the gen8/9 code and ignore
the gen4 code. So put the gen4 code at the bottom.
- Replace 'const' attributes with 'pure', because the functions now
have a pointer parameter.
- Add --cpp-guard flag. Used by Android.
- Kill class FieldCollection. After Jason's rewrite, it was just
a dict.
v6 (Chad Versace):
- Replace `key not in d.keys()` with `key not in d`. [for dylan]
Co-authored-by: Dylan Baker <dylan@pnwbakers.com>
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v5)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v6)
Move common codegen functions into gen_common.py.
v2: change gen_knobs.py to find the template file internally, like
the rest of the gen scripts.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
When using an overlayfs system (like a Docker container), rmrf_local()
fails because part of the files to be removed are in different mount
points (layouts). And thus cache-test fails.
Allowing the removal to cross mount points is not a big problem, especially
because this is just for a test, not to be used in real code.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Otherwise manual invocation of the script from elsewhere than
`dirname $0` will fail.
With these all the artefacts should be created in the correct location,
and thus we can remove the old (and slightly strange) clean-local line.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Current definitions work fine for the manual invocation of the script,
although the whole script does not consider that one can run it OOT.
The latter will be handled with later patches, which will make
extensive use of the two variables.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Rather than hardcoding the binary location (which ends up wrong in a
number of occasions) in the python script, pass it as argument.
This allows us to remove a couple of dirname/basename workarounds that
aimed to keep this working, and succeeded in the odd occasion.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
At the moment we look for generator script(s) in builddir while they
are in srcdir, and we proceed to generate the tests and expected output
in srcdir, which is not allowed.
To untangle:
- look for the generator script in the correct place
- generate the files in builddir, by extending create_test_cases.py to
use --outdir
With this in place the test passes `make check' for OOT builds, whether
standalone or as part of `make distcheck'.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Now that we have srcdir we can use it to correctly manage/point to the
script, effectively fixing OOT invocation of `make check'.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
With later commits we'll fix the generators to produce the files in the
correct location. That in itself will cause an issue since the files
will be left dangling and make distcheck will fail.
v2: Use -r only as needed (Eric)
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
we anyway allow for multiple slices
v2: do not remove assert to check for buf->size
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Only a small tail needs to be uploaded manually.
This is only partly a performance measure (apps are expected to use
aligned access). Mostly it is preparation for sparse buffers, which the
old code would incorrectly have attempted to map directly.
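The split between the bulk part and the manually uploaded tail can be
sketched as below, assuming dword (4-byte) granularity, which is an
assumption of this example rather than something stated above:

```c
#include <assert.h>
#include <stdint.h>

struct size_split_example {
   uint64_t aligned_size;   /* handled by the bulk path */
   uint64_t tail_size;      /* small remainder uploaded manually */
};

/* Assumption: the bulk path works in units of 4 bytes. */
static struct size_split_example
split_unaligned_size_example(uint64_t size)
{
   struct size_split_example s;
   s.aligned_size = size & ~(uint64_t)3;
   s.tail_size = size & 3;
   return s;
}
```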
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows the next patches to be simple while still being able
to make use of SDMA even in some unusual cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With tess this becomes a bit more complex, so move to pipeline
for now.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a precursor for tess support, which needs to
pass different values here.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to facilitate adding tess support, split the vs/es
output info into a separate block, so we make it easier to
have the tess shaders export the same info.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The logic was different than radeonsi, fix it up before adding
tess support.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If rasterization is disabled, we can get a NULL multisample
state.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It reads & writes the DB cache, and we haven't flushed dst caches yet,
so the DB cache may be stale. Also the user might be a shader read (and probably is),
so also flush after.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. So, this patch is deleting
this dead code in favor of adding it later in ISL when we
actually find it useful. ISL can then share this code between
vulkan and GL.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fast copy blit was primarily added to support Yf/Ys detiling.
But, Yf/Ys tiling never got used in i965 due to not delivering
the expected performance benefits. Also, replacing legacy blits
with fast copy blit didn't help the benchmarking numbers. This
is probably due to a h/w restriction that says "start pixel for
Fast Copy blit should be on an OWord boundary". This restriction
causes many blit operations to skip fast copy blit and use legacy
blits. So, this patch is deleting this dead code in favor of
adding it later when we actually find it useful.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We've already required Kernel 3.6 on Gen6+ since Mesa 9.2 (May 2013,
commit 92d2f5acfa). It seems reasonable
to require it for Gen4-5 as well, bumping the requirement from 2.6.39.
This is necessary for glClientWaitSync with a timeout to work, which
is a feature we expose on Gen4-5. Without it, we would fall back to an
infinite wait, which is pretty bad.
See kernel commit 172cf15d18889313bf2c3bfb81fcea08369274ef in 3.6+.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we would just escape the loop and move everything that
followed the loop inside the if into the else branch of a new if
conditional on a return flag. However, everything outside the if in
which the loop was nested would still get executed.
Adding a new return to the then branch of the new if fixes this,
and we just let a follow-up pass clean it up if needed.
Fixes:
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test
tests/spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If we had no rasterization, we'd emit the SPI color
format as all 0's. The hw dislikes this, so add the workaround
from radeonsi.
Found while debugging tessellation.
v2: handle at pipeline stage, we have to handle
it after we process the fragment shader. (Bas)
v3: simplify even further, remove old fallback.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix 'make check' linking error with glibc < 2.17.
CXXLD main-test
../../../../src/mesa/.libs/libmesa.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
SEL can only convert between a few integer types, which we basically
never do.
Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
We should use anv_get_layerCount() to access layerCount of VkImageSub-
resourceRange in anv_CmdClearColorImage and anv_CmdClearDepthStencil-
Image, which handles the VK_REMAINING_ARRAY_LAYERS (~0) case.
Test: Sample multithreadcmdbuf from LunarG can run without crash
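The layer count handling described above can be sketched as follows.
This is a minimal Python illustration rather than the driver's C code:
the function name and parameters are hypothetical stand-ins, and only
the VK_REMAINING_ARRAY_LAYERS value (~0U) comes from the Vulkan headers.

```python
# Sentinel from the Vulkan headers: (~0U) as a 32-bit unsigned value.
VK_REMAINING_ARRAY_LAYERS = 0xFFFFFFFF


def get_layer_count(array_size, base_array_layer, layer_count):
    """Resolve a subresource range's layerCount (hypothetical helper).

    When the application passes VK_REMAINING_ARRAY_LAYERS, the range
    covers every layer from base_array_layer to the end of the image.
    """
    if layer_count == VK_REMAINING_ARRAY_LAYERS:
        return array_size - base_array_layer
    return layer_count


# Using the raw field directly would try to clear ~4 billion layers.
resolved = get_layer_count(array_size=6, base_array_layer=2,
                           layer_count=VK_REMAINING_ARRAY_LAYERS)
```

Reading the raw field without this resolution step is what makes the
sentinel case dangerous in a per-layer loop.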
Signed-off-by: Xu Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
From page 45 (page 52 of the PDF) of the GLSL ES 3.00 v.6 spec:
" When instance names are present on matched block names, it is
allowed for the instance names to differ; they need not match for
the blocks to match."
From page 51 (page 57 of the PDF) of the GLSL 4.30 v.8 spec:
" When instance names are present on matched block names, it is
allowed for the instance names to differ; they need not match for
the blocks to match."
Therefore, no cross linking validation is needed for the instance name
of an Interface Block.
With this patch, no link error is reported for a program
like this:
"# VS
layout(binding = 1) Block1 {
vec4 color;
} uni_block;
...
# FS
layout(binding = 2) Block2 {
vec4 color;
} uni_block;
..."
Fixes GL45-CTS.enhanced_layouts.ssb_layout_qualifier_conflict
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From page 140 (page 147 of the PDF) of the GLSL ES 3.10 v.4 spec:
" 9.2 Matching of Qualifiers
The following tables summarize the requirements for matching of
qualifiers. It applies whenever there are two or more matching
variables in a shader interface.
Notes:
1. Yes means the qualifiers must match.
...
9.2.1 Linked Shaders
| Qualifier | Qualifier | in/out | Default | uniform | buffer|
| Class | | | Uniforms | Block | Block |
...
| Layout | binding | N/A | Yes | Yes | Yes |"
From page 93 (page 110 of the PDF) of the GL 4.2 (Core Profile) spec:
" 2.11.7 Uniform Variables
...
Uniform Blocks
...
When a named uniform block is declared by multiple shaders in a
program, it must be declared identically in each shader. The
uniforms within the block must be declared with the same names and
types, and in the same order. If a program contains multiple
shaders with different declarations for the same named uniform
block differs between shader, the program will fail to link."
From page 129 (page 150 of the PDF) of the GL 4.3 (Core Profile) spec:
" 7.8 Shader Buffer Variables and Shader Storage Blocks
...
When a named shader storage block is declared by multiple shaders
in a program, it must be declared identically in each shader. The
buffer variables within the block must be declared with the same
names, types, qualification, and declaration order. If a program
contains multiple shaders with different declarations for the same
named shader storage block, the program will fail to link."
Therefore, if the binding qualifier differs between two linked Uniform
or Shader Storage Blocks of the same name, a link error should happen.
With this patch, a link error is reported for a program
like this:
"# VS
layout(binding = 1) Block {
vec4 color;
} uni_block1;
...
# FS
layout(binding = 2) Block {
vec4 color;
} uni_block2;
..."
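The rule quoted above (same-named blocks must agree on binding, while
instance names may differ) can be sketched as a small check. This is an
illustrative Python model only: the dictionary layout and the function
name are assumptions, not the linker's actual data structures.

```python
def cross_validate_blocks(vs_blocks, fs_blocks):
    """Return link errors for same-named blocks whose bindings differ.

    Each block is a dict with hypothetical keys 'name', 'instance' and
    'binding'. Instance names are deliberately not compared.
    """
    errors = []
    fs_by_name = {block["name"]: block for block in fs_blocks}
    for vs in vs_blocks:
        fs = fs_by_name.get(vs["name"])
        if fs is None:
            continue  # block only declared in one stage: nothing to match
        if vs["binding"] != fs["binding"]:
            errors.append(
                "block `%s' has mismatching binding (%d vs %d)"
                % (vs["name"], vs["binding"], fs["binding"]))
    return errors


# Mirrors the example program: same block name, different bindings.
vs = [{"name": "Block", "instance": "uni_block1", "binding": 1}]
fs = [{"name": "Block", "instance": "uni_block2", "binding": 2}]
link_errors = cross_validate_blocks(vs, fs)
```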
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
It's legal to have an active blocks count > 0 on link failure, but
unless we actually allocate memory for the blocks array we can end up
segfaulting in calls such as glUniformBlockBinding().
To avoid having to NULL check these api calls we simply reset the
block count to 0 if the array was not created.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A resolve is not needed on Skylake in this case. We were forcing
a resolve because we set the input_aux_usage to ISL_AUX_USAGE_NONE.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The checks were only looking at the first byte, while the intention
seems to be to check if the whole sha1 is zero. This prevented all
shaders with first byte zero in their sha1 from being saved.
This shaves around a second from Deus Ex load time on a hot cache.
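The difference between the two checks can be sketched in Python. The
function names below are hypothetical; the real check is C code in the
shader cache, so treat this only as an illustration of the logic.

```python
SHA1_LEN = 20  # a SHA-1 digest is 20 bytes


def first_byte_is_zero(sha1):
    """Buggy shape: only the first byte of the digest is inspected."""
    return sha1[0] == 0


def sha1_is_zero(sha1):
    """Intended shape: the digest counts as unset only if every byte is 0."""
    return not any(sha1)


unset = bytes(SHA1_LEN)
# A perfectly valid digest that merely happens to start with a zero byte.
valid = bytes([0x00]) + bytes(range(1, SHA1_LEN))
```

With the first-byte check, roughly one digest in 256 would be treated
as unset, which matches the "first byte zero" shaders described above.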
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Even though the programs themselves stay in cache and are loaded, the
shader keys can be evicted separately. If that happens, unnecessary
compiles are caused that waste time, and no matter how many times the
program is re-run, performance never recovers to the levels of first
hot cache run. To deal with this, we need to refresh the shader keys
of shaders that were recompiled.
An easy way to currently observe this is running Deus Ex, then piglit
and Deus Ex again, or deleting just the cache index. The latter
causes over a minute of lost time on all later Deus Ex runs; with this
patch it returns to normal after 1 run.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Surfaces and Volumes can be freed in the worker thread.
Without this patch, pending_uploads_counter could be non-zero
in the Surfaces or Volumes dtor, leading to deadlock.
Instead, properly decrease the counter before releasing the
item.
Also avoid another potential deadlock if the item is not
properly unlocked: do not call UnlockRect, which would deadlock,
but free directly using the deadlock-safe
nine_context_get_pipe_multithread.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99246
CC: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: James Harvey <lothmordor@gmail.com>
Fix regression caused by
abb1c645c4
The patch made csmt use context.pipe instead of
secondary_pipe, leading to thread safety issues.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
These generated source files depend not only upon gl_and_es_API.xml, but
all other XML files that are included by it.
This change updates the generation rules to depend on all gen/*.xml
files, as is done for other SCons generation rules, and should fix
broken incremental SCons builds due to missing dependencies.
Trivial.
Fix 'make check' linking errors with glibc < 2.17.
CXXLD glsl/glsl_test
glsl/.libs/libglsl.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
If we get EOF earlier than expected, the current read loops will
deadlock. This may easily happen if the disk cache gets corrupted.
Fix it by using a helper function that handles EOF.
Steps to reproduce (on a build with asserts disabled):
$ glxgears
$ find ~/.cache/mesa/ -type f -exec truncate -s 0 '{}' \;
$ glxgears # deadlock
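A helper of the kind described above can be sketched as follows. This
is a Python illustration under assumptions: the name read_exact is
hypothetical and the real helper is written in C, but the idea is the
same: treat a zero-length read as EOF and give up instead of retrying.

```python
import os


def read_exact(fd, size):
    """Read exactly `size` bytes from fd, or return None on early EOF."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:
            # os.read() returns b"" at EOF. A loop that only counts
            # bytes read would spin here forever on a truncated file.
            return None
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A caller can then treat None as a corrupt cache entry and fall back to
recompiling rather than hanging.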
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
While at it, also fix up a failure message to not reference timestamp
and gpu dirs as those are no longer being made.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Drop ir3_compiler_destroy(), since it is only ralloc_free() and we
shouldn't really have an ir3 dependency in core. If some future hw
has a new compiler, as long as all its resources are ralloc()d then
things will all just work.
(In practice, I suppose you never really see this leak, but removing
it at least cleans up some noise in valgrind.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some field names had extra spaces and some had places where we should
have had a space but didn't.
Reviewed-by: Chad Versace <chadversary@chromium.org>
We've never used it, it only exists on gen8, and the name of the struct
contains piles of bad characters.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Otherwise blitter would still hold a ref to, for example, sampler-
views.
To reproduce:
glmark2 -b desktop:duration=2 --run-forever
Fixes: a8e6734 ("freedreno: support for using generic clear path")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
It's indexed by buffer, not stream. BRW_MAX_SOL_BUFFERS and
MAX_VERTEX_STREAMS happen to both be 4, so there's no actual bug.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We create the BO when creating a transform feedback object, and only
destroy it when deleting that object. So it won't be NULL.
CID: 1401410
Reviewed-by: Matt Turner <mattst88@gmail.com>
The state tracker no longer uploads those attributes for us,
so we must conservatively upload the size of the largest
attribute, which is a dvec4.
Fixes a regression of GL45-CTS.gpu_shader_fp64.varyings and
GL45-CTS.vertex_attrib_64bit.limits_test.
Fixes: 9b91e0b54c ("radeonsi: allow unaligned vertex buffer offsets and strides on CIK-VI")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Earlier commit unintentionally dropped the mkdir, as it was rebased.
Some versions of autotools will not create the output directory for
generated sources. Thus the issue went unnoticed by the original author.
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Steven Newbury <steve@snewbury.org.uk>
Reported-by: Steven Newbury <steve@snewbury.org.uk>
Fixes: 1610b3dede ("anv: don't pass xmlfile via stdin anv_entrypoints_gen.py")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We don't need to make the caller (CmdCopyQueryPoolResults) aware of the
problem since compute_query_result() only emits state. The caller is also
expected to hit OOM in this scenario right after calling this function, but
it is already handling it safely.
Fixes:
dEQP-VK.api.out_of_host_memory.cmd_copy_query_pool_results
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.
Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).
Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so the key's
value is always false (the default).
Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to section 14.6 of the Vulkan specification:
"When sample shading is enabled, the x and y components of FragCoord
reflect the location of the sample corresponding to the shader
invocation."
So add a boolean parameter to the lowering pass to select this behavior
when we need it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we know the device has been lost we should return this error code for
any command that can report it before we attempt to do anything with the
device.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan specs say:
"A logical device may become lost because of hardware errors, execution
timeouts, power management events and/or platform-specific events. This
may cause pending and future command execution to fail and cause hardware
resources to be corrupted. When this happens, certain commands will
return VK_ERROR_DEVICE_LOST (see Error Codes for a list of such commands).
After any such event, the logical device is considered lost. It is not
possible to reset the logical device to a non-lost state, however the lost
state is specific to a logical device (VkDevice), and the corresponding
physical device (VkPhysicalDevice) may be otherwise unaffected. In some
cases, the physical device may also be lost, and attempting to create a
new logical device will fail, returning VK_ERROR_DEVICE_LOST."
This means that we need to track if a logical device has been lost so we can
have the commands referenced by the spec return VK_ERROR_DEVICE_LOST
immediately.
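The tracking described above can be modelled in a few lines. This is a
hedged Python sketch, not the driver's implementation: the Device class
and its method are hypothetical, and a RuntimeError stands in for
whatever failure marks the device as lost. Only the numeric values of
VK_SUCCESS and VK_ERROR_DEVICE_LOST come from the Vulkan headers.

```python
VK_SUCCESS = 0
VK_ERROR_DEVICE_LOST = -4


class Device:
    """Toy logical device that remembers whether it has been lost."""

    def __init__(self):
        self.lost = False

    def queue_submit(self, execute):
        # Once lost, report the error before touching the device at all.
        if self.lost:
            return VK_ERROR_DEVICE_LOST
        try:
            execute()
        except RuntimeError:
            # The lost state is sticky: it cannot be reset.
            self.lost = True
            return VK_ERROR_DEVICE_LOST
        return VK_SUCCESS


def failing_submit():
    raise RuntimeError("simulated execution failure")


device = Device()
results = [
    device.queue_submit(lambda: None),   # works
    device.queue_submit(failing_submit),  # device becomes lost
    device.queue_submit(lambda: None),   # rejected immediately
]
```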
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So that we don't have to do things like rolling back address relocations in
case we run into OOM after computing them, etc.
Also, make sure that if the queue submission comes with a fence, we set it up
correctly so it behaves according to the spec after returning
VK_ERROR_DEVICE_LOST.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
GL_AMD_pinned_memory requires memory to be aligned correctly, so
we skip marshalling in this case. Also copying the data defeats
the purpose of EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD.
Fixes GL_AMD_pinned_memory piglit tests when glthread is enabled.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This can be used to deal with key hash collisions from different
versions (should we find that to actually happen) and to find
which mesa version produced the cache entry.
V2: use blob created at cache creation.
v3: remove left over var from v1.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
This allows us to get rid of the arch and gpu name directories.
v2: (Timothy Arceri) don't use an opaque data type to store
pointer size and gpu name.
v3: (Timothy Arceri) use blob to store driver keys; just make sure
to store the null terminator for strings, and make sure the blob is
defined by disk_cache and not its users.
v4: (Timothy Arceri) fix typo, and make ptr_size a uint8_t.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Instead of using a directory, hash the timestamps into the cache keys
themselves. Since there is no more timestamp directory, there is no more
need for deleting the cache of other mesa versions and we rely on
eviction to clean up the old cache entries. This solves the problem of
using several incarnations of disk_cache at the same time, where one
deletes a directory belonging to the other, like when both OpenGL and
gallium nine are used simultaneously (or several different mesa
installations).
v2: using additional blob instead of trying to clone sha1 state
v3: (Timothy Arceri) don't use an opaque data type to store
timestamp.
V4: (Timothy Arceri) use blob to store driver keys; just make sure
to store the null terminator for strings, and make sure the blob is
defined by disk_cache and not its users.
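The idea of hashing the timestamp into the cache keys can be sketched
as follows. This is a Python illustration only: the function names, the
blob layout, and the order in which the blob and the data are hashed
are assumptions, not the disk_cache implementation.

```python
import hashlib


def make_driver_keys_blob(timestamp):
    """Build the extra key material (hypothetical layout).

    The string is stored together with its null terminator, as the
    version notes above ask for.
    """
    return timestamp.encode("ascii") + b"\0"


def compute_key(driver_keys_blob, data):
    """Derive a cache key that depends on both the build and the data."""
    sha1 = hashlib.sha1()
    sha1.update(driver_keys_blob)
    sha1.update(data)
    return sha1.hexdigest()


shader = b"void main() {}"
key_build_a = compute_key(make_driver_keys_blob("1488000000"), shader)
key_build_b = compute_key(make_driver_keys_blob("1488999999"), shader)
```

Because two builds never produce the same key for the same shader, they
can share one directory, and stale entries simply age out via eviction.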
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100091
We want to be able to check the progress of each pass and dump the NIR
for debugging purposes if it changed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Work around an unknown bug inside transfer_map on certain
ASICs. Also tested on unaffected ASICs, where performance actually
improved slightly.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The actual offset returned is uint32_t, however int64_t was used as the
return type from gbm_bo_get_offset to allow negative returns to signal
errors to the caller.
In case of an error getting the offset, the user will also be unable to
get the handle/FD, and thus have nothing to offset into. This means that
returning 0 as an error value is harmless, allowing us to change the
return type to uint32_t in order to avoid signed/unsigned confusion in
callers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
A recent change to use drmGetDevices2() made me realize that
a build configured using
PKG_CONFIG_PATH=my_drm_lib_path/pkgconfig ./autogen.sh
considers the libdrm path gotten from pkgconfig only during
make. When invoking "make install" the relink command puts
system library ahead of the path gotten from pkgconfig
(and starts to fail as system libdrm isn't new enough).
This change forces the relink command to respect pkgconfig
settings.
This looks to me like the issue in
https://bugs.freedesktop.org/show_bug.cgi?id=100259
with Emil et al considering it a libtool bug.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
[Emil Velikov: add inline comment]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This patch adds DECODER_FILES for libintel_common, so that platforms
such as Android not currently using this functionality can opt out.
Fixes: 7d84bb3 ("intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch fixes entrypoint generation for libmesa_anv_entrypoints, which
still used the old style of calling the generator script.
It also makes small fixes to libmesa_vulkan_common, where there was a typo
in the target name (vulknan) and files were generated to the wrong folder.
Fixes: 8211e3e6 ("anv: Generate anv_entrypoints header and code in one command")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Automake generation rules are replicated for android.
$* macro was expected to return "hsw" but instead gives "hsw.{h,c}"
so $(basename $*) is used as a workaround
to set the correct --chipset option for brw_oa.py script.
Build tested with nougat-x86
Fixes: e565505 "i965: Add script to gen code for OA counter queries"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Robert Bragg <robert@sixbynine.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
The original naming followed the Vulkan HAL naming scheme for no good
purpose, and we need the same binary name for the build-id code.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
It's written in C rather than pure python and is strictly faster; the
only reason not to use it is that its classes cannot be subclassed.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This has the potential to mask errors, since Element.get works like
dict.get, returning None if the element isn't found. I think the reason
that Element.get was used is that vulkan has one extension that isn't
really an extension, and thus is missing the 'protect' field.
This patch changes the behavior slightly by replacing get with explicit
lookup in the Element.attrib dictionary, and using xpath to only iterate
over extensions with a "protect" attribute.
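The difference between the two lookups can be seen in a small
standalone example. The XML below and its extension names are made up
for illustration and are not taken from the Vulkan registry.

```python
import xml.etree.ElementTree as et

# Hypothetical registry fragment: the second entry has no 'protect'.
XML = """
<registry>
  <extensions>
    <extension name="EXAMPLE_platform_ext" protect="EXAMPLE_PLATFORM"/>
    <extension name="EXAMPLE_not_really_an_extension"/>
  </extensions>
</registry>
"""

root = et.fromstring(XML)
unprotected = root.findall(".//extension")[1]

# Element.get() behaves like dict.get(): a missing attribute is None,
# so a typo in the attribute name would go unnoticed.
silent = unprotected.get("protect")

# Explicit lookup in Element.attrib raises instead of hiding the problem.
try:
    unprotected.attrib["protect"]
    raised = False
except KeyError:
    raised = True

# The XPath predicate [@protect] selects only elements that carry the
# attribute, so the explicit lookup below is always safe.
guards = [ext.attrib["protect"]
          for ext in root.findall(".//extension[@protect]")]
```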
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Instead of using an if and a check, use dict.get, which does the same
thing, but more succinctly.
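As a small illustration of the equivalence, with hypothetical function
and variable names:

```python
def lookup_with_check(table, key, default=None):
    """The longer form: an explicit membership test."""
    if key in table:
        return table[key]
    return default


def lookup_with_get(table, key, default=None):
    """The succinct form: dict.get does the test and the fallback."""
    return table.get(key, default)


guards = {"EXAMPLE_ext": "EXAMPLE_PLATFORM"}
```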
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This produces the header and the code in one command, saving the need to
call the same script twice, which parses the same XML file.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This produces a file that is identical except for whitespace: there is a
table that has 8 columns in the original, which is easy to do with prints
but ugly using mako, so it no longer has columns; the data is not
inherently tabular.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This does two things. First, it updates both the .h and the .c file to
have the same do not edit string. Second, it uses __file__ to ensure
that the name stays correct even if the file is moved or renamed.
One thing to note is the use of '{{' and '}}' in the C template. This is
to instruct python to print a literal '{' and '}' respectively, rather
than treating the contents as a formatter specifier.
v3: - add this patch
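The brace escaping mentioned above works like this with str.format.
The template text and the generator name are hypothetical; the script
in this commit passes its own __file__ instead of a fixed string.

```python
# '{{' and '}}' are emitted as literal braces; '{name}' is substituted.
TEMPLATE = (
    "/* This file generated from {name}, don't edit directly. */\n"
    "struct entry {{ int index; }};\n"
)

# Stand-in for os.path.basename(__file__) in a real generator script.
generator_name = "example_gen.py"
rendered = TEMPLATE.format(name=generator_name)
```

Without the doubled braces, format() would try to interpret the struct
body as a replacement field and fail.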
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
This is groundwork for the next patches; it will allow porting the
header and the code to mako separately, and will also allow both to be
run simultaneously.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
It's slow, and has the potential for encoding issues.
v2: - pass xml file location via argument
- update Android.mk
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
These are all fairly small cleanups/tweaks that don't really deserve
their own patch.
- Prefer comprehensions to map() and filter(), since they're faster
- replace unused variables with _
- Use 4 spaces of indent
- drop semicolons from the end of lines
- Don't use parens around if conditions
- don't put spaces around brackets
- don't import modules as caps (ET -> et)
- Use docstrings instead of comments
v2: - Replace comprehensions with multiplication
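Two of the cleanups in the list above can be illustrated with made-up
data; the variable names are hypothetical and not from the script.

```python
names = ["vkCreateFoo", "", "vkDestroyFoo"]

# map()/filter() form.
old_style = list(map(str.lower, filter(None, names)))

# Comprehension form: same result, one pass, no lambda or helper needed.
new_style = [name.lower() for name in names if name]

# The v2 note: a comprehension that only repeats one value can simply
# be a multiplication. This is safe here because None is immutable.
table_by_loop = [None for _ in range(4)]
table_by_mul = [None] * 4
```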
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
CP DMA and PKT3_WRITE_DATA (in CmdUpdateBuffer) don't (currently) write
through L2. Therefore, to make these writes visible to later accesses
we must invalidate L2 rather than just writing it back, to avoid the
possibility that stale data is read through L2.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix linking error on CentOS 6.
CXXLD glsl_compiler
glsl/.libs/libstandalone.a(lt16-libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
src/util/../../src/util/u_thread.h:84: undefined reference to `clock_gettime'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise for apps that don't seed the regular rand() we will always
remove old cache entries from the same dirs.
V2: assume bits returned by rand are independent, uniformly distributed
bits and grab our hex value without taking the modulus of the whole
value. This also fixes a bug where 'f' was always missing.
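Picking a hex digit straight from random bits can be sketched as below.
The function names are hypothetical, and the modulus-15 variant is only
one assumed way in which 'f' could end up unreachable; the message does
not say what the original expression was.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_digit_from_bits(random_bits):
    """Use the low 4 bits directly: all 16 digits are equally likely."""
    return HEX_DIGITS[random_bits & 0xF]


def hex_digit_mod15(random_value):
    """Assumed buggy shape: a modulus of 15 can never produce index 15."""
    return HEX_DIGITS[random_value % 15]


reachable_bits = {hex_digit_from_bits(value) for value in range(256)}
reachable_mod15 = {hex_digit_mod15(value) for value in range(256)}
```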
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: pass the seed to the seed function so that we can isolate
its uses. Stop leaking fd when urandom couldn't be read.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: pass the seed to rand_xorshift128plus() so that we can isolate
its uses.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows forcing the absolute value to be computed for sqrt()
and inversesqrt(), in order to follow D3D9 behaviour for buggy
apps that rely on it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Function::getArgumentList() doesn't exist anymore, switch to using
arg_begin() (existed back to at least llvm-3.6.0).
Reviewed-by: Vedran Miletić <vedran@miletic.net>
CC: <mesa-stable@lists.freedesktop.org>
Any users of KitKat are likely using an older version of Mesa and
KitKat support adds complexity to the make files. Dropping support
allows removing the MESA_LOLLIPOP_BUILD make variable in various make
files.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The Android version defines are only needed for versions less than 4.2
which aren't really supported or tested.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Commit 6facb0c08f ("android: fix libz dynamic library dependencies")
added libz as a dependency, but this breaks host targets as the host
dependency is libz-host. As no host lib needs libz, just remove the
dependency for them.
Fixes: 6facb0c08f "android: fix libz dynamic library dependencies"
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixed with the following command:
perl -pe 'BEGIN{undef $/;} s/ \\\n\n/\n\n/smg' $(find . -name 'Android.*')
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Throughout the glsl headers we had an odd mix of guards: "ifndef",
"pragma once", neither or both.
Simplify things by using the more common one (ifndef) and annotating
all the sources, barring the generated builtin header -
builtin_int64.h.
The final header - udivmod64.h - is [seemingly] unused and on its way
out (a patch to purge it is on the mailing list).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This should just work (tm) with the default options. Plus the one we
pass is already the default, so just drop it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
The header provides the LINUX_VERSION_CODE and KERNEL_VERSION macros,
neither of which is used by any part of mesa.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The query is a properties query so it needs to be handled in
GetPhysicalDeviceProperties2, not GetPhysicalDeviceFeatures2.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rather than using 3 different ways to wrap _mesa_sha1_*() to SHA1*()
functions (a macro, prototype with implementation in .c and an inline
function), make all 3 inline functions.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
At the moment, we would honour any system headers, vulkan_intel.h in
particular, over the ones in-tree.
Thus, if one does an incremental build of mesa without vulkan.h
already installed (or at least not in the same directory as
vulkan_intel.h), the build will fail.
In the future we might want to upstream the vulkan_intel.h within
vulkan.h or use other ways to make vulkan_intel.h obsolete. In either
case, the more robust thing is to rely on our own copy.
v2: Move AM_CPPFLAGS just above LIBDRM_CFLAGS (Grazvydas, Jason)
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Fixes: ee8044fd "intel/vulkan: Get rid of recursive make"
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
primcount must be a GLsizei as in the signature for MultiDrawElements
or bad things can happen.
Furthermore, an error should be flagged when primcount is negative.
Curiously, this code used to work somewhat correctly even when primcount
was negative, because the loop that checks count[i] would iterate out of
bounds and almost certainly hit a negative value at some point.
Found by an ASAN error in
GL45-CTS.gtf32.GL3Tests.draw_elements_base_vertex.draw_elements_base_vertex_primcount
Note that the OpenGL spec seems to have s/primcount/drawcount/ at some
point, and the code still reflects the old language.
v2: provide the correct spec quotes (pointed out by Ian)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
MATERIALFV may end up reading up to 4 floats from the passed parameter.
This should really set a GL_INVALID_ENUM error in the cases where it
matters, but does anybody really care?
Found by ASAN in piglit gl-1.0-beginend-coverage.
v2: fix a trivial compiler warning
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The calculations of row_pitch, the row pitch's alignment, surface size,
and base_alignment were mixed together. This patch moves the calculation
of row_pitch and its alignment to occur before the calculation of
surface_size and base_alignment.
This simplifies a follow-on patch that adds a new member, 'row_pitch',
to struct isl_surf_init_info.
v2:
- Also extract the row pitch alignment.
- More helper functions that will later help validate the row pitch.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
isl has a giant comment that explains the hardware's padding
requirements. (Hint: Cache lines and page faults). But the comment is in
the wrong place, in isl_calc_linear_row_pitch(), which is unrelated to
padding.
The important parts of that comment were copied to
isl_apply_surface_padding() long ago. So drop the misplaced comment.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All the plumbing is in place so the extension just needs to be
advertised.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't really "do" anything because the default tiling for the
winsys buffer is X tiled. We do however want the X tiled modifier to
work correctly from the API perspective, which would imply that if you
set this modifier, and later do a get_modifier, you get back at least X
tiled.
Running with a modified kmscube, here are the bandwidth measurements.
Linear:
Read bandwidth: 1039.31 MiB/s
Write bandwidth: 1453.56 MiB/s
Y-tiled:
Read bandwidth: 458.29 MiB/s
Write bandwidth: 542.12 MiB/s
X-tiled:
Read bandwidth: 575.01 MiB/s
Write bandwidth: 606.25 MiB/s
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch begins introducing how we'll actually handle the potentially
many modifiers coming in from the API, how we'll store them, and the
structure in the code to support it.
Prior to this patch, the Y-tiled modifier would be entirely ignored. It
shouldn't actually be used until this point because we've not bumped the
DRIimage extension version (which is a requirement to use modifiers).
Measuring later in the series with kmscube:
Linear:
Read bandwidth: 1048.44 MiB/s
Write bandwidth: 1483.17 MiB/s
Y-tiled:
Read bandwidth: 471.13 MiB/s
Write bandwidth: 589.10 MiB/s
Similar functionality was introduced and then reverted here:
commit 6a0d036483
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Apr 21 20:14:58 2016 -0700
i965: Always use Y-tiled buffers on SKL+
v2: Use last set bit instead of first set bit in modifiers to address
bug found by Daniel Stone.
v3: Use the new priority modifier selection thing. This nullifies the
bug fixed by v2 also.
v4: Get rid of modifier compaction which originally served another
purpose and now serves none (Jason)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At image creation create a path for dealing with the linear modifier.
This works exactly like the old usage flags where __DRI_IMAGE_USE_LINEAR
was specified.
During development of this patch series, it was decided that a lack of
modifier was an insufficient way to express the required modifiers. As a
result, 0 was repurposed to mean a modifier for a LINEAR layout.
NOTE: This patch was added for v3 of the patch series.
v2: Rework the algorithm for modifier selection to go from a bitmask
based selection to this priority value.
v3: Make DRM_FORMAT_MOD_INVALID allowed at selection as a way of
identifying no modifiers found (because 0 is LINEAR) (Jason)
v4: Remove the logic to prune unknown modifiers (like those from other
vendors) and simply handle it in select_best_modifier (Jason)
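The priority-based selection described in v2 through v4 could look roughly like the sketch below. Only the name select_best_modifier and the use of DRM_FORMAT_MOD_INVALID as the "nothing found" result come from this message. The SKETCH_ constants, the enum, and the priority ordering (Y over X over LINEAR) are assumptions made for illustration, and the X and Y entries anticipate other patches in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the DRM modifier tokens so the sketch is
 * self-contained; the SKETCH_ names are hypothetical. */
#define SKETCH_MOD_LINEAR   UINT64_C(0)
#define SKETCH_MOD_X_TILED  ((UINT64_C(1) << 56) | UINT64_C(1))
#define SKETCH_MOD_Y_TILED  ((UINT64_C(1) << 56) | UINT64_C(2))
#define SKETCH_MOD_INVALID  UINT64_C(0x00ffffffffffffff)

/* Assumed ordering: a higher enum value is preferred. */
enum modifier_priority {
   MODIFIER_PRIORITY_INVALID = 0,
   MODIFIER_PRIORITY_LINEAR,
   MODIFIER_PRIORITY_X,
   MODIFIER_PRIORITY_Y,
};

static uint64_t
select_best_modifier(const uint64_t *modifiers, unsigned count)
{
   static const uint64_t priority_to_modifier[] = {
      [MODIFIER_PRIORITY_INVALID] = SKETCH_MOD_INVALID,
      [MODIFIER_PRIORITY_LINEAR]  = SKETCH_MOD_LINEAR,
      [MODIFIER_PRIORITY_X]       = SKETCH_MOD_X_TILED,
      [MODIFIER_PRIORITY_Y]       = SKETCH_MOD_Y_TILED,
   };
   enum modifier_priority best = MODIFIER_PRIORITY_INVALID;

   assert(modifiers != NULL || count == 0);

   for (unsigned i = 0; i < count; i++) {
      enum modifier_priority prio = MODIFIER_PRIORITY_INVALID;

      if (modifiers[i] == SKETCH_MOD_Y_TILED)
         prio = MODIFIER_PRIORITY_Y;
      else if (modifiers[i] == SKETCH_MOD_X_TILED)
         prio = MODIFIER_PRIORITY_X;
      else if (modifiers[i] == SKETCH_MOD_LINEAR)
         prio = MODIFIER_PRIORITY_LINEAR;
      /* Unknown modifiers (e.g. from other vendors) keep the INVALID
       * priority and are therefore ignored, with no separate pruning. */

      if (prio > best)
         best = prio;
   }

   /* INVALID is returned only when no known modifier was passed in;
    * 0 cannot be used for that because 0 means LINEAR. */
   return priority_to_modifier[best];
}
```

Because unknown modifiers never raise the priority, a list containing only foreign modifiers yields the INVALID value, which the caller can treat as "no usable modifier".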
Requested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
New to the patch series after reordering things for landing smaller
chunks.
This will essentially enable modifiers from clients that were just
enabled in previous patches. A client could use the modifiers by
setting all of them at create, but had no way to actually query them
after creating the surface (ie. stupid clients could be broken before
this patch, but in more ways than this).
Obviously, there are no modifiers being actually stored yet - so this
patch shouldn't do anything other than allow the API to get back 0 (or
the LINEAR modifier).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I intend to need to get to the devinfo structure, and storing the screen
is an easy way to do that.
It seems to be the consensus that you cannot share an image between
multiple screens.
Scape-goat: Rob Clark <robdclark@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Recent glibc generates this warning:
brw_performance_query.c:1648:13: warning: In the GNU C Library, "minor" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "minor", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"minor", you should undefine it after including <sys/types.h>.
min = minor(sb.st_rdev);
So, include sys/sysmacros.h to shut up the warning.
v2: Use the AC_HEADER_MAJOR defines to figure out the right header
(thanks to Jonathan Gray for helping me not break non-glibc systems)
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
This was used for aubdumping (deleted a while ago) and INTEL_DEBUG=bat
decoding (deleted recently).
While we're changing parameters, delete the wrapper macro and make the
actual function brw_state_batch instead of __brw_state_batch.
This subsumes a patch by Emil Velikov to drop this from BLORP.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This deletes all of our handwritten code in favor of autogenerated
genxml-based decoding. This should be much more usable, as the old
code isn't entirely accurate - we updated some things for new
generations, but not everything.
Aubinator has one annoying limitation: it has no idea how many entries
to print when encountering e.g. 3DSTATE_BINDING_TABLE_POINTERS_VS. It
picks an arbitrary number, which may skip decoding valid data, and may
print extra garbage entries.
We do a better job here by making brw_state_batch track the size of the
data stored at a particular batchbuffer offset. Then, we can divide by
the structure size to obtain the exact number of entries.
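As a rough illustration of the division described above, assuming a hypothetical record type that remembers the offset and size of each piece of state (the struct and function names below are not the driver's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one piece of state in the batch. */
struct state_record {
   uint32_t offset; /* byte offset within the batchbuffer */
   uint32_t size;   /* number of bytes stored at that offset */
};

/* Number of fixed-size entries stored at rec->offset. */
static unsigned
state_entry_count(const struct state_record *rec, size_t entry_size)
{
   assert(rec != NULL);
   assert(entry_size > 0);
   return (unsigned)(rec->size / entry_size);
}
```

For example, a 64-byte binding table with 4-byte entries decodes as exactly 16 entries, rather than an arbitrary guess.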
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This should give substantially better decoding, as the public libdrm
decoder hasn't been properly maintained in years.
For now, we reuse the existing state dumping mechanism. We'll improve
that in the next patch.
To avoid increasing the size of the driver, we restrict this feature
to debug builds of Mesa. There's probably very little use for it in
release builds anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fix build with Python < 2.7.
File "src/compiler/nir/nir_builder_opcodes_h.py", line 46, in <module>
from nir_opcodes import opcodes
File "src/compiler/nir/nir_opcodes.py", line 178, in <module>
unop_convert("{}2{}{}".format(src_t[0], dst_t[0], bit_size),
ValueError: zero length field name in format
Fixes: 762a6333f2 ("nir: Rework conversion opcodes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Like done in another place in that same file.
CID 1250588
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Simplifies the write code a bit and handles EINTR.
V2: (Timothy Arceri) Drop EINTR handling. To do it
properly we would need a retry limit but it's
probably best to just avoid trying to write if
we hit EINTR and try again next time we see
the program.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There is no need to hardcode it, we can just use blob_key[0].
This is needed because the next patches are going to change how cache
keys are computed.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will allow us to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.
v2: don't try to compute key (and crash) if cache is disabled
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Per pixel stats are cached but were not always being flushed as threads
moved from one draw context to the next. Added an explicit flush to allow
all archrast objects to flush any cached events.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Performance is now 50x faster with archrast now that we're properly
filtering out all of the rdtsc begin/end.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Autogen functions that instantiate different BackendPixelRate templates.
Functions get split into separate files after reaching a user defined
threshold (currently 512 per file) to speed up compilation.
This change will enable the addition of more template flags in the pixel
back end.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Detecting register write support by trial and error introduces a
stall at screen creation time, which it would be nice to avoid.
Certain command parser versions guarantee this will work (see the
giant comment in intelInitScreen2 below, or a few commits ago):
- Ivybridge: version >= 1 (kernel v3.16)
- Baytrail: version >= 2 (kernel v3.19)
- Haswell: version >= 7 (kernel v4.8)
For simplicity, we don't bother with version 1 in this patch.
This assumes that the user hasn't disabled aliasing PPGTT via a kernel
command line parameter. Don't do that - you're only breaking things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If we can't write registers, then the effective command parser version
is 0 - it may exist, but it's not usefully enabling anything.
See kernel commit 1ca3712ca3429a617ed6c5f87718e4f6fe4ae0c6 (in v4.8)
where the kernel starts doing this for us. This makes us do more or
less the same thing on older kernels.
This should preserve a bit of sanity by allowing us to perform a
screen->cmd_parser_version > N check to determine that we really can
use the features promised by command parser version N.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This should help us figure out the complexities of which kernel
versions we need to get various features on various platforms.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In commit d2590eb65f I enabled GL 4.5
on Haswell...but failed to check if we could do indirect compute
shader dispatch...and query buffer objects.
Indirect compute shader dispatch requires command parser version 5
(kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in
Linux v4.4). On earlier kernels we would have disabled
ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.
Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG,
which mean command parser version 7 (Linux v4.8). On earlier kernels
we would have disabled ARB_query_buffer_object, which is a mandatory
part of OpenGL 4.4+.
The new version support looks like:
- Kernel 4.1 and older => OpenGL 3.3
- Kernel 4.2-4.3 => OpenGL 4.2
- Kernel 4.4-4.7 => OpenGL 4.3
- Kernel 4.8+ => OpenGL 4.5
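The table above can be read as a simple lookup. The sketch below keys off the kernel version purely for illustration; per the text, the actual requirement is the command parser version, and the function name is hypothetical.

```c
#include <assert.h>

/* Returns the maximum GL version times ten (45 means OpenGL 4.5) for
 * Haswell, following the kernel-version table in this message. */
static int
hsw_max_gl_version_for_kernel(int major, int minor)
{
   assert(major >= 0 && minor >= 0 && minor < 100);

   const int kernel = major * 100 + minor;

   if (kernel >= 408)
      return 45; /* command parser version 7: query buffer objects */
   if (kernel >= 404)
      return 43; /* command parser version 5: indirect compute dispatch */
   if (kernel >= 402)
      return 42;
   return 33;
}
```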
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The author is Heiko Przybyl (CC'ing); the patch is rebased on top of Bartosz Tomczyk's one per Dieter Nützel's comment.
Tested-by: Constantine Charlamov <Hi-Angel@yandex.ru>
v2: Resend the patch again through git-email. The prev. rebase was sent
through Thunderbird, which screwed up tab characters, making the patch
not apply.
--------------
When fixing the stalls on evergreen I introduced leaking of the useinfo
structure(s). Sorry. Instead of allocating a new object to hold 3 values
where only one is actually used, rework the list to just store the node
pointer. Thus no allocation or deallocation is needed. Since use_info
and use_kind aren't used anywhere, drop them and reduce code complexity.
This might also save some small amount of cycles.
Thanks to Bartosz Tomczyk for finding the bug.
Reported-by: Bartosz Tomczyk <bartosz.tomczyk86 at gmail.com>
Signed-off-by: Heiko Przybyl <lil_tux at web.de>
Supersedes: https://patchwork.freedesktop.org/patch/135852
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
When the iterator encounters a structure field, it now looks up the
gen_group for that structure definition and saves a pointer to it.
This lets us drop a lot of ridiculous code in the caller, which looked
at item->value (<struct NAME dword>), strtok'd the structure name back
out, and looked it up itself.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The iterator code already computed this value, then we stored it in
the structure name, strtok'd it back out, and also manually computed
it when printing dword headers.
Just put the value in the struct and use it. Way simpler.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It made more sense when decode_group() took a bunch of extra options,
but now that there's only one...we may as well pass 0 and call it a day.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I added this flag in 65a9d5eabb but
it was completely unused. Both callers appear to have printed dword
headers, so we can just drop the flag and continue doing it
unconditionally.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When decoding a structure field within a group, we may want to look up
that structure type. Having a gen_spec pointer makes it easy to do so.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
gen_field_iterator_next() produces a string representing the value of
the field. For enum values, it also produced a separate "description"
string containing the textual name of the enum.
The only caller of this function combines the two, printing enums as
"<numeric value> (<textual enum name>)". We may as well just store
that in item->value directly, eliminating the description field, and
a layer of wrapping.
v2: Use non-overlapping source and destination strings in snprintf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CID 1399479: Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking velems suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Like in a few other places in that radeon_drm_bo.c file.
CID 715739.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The fc_sp variable should indicate the number of elements in the
fc_stack array, but fc_sp was incremented at the beginning of the
fc_pushlevel function. As a result, idx=0 was never used, and the
32nd element was stored outside the fc_stack array.
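A minimal sketch of the intended push order, assuming a 32-entry stack of plain integers (the real element type and the context struct are not shown in this message, so fc_ctx and FC_STACK_SIZE are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FC_STACK_SIZE 32 /* assumed capacity */

struct fc_ctx {
   int fc_stack[FC_STACK_SIZE];
   unsigned fc_sp; /* number of valid elements in fc_stack */
};

/* Store at the current top, then grow. Incrementing first would skip
 * slot 0 and write the 32nd element one past the end of the array. */
static bool
fc_pushlevel(struct fc_ctx *ctx, int type)
{
   assert(ctx != NULL);

   if (ctx->fc_sp >= FC_STACK_SIZE)
      return false; /* stack full */

   ctx->fc_stack[ctx->fc_sp] = type;
   ctx->fc_sp++;
   return true;
}
```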
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The second check in the old code looked pretty much unreachable, esp.
because it's not obvious that "max_entries" could be zero. To find out
that it was intentional I had to run some checks, and to dig into
the old versions of the file.
So, rewrite the check to make the intention clear.
v2: s/r600/r600g in the title, and per Dieter Nützel's comment wrap
lines of condition.
Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The crash is due to NULL pColorBlendState, which is legal if the
pipeline has rasterization disabled or if the subpass of the render pass
the pipeline is created against does not use any color attachments.
Test: Sample subpasses from LunarG can run without crash
Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
The current code evaluated to always true, we only want to flush
on the first submit. Rename the variable to do_flush, and only
emit on the first iteration.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since we already do fabs on the one source, we're guaranteed to get
positive infinity if we get any infinity at all. Since +inf only has
one IEEE 754 representation, we can use an integer comparison and avoid
all of the ordered/unordered issues.
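The same idea expressed on the CPU, as a sketch only (the commit itself concerns generated shader code; the helper names here are made up): clear the sign bit, then compare the bits against the single encoding of +inf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without aliasing issues. */
static float
bits_to_float(uint32_t bits)
{
   float f;
   assert(sizeof(f) == sizeof(bits));
   memcpy(&f, &bits, sizeof(f));
   return f;
}

static bool
is_inf_bits(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof(bits));

   bits &= 0x7fffffffu;        /* fabs: drop the sign bit */
   return bits == 0x7f800000u; /* the only encoding of +inf */
}
```

NaNs have a non-zero mantissa, so they never match the +inf pattern and no ordered/unordered float comparison is involved.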
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 2845a108a9.
This breaks VK-GL-CTS randomly.
./deqp-vk --deqp-case=dEQP-VK.texture.filtering.3d.formats.r4g4b4a4*
bounces around here from 6/6 to 3/6 or 4/6 to hanging.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was meant to be checking the index type to get the correct
index not the last emitted one. This fixes:
dEQP-VK.pipeline.input_assembly.primitive_restart.index_type_uint32.triangle_strip_with_adjacency
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I haven't seen this causing problems in practice, but for correctness
we should also check if rename succeeded to avoid breaking accounting
and leaving a .tmp file behind.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
At the time of target file check, .tmp file is already created and file
lock is held, so we should remove the .tmp, like in other error paths.
With this, piglit no longer leaves large amount of empty .tmp files
behind, which waste directory entries and may interfere with eviction.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
It seems there is a bug because:
- 20 bytes are compared, but only 1 byte stored_keys step is used
- entries can overlap each other by 19 bytes
- index_mmap is ~1.3M in size, but only first 64K is used
With this fix for Deus Ex:
- startup time (from launch to Feral logo): ~38s -> ~16s
- disk_cache_has_key() hit rate: ~50% -> ~96%
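A sketch of the corrected indexing, under the assumption that the slot index is taken from the first two bytes of the key (64K slots of 20 bytes each is about 1.3M, consistent with the sizes quoted above). The sketch_ function names and the exact index derivation are illustrative, not the cache's real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_KEY_SIZE       20
#define CACHE_INDEX_MAX_KEYS (1 << 16)

/* Each slot is a full key wide, so the slot index must be scaled by
 * CACHE_KEY_SIZE. Stepping one byte per index makes neighbouring
 * entries overlap by 19 bytes. */
static uint8_t *
sketch_key_slot(uint8_t *stored_keys, const uint8_t *key)
{
   assert(stored_keys != NULL && key != NULL);

   const uint32_t i = (uint32_t)key[0] | ((uint32_t)key[1] << 8);
   return stored_keys + (size_t)i * CACHE_KEY_SIZE;
}

static void
sketch_put_key(uint8_t *stored_keys, const uint8_t *key)
{
   memcpy(sketch_key_slot(stored_keys, key), key, CACHE_KEY_SIZE);
}

static bool
sketch_has_key(uint8_t *stored_keys, const uint8_t *key)
{
   return memcmp(sketch_key_slot(stored_keys, key), key,
                 CACHE_KEY_SIZE) == 0;
}
```

With the scaled offset, storing a key in slot 2 no longer clobbers most of the key in slot 1, which is the kind of collision that would depress the hit rate.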
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There should be minimal gain, if any, for nvc0, but nv50 may end up
noticing more often that the lod argument is uniform. This, in turn,
will remove the need for some unnecessary transformations, which were
being hit due to the checks being done pre-ssa.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Helps mainly Feral-ported games, due to their use of fma()
shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs : 471258 -> 467359 (-0.83%)
total local used in shared programs : 27405 -> 27361 (-0.16%)
total bytes used in shared programs : 35749888 -> 35214176 (-1.50%)
local gpr inst bytes
helped 17 1829 4091 4091
hurt 4 44 3 3
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Since switching to LRU eviction the only user of these predicate
functions now resolves directory entry stats itself so pass them
directly saving calling fstat and strlen twice (and the
expensive strlen is skipped entirely if access time is newer).
v2: Update for empty cache dir detection changes
v3: Fix passing string length to predicate with the +1 for NULL
termination and also pass sb as pointer
v4: Missed ampersand for passing sb as pointer
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously each time we saw a variable we just created a duplicate
entry in the list. This is particularly bad for loops where we add
everything twice, and then throw nested loops into the mix and the
list was growing exponentially.
This stops the glsl-vs-unroll-explosion test which has 16 nested
loops from reaching the test's mem usage limit in this pass. The
test now hits the mem limit in opt_copy_propagation_elements()
instead.
I suspect this was also part of the reason this pass can be so
slow with some shaders.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Fixes a bunch of piglit crashes that hit an assert() when trying
to delete the framebuffer. The assert() was triggered because
WinSysDrawBuffer was set to NULL before glDeleteFramebuffers()
was called.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the end, pipeline statistics queries look a lot like occlusion
queries only with between 1 and 11 begin/end pairs being generated
instead of just the one.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In order to get accurate statistics, we need to disable statistics for
blits, clears, and the surface state memcpy at the top of each secondary
command buffer. There are two possible approaches to this:
1) Disable before the blit/memcpy and re-enable afterwards
2) Move emitting 3DSTATE_VF_STATISTICS from initialization and make it
part of pipeline state and then just disable statistics before
blits and memcpy operations.
Emitting 3DSTATE_VF_STATISTICS should be fairly cheap so it doesn't
really matter which path we take. We choose the second option as it's
more consistent with the way the rest of the statistics are enabled and
disabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's in 3DSTATE_CLIP, so it doesn't really need the extra detail. This
matches what we do for VS, FS, etc.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The new version is a nice GPU parallel to cpu_write_query_result and it
nicely handles things like dealing with 32 vs. 64-bit offsets in the
destination buffer.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Not all queries are the same. Even the two queries we support today
require a different amount of data per slot. Once we introduce pipeline
statistics queries, the size will vary wildly.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to make slots variable-length and always having the
available bits at the front makes certain operations substantially
easier once we do that.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From the Vulkan 1.0.39 Specification:
"If VK_QUERY_RESULT_64_BIT is not set and the result overflows a
32-bit value, the value may either wrap or saturate."
So we can either clamp or wrap. Wrapping is both easier and what the
user gets if they use vkCmdCopyQueryPoolResults and we should be
consistent. We could make vkCmdCopyQueryPoolResults clamp but it's
annoying and ends up burning extra batch for something the spec clearly
doesn't require.
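A minimal sketch of the wrapping behaviour chosen here, assuming a hypothetical CPU-side helper that writes one result either as 64-bit or as 32-bit (the function name and signature are not taken from the driver):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one query result. Without the 64-bit flag the value is simply
 * truncated, i.e. it wraps modulo 2^32 instead of saturating. */
static void
write_query_result(void *dst, bool is_64bit, uint64_t value)
{
   assert(dst != NULL);

   if (is_64bit) {
      memcpy(dst, &value, sizeof(value));
   } else {
      const uint32_t value32 = (uint32_t)value; /* wrap, no clamp */
      memcpy(dst, &value32, sizeof(value32));
   }
}
```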
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
threaded gallium can't use pipe_context's LLVM target machine, because
create_shader_selector can be called from a non-driver thread.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that there's a timebase_scale in gen_device_info which is
effectively the 'period' this switches anv_GetPhysicalDeviceProperties
to using this common device info to initialize the timestampPeriod
device limit.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Prior to Skylake the Gen HW timestamps were driven by a 12.5MHz clock
with the convenient property of being able to scale by an integer (80)
to nanosecond units.
For Skylake the frequency is 12MHz or a scale factor of 83.333333
This updates gen_device_info to track a floating point timebase_scale
factor and makes corresponding _queryobj.c changes to no longer assume a
scale factor of 80 works across all gens.
Although the gen6_ code could have been left alone, the changes
keep the code more comparable, and it now shares a few utility functions
for scaling raw timestamps and calculating deltas. The utility for
calculating deltas takes into account 32 or 36bit overflow depending on
the current kernel version.
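The arithmetic behind the scale factor and the wrap-aware delta can be sketched as follows. The function names are hypothetical; only the frequencies, the resulting factors (80 and 83.333333), and the 32 or 36 bit counter widths come from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds per timestamp tick: 1e9 / 12.5 MHz = 80,
 * 1e9 / 12 MHz = 83.333... */
static double
timebase_scale_from_hz(double hz)
{
   assert(hz > 0.0);
   return 1e9 / hz;
}

static uint64_t
raw_timestamp_to_ns(uint64_t raw, double timebase_scale)
{
   return (uint64_t)((double)raw * timebase_scale);
}

/* Difference between two raw timestamps taken from a counter that is
 * only `bits` wide, so that a single wraparound is handled. */
static uint64_t
raw_timestamp_delta(uint64_t time0, uint64_t time1, unsigned bits)
{
   assert(bits > 0 && bits < 64);

   const uint64_t mask = (UINT64_C(1) << bits) - 1;
   return (time1 - time0) & mask;
}
```

An integer factor of 80 on a 12MHz clock would under-report every interval by about 4%, which is why the scale has to be a floating point value.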
Note: this leaves the timestamp handling of ARB_query_buffer_object
untouched, which continues to use an incorrect scale of 80 on Skylake
for now. This is more awkward to solve since the scaling is currently
done using a very limited uint64 ALU available to the command parser
that doesn't support multiply or divide where it's already taking a
large number of instructions just to effectively multiply by 80.
This fixes piglit arb_timer_query-timestamp-get on Skylake
v2: (Ken) Update timebase_scale for platforms past Skylake/Broxton too.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Older versions of GCC don't like compound literals in static const
variable declarations because they don't think it's an actual constant
value.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This adds some missing return value checks for all uses of snprintf in
brw_performance_query.c. This also switches a use of strncpy + strncat
for snprintf for consistency and to avoid the chance of the strncpy
leaving an unterminated string in the dest buffer if the src is too
long.
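The kind of check meant here can be sketched as below; the helper name and the path format are illustrative only. snprintf always NUL-terminates when the size is non-zero and returns the length the full string would have had, so truncation shows up as a return value greater than or equal to the buffer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Build "dir/leaf" in buf. Returns false on an encoding error or if
 * the result did not fit (the buffer is still NUL-terminated then). */
static bool
format_path(char *buf, size_t size, const char *dir, const char *leaf)
{
   assert(buf != NULL && size > 0);

   const int len = snprintf(buf, size, "%s/%s", dir, leaf);
   return len >= 0 && (size_t)len < size;
}
```

Unlike strncpy followed by strncat, this cannot leave the destination unterminated when the source is too long.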
This issue with strncpy was picked up by Coverity.
CID: 1402201
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise it'll be missing in the tarball and make distcheck will fail.
Fixes: 05dd4a1104 ("glapi: Generate GL API marshalling code from the XML.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Using $< in non-suffix make rules is a GNU extension. Explicitly use
the name of the python script to fix the build on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabore.com>
The index passed to get_shared_memory_ptr is an attribute slot index,
i.e. the index of a vec4 within LDS. Therefore this must be scaled by
sizeof(vec4) to give the LDS byte offset.
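In plain C the scaling amounts to the following sketch (the function name is made up; a vec4 is four 32-bit floats, i.e. 16 bytes):

```c
#include <assert.h>
#include <stdint.h>

#define VEC4_SIZE_BYTES 16u /* 4 components x 4 bytes */

/* Convert an attribute slot index into a byte offset within LDS. */
static uint32_t
lds_byte_offset_for_slot(uint32_t slot)
{
   assert(slot <= UINT32_MAX / VEC4_SIZE_BYTES);
   return slot * VEC4_SIZE_BYTES;
}
```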
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
Avoid a buffer overflow in ac_nir_to_llvm.c's create_function when
using more than 4 descriptor sets. radv claims support for 8.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According to dl_iterate_phdr man page first object visited is the
main program where dlpi_name is an empty string. This fixes segfault
on Android when using build-id as identifier.
Fixes: d4fa083e11 ("util: Add utility build-id code.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Function droid_swap_buffers may get called without dri2_surf->buffer set,
in these cases we don't have a back buffer set either. Patch fixes segfault
seen with 3DMark that uses android.opengl.GLSurfaceView for rendering its UI.
backtrace:
#00 pc 00013f88 /system/lib/egl/libGLES_mesa.so (droid_swap_buffers+104)
#01 pc 000117b2 /system/lib/egl/libGLES_mesa.so (dri2_swap_buffers+50)
#02 pc 000058b2 /system/lib/egl/libGLES_mesa.so (eglSwapBuffers+386)
#03 pc 00011329 /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+553)
#04 pc 000118e7 /system/lib/libEGL.so (eglSwapBuffers+55)
#05 pc 000754dc /system/lib/libandroid_runtime.so
v2: do like other backends, call get_back_bo (Emil Velikov)
Fixes: 2acc69d ("EGL/Android: Add EGL_EXT_buffer_age extension")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Apps can limit the size of the cache via VkAllocationCallbacks so we
can't be sure that both are always in the cache.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will allow us to use fallback in-memory and on-disk caches
should the app not provide a pipeline cache.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise we have a race condition between vbo calls in the
glthread and the _vbo_DestroyContext() call.
This fixes a bunch of piglit crashes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The Vulkan spec is fairly clear about when we should and should not
write query pool results. We're also supposed to return VK_NOT_READY if
VK_QUERY_RESULT_PARTIAL_BIT is not set and we come across any queries
which are not yet finished. This fixes rendering corruptions on The
Talos Principle where geometry flickers in and out due to bogus query
results being returned by the driver. These issues are most noticeable
on Sky Lake GT4 when running on "ultra" settings.
Reviewed-By: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100182
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
There's not much point to having them or not having them but this
reduces some pointless diff from the version we can auto-generate
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The code for decoding structures and commands was almost identical.
The only differences are: we print dword headers for commands, and
we skip the first one (with the command opcode and lengths).
So, generalize decode_structure to add a starting DWord, and a flag
for printing the DWord headers, and reuse it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
handle_struct_decode() is just a wrapper around decode_structure()
with a NULL check. But the only caller already does that NULL check.
So, just use decode_structure() directly.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix the build on OpenBSD by removing an unneeded include for asm/unistd.h.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
% pattern rules are a GNU extension. As there is only one file here
avoid patterns and globbing entirely to fix the build on non-GNU make.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
v2 [Emil Velikov: brw_oa.py dependency]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
In the odd case where a patch needs to be fixed, squash the appropriate
fix and document how. Add a note in the pre-release notes, such that
devs can quickly spot it.
v2: Grammar/typo fixes (Eric). Use upstream commit [SHA] as reference.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The only use of the header is to provide the _X_INLINE macro. We already
require (and provide where needed) 'inline', plus it's used in the file
already.
So replace the macro and drop the include. This fixes the build on
platforms which lack the header - from X-less Linuxes to Androids.
Fixes: 05dd4a1104 ("glapi: Generate GL API marshalling code from the XML.")
Reported-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100223
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The registries were migrated to git and are now hosted on GitHub.
The old svn is now read-only, and will not be updated anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Specifically, report 'out of memory' errors that might have happened while
emitting the pipeline's batch.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These can fail to allocate device memory, however, the driver can recover
from this error by allocating a new binding table block and trying again.
v2:
- Instead of tracking the errors in these functions and making callers
reset the batch's status before attempting to allocate a new block
for the binding table, simply make callers responsible for setting
the error status if they fail to allocate memory during the second
attempt (Jason).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Also, we had a couple of instances in flush_descriptor_sets() where
we were returning a VkResult directly upon error, but the return
value of this function is not a VkResult but a uint32_t dirty mask,
so simply return 0 in these cases which reduces the amount of
work the driver will do after the error has been raised.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Instead of asserting inside the function, and then use that information
to return early from its callers upon failure.
v2:
- Make sure that clear_color_attachment() and
clear_depth_stencil_attachment() get the VkResult as well so they
avoid executing the batch if an error happened. (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Any errors that may have happened during the command buffer recording are
reported by vkEndCommandBuffer() and it is the application's responsibility
to not submit broken commands to a queue.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2: Assert on secondary commands, applications should've called
vkEndCommandBuffer() and received an error for them before (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Growing the reloc list happens through calling anv_reloc_list_add() or
anv_reloc_list_append(). Make sure that we call these through helpers
that check the result and set the batch error status if needed.
v2:
- Handling the crashes is not good enough, we need to keep track of
the error, for that, keep track of the errors in the batch instead (Jason).
- Make reloc list growth go through helpers so we can have a central
place where we can do error tracking (Jason).
v3:
- Callers that need the offset returned by anv_reloc_list_add() can
compute it themselves since it is extracted from the inputs to the
function, so change the function to return a VkResult, make
anv_batch_emit_reloc() also return a VkResult and let their callers
do the error management (Topi)
v4:
- Let anv_batch_emit_reloc() return an uint64_t as it originally did,
there is no real benefit in having it return a VkResult.
- Do not add an is_aux parameter to add_surface_state_reloc(), instead
do error checking for aux in add_image_view_relocs() separately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Most of the time we use macros that handle this situation transparently,
but there are some cases where we need to handle this explicitly.
This patch makes sure we don't crash, notice that error handling takes
place in the function that actually failed the allocation,
anv_batch_emit_dwords(), which will set the status field of the batch
so it can be used at a later moment to report the error to the user.
v2:
- Not crashing is not good enough, we need to keep track of the error
(Topi, Jason). Iago: now that we track errors in the batch, this
is being handled.
- Added guards in a few more places that needed it (Iago)
v3:
- Check result of anv_batch_emitn() for NULL before calling memset()
in emit_vertex_input() (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The anv_batch_set_error() helper will track the first error that happened
while recording a command buffer. The helper returns the currently tracked
error to help the job of internal functions that may generate errors that
need to be tracked and return a VkResult to the caller.
We will use the anv_batch_has_error() helper to guard parts of the driver
that are not safe to execute if an error has been generated while recording
a particular command buffer.
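A minimal sketch of the two helpers described above. The real ones operate on struct anv_batch and VkResult; the sketch_* names and the stand-in enum here are assumptions made so that the example is self-contained. Only the first error is kept, and the setter returns the error currently being tracked.

```c
#include <stdbool.h>

/* Stand-in for VkResult; the values mirror the Vulkan headers. */
typedef enum {
   SKETCH_SUCCESS = 0,
   SKETCH_ERROR_OUT_OF_HOST_MEMORY = -1,
   SKETCH_ERROR_OUT_OF_DEVICE_MEMORY = -2,
} sketch_result;

/* Hypothetical batch with just the status field. */
struct sketch_batch {
   sketch_result status;   /* first error seen while recording, or success */
};

/* Record the error only if none is tracked yet, then return the error
 * that is currently tracked so callers can propagate it. */
static inline sketch_result
sketch_batch_set_error(struct sketch_batch *batch, sketch_result error)
{
   if (batch->status == SKETCH_SUCCESS)
      batch->status = error;
   return batch->status;
}

/* Guard for code paths that are not safe to run after a failure. */
static inline bool
sketch_batch_has_error(const struct sketch_batch *batch)
{
   return batch->status != SKETCH_SUCCESS;
}
```

A caller would typically check sketch_batch_has_error() before touching memory that an earlier step may have failed to allocate.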
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The vkCmd*() functions do not report errors, instead, any errors should be
reported by the time we call vkEndCommandBuffer(). This means that we
need to make the driver robust against inconsistent and/or incomplete
command buffer states through the command recording process, particularly,
avoid crashes due to access to memory that we failed to allocate previously.
The strategy used to do this is to track the first error that occurred while
recording a command buffer in the batch associated with it. We use the
batch to track this information because the command buffer may not be
visible to all parts of the driver that can produce errors we need to be
aware of (such as allocation failures during batch emissions).
Later patches will use this error information to guard parts of the driver
that may not be safe to execute.
v2: Move the field from the command buffer to the batch so we can track
errors from batch emissions (Jason)
v3: Registering errors in the command buffer's batch during
anv_create_cmd_buffer() is unnecessary, since the command buffer
is freed at the end of the function in that case (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This situation can happen if we failed to allocate memory for the shader.
v2:
- We shouldn't see NULL shaders in anv_shader_bin_ref so we should not check
for that (Jason). Make sure that callers don't attempt to call this
function with a NULL shader and assert that this never happens (Iago).
v3:
- All callers to anv_shader_bin_unref seem to check for NULL before calling,
so just assert that it is not NULL (Topi)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The function is defined right after the prototype declaration. Also, the
prototype for it is included in anv_genX.h which is included via anv_private.h.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
While a context only has a single glthread, the context itself can be
attached to several threads. Therefore the dispatch table must be
updated in all threads before the destruction of glthread. In other
words, glthread can only be destroyed safely when the context is deleted.
Fixes remaining crashes in the glx-multithread-makecurrent* tests.
V2: (Timothy Arceri) updated gl_API.dtd marshal_fail description.
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
We want to support glthread on GLES contexts with reasonable apps, and on
desktop for apps that use VBOs but haven't completely moved to core GL.
To do so, we have to deal with the "the user may or may not pass user
pointers to draw calls" problem.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
glBegin() swaps dispatch tables, and we don't have any code in place for
handling that in glthread (which also messes with dispatch tables), and I
don't particularly care to at this point.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
The threading for GL core is in place, but there are so few applications
actually using a core GL context that it would be nice to extend support
back. However, some of the features of compat GL (particularly user
vertex arrays) would be so expensive to track state for that we want to be
able to disable threading when we discover that the app is using them.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This avoids an extra pointer dereference in the marshalling functions,
which, with the instruction count being in the low 30s, could actually
matter for main-thread performance.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
These don't actually read data out of the pointers, they set the
pointers (or offsets in a VBO) to be used in a later draw call.
v2: Don't forget glVertexAttribIPointer, and don't bother with annotations
on aliases.
v3: Mark CompressedTexSubImage1D as sync also.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
v2: Rebase on the Begin/End changes, and just disable this feature on
non-GL-core.
v3: (Timothy Arceri) enable for non-GL-core contexts. Remove
unrelated safe_mul() hunk. while loop style fix.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This patch splits the context's CurrentDispatch pointer into two
pointers, CurrentClientDispatch, and CurrentServerDispatch, so that
when doing multithread marshalling, we can distinguish between the
dispatch table that's being used by the client (to serialize GL calls
into the marshal buffer) and the dispatch table that's being used by
the server (to execute the GL calls).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
v2: Keep an allocated buffer around instead of checking for one at the
start of every GL command. Inline the now-small space allocation
function.
v3: Remove duplicate !glthread->shutdown check, process remaining work
before shutdown.
v4: Fix leaks on destroy.
V5: (Timothy Arceri) fix order of source files in makefile
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This is not yet used in the build, just generated.
v2: Add missing build dependencies.
v3: Avoid mixing declarations and code, remove logic for avoiding emitting
code that the compiler's optimizer can deal with anyway.
v4: (Timothy Arceri) move safe_mul() generation here from a later patch.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Without doing some additional tracking, we won't know whether the data
will be immediate user data, or will be loaded from a PBO. The normal
teximage functions will be sync by default because they don't know up
front what the size of their image data is. But for compressed teximage,
we have the count information, so they would end up async by default.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Several API functions require special treatment in order to be marshalled
to a background thread. Others can't be safely executed in a background
thread and need to be executed synchronously (e.g. since they return data
through a pointer argument).
This annotation will be used when code generating thread marshalling code,
to ensure that each function is marshalled in the correct way.
Note that PixelMap functions are marked as synchronous for now since
their pointer may be relative to a buffer on the GPU, so we'll need
special logic to marshal them properly.
v2: Move description of attribute types to a comment in the dtd file.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction. See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (e9ffd12827)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.
Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.
Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit. Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.
Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
cache_put() first creates a .tmp file and then tries to do eviction.
The recently added LRU eviction code selects non-empty directory with
the oldest access time, but that may easily be the one with just the
new .tmp file, especially on Linux where atime is updated lazily
(with "relatime" mount option, which is the default). So when cache is
small, if random doesn't hit another dir LRU keeps selecting the same
dir with just the .tmp and not deleting anything. To fix this (and the
tests), do eviction earlier.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
General protection and prevents us from smashing the stack
on the first clear state validation (a7b8d50bcb). Fixes crash
using icc.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This patch originally had i965 specific code and was named:
commit 61cd3c52b868cf8cb90b06e53a382a921eb42754
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Oct 20 18:21:24 2016 -0700
gbm: Get modifiers from DRI
To accomplish this, two new query tokens are added to the extension:
__DRI_IMAGE_ATTRIB_MODIFIER_UPPER
__DRI_IMAGE_ATTRIB_MODIFIER_LOWER
The query extension only supported 32b queries, and modifiers are 64b,
so we needed two of them.
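As an illustration of why two 32b queries are enough, a caller could reassemble the 64b modifier from the two halves like this. The helper names are hypothetical; only the shift-and-or arithmetic is the point.

```c
#include <stdint.h>

/* Rebuild a 64-bit modifier from the values that the UPPER and LOWER
 * queries would return. */
static inline uint64_t
sketch_combine_modifier(uint32_t upper, uint32_t lower)
{
   return ((uint64_t)upper << 32) | lower;
}

/* The inverse: what an implementation would report for each query. */
static inline void
sketch_split_modifier(uint64_t modifier, uint32_t *upper, uint32_t *lower)
{
   *upper = (uint32_t)(modifier >> 32);
   *lower = (uint32_t)(modifier & 0xffffffffu);
}
```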
NOTE: The extension version is still set to 13, so none of this will
actually be called.
v2: Error handling of queryImage (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Nothing special here other than a brief introduction to modifier
selection. Originally this was part of another patch but was split out
from "gbm: Introduce modifiers into surface/bo creation" by request of
Emil.
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The idea behind modifiers like this is that the user of GBM will have
some mechanism to query what properties the hardware supports for its BO
or surface. This information is directly passed in (and stored) so that
the DRI implementation can create an image with the appropriate
attributes.
A getter() will be added later so that the user of GBM will be able to
query what modifier should be used.
Only in surface creation, the modifiers are stored until the BO is
actually allocated. In regular buffer allocation, the correct modifier
can (and, in future patches, will) be chosen at creation time.
v2: Make sure to check if count is non-zero in addition to testing if
calloc fails. (Daniel)
v3: Remove "usage" and "flags" from modifier creation. Requested by
Kristian.
v4: Take advantage of the "INVALID" modifier added by the GET_PLANE2
series.
v5: Don't bother with storing modifiers for gbm_bo_create because that's
a synchronous operation and we can actually select the correct modifier
at create time (done in a later patch) (Jason)
v6: Make modifier condition outside the check so that dri_use will work
properly (Jason)
Cc: Kristian Høgsberg <krh@bitplanet.net>
References (v4): https://lists.freedesktop.org/archives/intel-gfx/2017-January/116636.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This is just a stub for now and will be filled in later.
This was split out of an earlier patch
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Modifiers will be obtained or guessed by the client and passed in during
image creation/import. In guessing, a client might decide to simply pass
along all known modifiers.
This requires bumping the DRIimage version.
As of this patch, the modifiers aren't plumbed all the way down, this
patch simply makes sure the interface level stuff is correct.
v2: Don't allow usage + modifiers
v3: Make NAND actually NAND. Bug introduced in v2. (Jason)
v4:
- s/obtains/obtained (Jason)
- Pull out i965 implementation into a later patch (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.
DiRT Showdown - Spilled VGPRs: -26 (-81%)
This surprisingly doesn't have any useful effect on performance (+ 0.05%).
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image intrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.
Although shader-db results are good, I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.
This also fixes a rendering issue with the speedometer in Dirt Rally.
More information can be found here https://reviews.llvm.org/D26348.
Thanks to Dave Airlie for the patch.
v2: - add a FIXME comment
- use if (HAVE_LLVM >= 0x0400) instead
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Will also help when the src sampler register will be
TGSI_FILE_CONSTANT for bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On platforms that require it, we bump the requirement to 0.4 or later.
Due to an issue with the project [design], any version earlier than that
is bound to cause issues. For the specifics, see the pthread-stubs README.
Cc: Uli Schlachter <psychon@znc.in>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: François Tigeot <ftigeot@wolfpond.org>
Cc: Tobias Nygren <tnn@NetBSD.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless of whether it's the one we're interested in or not.
v2: Rebase, explicitly require/check for libdrm
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
v4: Rebase
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless of whether it's the one we're interested in or not.
v2: Rebase.
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
This allows us to fetch the device list/info w/o the revision field.
At the moment retrieving the latter wakes up the device.
Note: kernel patch to resolve that should be in 4.10.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Unused/unchecked by any of the callers.
v2: Fix the glsl cases that have crept in since v1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Rather than having an extra memory allocation [that we currently do not
check and act accordingly] just make the API take a pointer to a stack
allocated instance.
This and follow-up steps will effectively make the _mesa_sha1_foo simple
define/inlines around their SHA1 counterparts.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.
Add a struct name which we will use with follow-up commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This refactors out the code to setup a texture resource
so we can reuse it later from the images code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This takes the code required to initialise a buffer resource
out of the texture buffer code, into its own function.
This is going to be used for the image support later.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to make ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable. Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Select higher of current 1G default or 10% of filesystem where
cache is located.
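The selection rule can be sketched as a small pure function. The name and the exact value used for "1G" (taken here as 1024^3 bytes) are assumptions; the filesystem size would come from something like statvfs() on the cache directory.

```c
#include <stdint.h>

/* Assumed value of the existing 1G default. */
#define SKETCH_DEFAULT_MAX_SIZE (UINT64_C(1024) * 1024 * 1024)

/* Pick the higher of the 1G default and 10% of the filesystem size. */
static inline uint64_t
sketch_cache_max_size(uint64_t fs_total_bytes)
{
   uint64_t ten_percent = fs_total_bytes / 10;

   return ten_percent > SKETCH_DEFAULT_MAX_SIZE ? ten_percent
                                                : SKETCH_DEFAULT_MAX_SIZE;
}
```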
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Currently only a one in one out eviction so if at max_size and
cache files were to constantly increase in size then so would the
cache. Restrict to limit of 8 evictions per new cache entry.
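A rough sketch of a bounded eviction loop along these lines. All names are hypothetical, and the callback stands in for whatever actually removes an entry from the cache; the loop stops once the new entry fits, nothing more can be evicted, or 8 evictions have been done.

```c
#include <stdint.h>

#define SKETCH_MAX_EVICTIONS 8

/* Hypothetical callback: evicts one entry and returns the number of
 * bytes freed, or 0 if nothing could be evicted. */
typedef uint64_t (*sketch_evict_fn)(void *ctx);

/* Example callback that always frees the amount stored in ctx. */
static inline uint64_t
sketch_evict_fixed(void *ctx)
{
   return *(uint64_t *)ctx;
}

/* Evict until the new entry fits, nothing more can be evicted, or the
 * per-entry eviction limit is reached. Returns the eviction count. */
static inline unsigned
sketch_make_room(uint64_t *cache_size, uint64_t max_size,
                 uint64_t new_entry_size, sketch_evict_fn evict, void *ctx)
{
   unsigned evictions = 0;

   while (*cache_size + new_entry_size > max_size &&
          evictions < SKETCH_MAX_EVICTIONS) {
      uint64_t freed = evict(ctx);
      if (freed == 0)
         break;
      *cache_size -= freed < *cache_size ? freed : *cache_size;
      evictions++;
   }

   return evictions;
}
```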
V2: (Timothy Arceri) fix make check tests
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.
v2: Factor out double strlen call
v3: C99 declaration of variables where used
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.
In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.
To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call disk_cache_put()
to give it time to finish.
V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
LLVM 4.0 released with a pretty messy regression, that will hopefully
get fixed in the future.
This work around was proposed by Tom, and it fixes the CTS regressions
here at least, I'm not sure if this will cause any major side effects,
but correctness over speed and all that.
radeonsi should possibly consider the same workaround until an llvm
fix can be found.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fix is extracted from amdgpu-pro shader traces.
It appears the gather4 workaround for integer types doesn't
work for cubes, so instead it forces a float scaled sample,
then converts to integer.
It modifies the descriptor before calling the gather.
This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.
This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like 17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
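For reference, the VK_MAKE_VERSION packing (10 bits major, 10 bits minor, 12 bits patch) can be reproduced as below. The macros are local copies with a SKETCH_ prefix so that the example does not need the Vulkan headers, and the 17.1.0 version in the usage note is just an example value.

```c
#include <stdint.h>

/* Same bit layout as VK_MAKE_VERSION in the Vulkan headers. */
#define SKETCH_MAKE_VERSION(major, minor, patch) \
   ((((uint32_t)(major)) << 22) | (((uint32_t)(minor)) << 12) | ((uint32_t)(patch)))

/* Decoding, as a tool interpreting the field this way would do. */
#define SKETCH_VERSION_MAJOR(v) ((uint32_t)(v) >> 22)
#define SKETCH_VERSION_MINOR(v) (((uint32_t)(v) >> 12) & 0x3ffu)
#define SKETCH_VERSION_PATCH(v) ((uint32_t)(v) & 0xfffu)
```

With this layout, SKETCH_MAKE_VERSION(17, 1, 0) is 71307264, which decodes back to 17.1.0, whereas a raw 17001000 decoded the same way would not.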
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
I've skimmed the changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant of course, but this should not
regress stuff.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.
[1] https://www.illumos.org/issues/516
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
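The naming scheme can be illustrated with a tiny name builder. This is not the mako-generated nir_type_conversion_op; the function and its single-letter type codes ('i', 'u', 'f', 'b') are hypothetical, and it only mirrors the rules listed above.

```c
#include <stdio.h>

/* Build a conversion opcode name into buf. Returns the snprintf()
 * result, or -1 for the removed i2u/u2i forms. */
static inline int
sketch_conversion_name(char *buf, size_t len, char src, char dst,
                       unsigned dst_bits)
{
   /* Integer <-> integer conversions only come in i2i and u2u forms. */
   if ((src == 'i' && dst == 'u') || (src == 'u' && dst == 'i'))
      return -1;

   /* Boolean conversions are named <src_type>2<dst_type>. */
   if (src == 'b' || dst == 'b')
      return snprintf(buf, len, "%c2%c", src, dst);

   /* Everything else carries the destination size in the name. */
   return snprintf(buf, len, "%c2%c%u", src, dst, dst_bits);
}
```

For example, a float to 32-bit integer conversion comes out as "f2i32", and an integer to bool conversion as "i2b".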
Reviewed-by: Eric Anholt <eric@anholt.net>
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with fewer components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
that are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs: 13318947 -> 13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass. Once we've done so, we no longer have a
fast-cleared surface so we can just set aux_usage to NONE.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Fixes build error when brw_nir.h is not found in the generated file
brw_nir_trig_workarounds.c.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch adds missing error-checking and fixes resource leak in
allocation failure path on anv_CreateDevice()
v2: Fixes from Jason Ekstrand's review
a) Add missing destructors for all of the state pools on allocation
failure path
b) Add missing destructor for batch bo pools on allocation failure path
v3: Fixes from Emil Velikov's review
Add missing destructor for queue and scratch_pool on allocation failure
path
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ported from radeonsi, pointed out by Tom.
"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
Fixes: 682248db45 ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No intended change in behavior. Just a refactor.
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
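As a rough illustration of the output-array convention this wrapper targets
(the caller passes either no array, to query the count, or an array with a
capacity, to receive items), here is a minimal sketch. The class and function
names are hypothetical and do not reproduce the actual vk_outarray helpers;
only the VK_SUCCESS and VK_INCOMPLETE result codes are taken from Vulkan.

```python
VK_SUCCESS = 0
VK_INCOMPLETE = 5


class OutArray:
    """Hypothetical model of a Vulkan-style output array."""

    def __init__(self, out, capacity):
        self.out = out            # None models a NULL pointer (count-only query)
        self.capacity = capacity  # number of elements the caller can accept
        self.filled = 0           # elements actually written
        self.wanted = 0           # elements the driver tried to write

    def append(self, item):
        self.wanted += 1
        if self.out is not None and self.filled < self.capacity:
            self.out.append(item)
            self.filled += 1

    def status(self):
        # A count-only query always succeeds; otherwise report truncation.
        if self.out is None or self.wanted <= self.filled:
            return VK_SUCCESS
        return VK_INCOMPLETE

    def count(self):
        return self.wanted if self.out is None else self.filled


def enumerate_items(available, out, capacity):
    """Return (status, count) the way a two-call Vulkan query would."""
    array = OutArray(out, capacity)
    for item in available:
        array.append(item)
    return array.status(), array.count()


families = ["graphics", "compute", "transfer"]
small = []
full = []
count_only = enumerate_items(families, None, 0)
truncated = enumerate_items(families, small, 2)
complete = enumerate_items(families, full, 3)
```

A caller would typically issue the count-only query first, allocate that many
elements, and then call again with the array.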
Fixes the following segmentation fault:
radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
-> if (!bo->handle)
(gdb) bt
0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this is because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards. It appears
that the same is needed for fast-clears as well. This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4. The issue hasn't
been demonstrated on any other hardware; however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense. The issue with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things. Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
Rely on nir for optimization, to reduce compile times. Very minimal impact
on shader-db:
total instructions in shared programs: 104170 -> 104199 (0.03%)
total dwords in shared programs: 209664 -> 209728 (0.03%)
total full registers used in shared programs: 7156 -> 7161 (0.07%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 24222 -> 24224 (0.01%)
         half   full   const   instr   dwords
helped   12     107    103     112     98
hurt     11     104    105     115     102
But shader db runtime dropped from ~29.3s user to ~20.4s user.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
We can now drop the checks on xxd in configure.
v2: Fix incorrect makefile dependency (Lionel)
v3: use $(PYTHON2) (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
compiler/brw_vec4_gs_visitor.cpp:744:39: error:
‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);
Fixes: d0d4a5f43b ("i965: split EU defines to brw_eu_defines.h")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The project is a thing only for BSD platforms. Or in other words - for
any other platforms building/installing pthread-stubs results only in a
pthread-stub.pc file.
And even where it provides a DSO, there's a fundamental design issue
with it - see the pthread-stubs mailing list for the specifics.
v2: Update comment above the switch statement (Jon Turney).
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Acked-by: Gary Wong <gtw@gnu.org>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Randy Fishel <randy.fishel@oracle.com>
Cc: Niveditha Rau <niveditha.rau@oracle.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As of last few commits we have the two split, thus we no longer require
the i965 in order to have the ANV driver.
Even though ANV does not link against libdrm nor libdrm_intel, we still
require those as dependencies due to the headers they provide.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 [Emil Velikov]
- Various fixes and initial stab at the Android build.
- Keep the generation rules/EXTRA_DIST outside the conditional
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At the moment all the tests but test_eu_compact are actual C++ gtests.
To simplify things, we can move the gtest.la to the common TEST_LIBS.
As we're here, we can change the test extension [to .cpp] to
avoid using the confusing dummy.cpp.
Add a nice comment in the makefile for posterity.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The test/binary was removed back in 2012. With that one gone, we can
drop the .gitignore file altogether.
Cc: Eric Anholt <eric@anholt.net>
Fixes: c885039442 ("i965: Drop the missing symbols link test.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Mostly a dummy git mv with a couple of noticeable parts:
- With the earlier header cleanups, nothing in src/intel depends on
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes throughout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Split out the EU defines from the 'generic' ones, as the former are more
compiler oriented.
With a later commit we'll move brw_eu_defines.h alongside the compiler
infra to src/intel/. Pulling all the defines in there seems overzealous.
Some defines are used by both i965 and the i965 compiler. Those are
moved to brw_eu_defines.h, and annotated accordingly. The i965 users
were updated to have the extra include to indicate that.
With future work we might provide a better split, but for now this seems
reasonable.
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise we'll get errors such as
error: conflicting types for ‘ffs’
error: conflicting types for ‘ffsll’
We might want to improve the heuristics and provide a definition only
when a native one is missing. We can address that at a later stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
File is using MI_LOAD_REGISTER_IMM, GEN7_CACHE_MODE_1 and others as
defined in the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The following three groups are used by neither the DRI module nor the
compiler.
BRW_POLYGON_*_FACING
BRW_POLYGON_FACING_*
BRW_STATELESS_BUFFER_*
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Neither of the changed files requires the brw_program.h include. Since
we're about to move them [to src/intel/compiler] with the next commit
there's no point in having the include.
Let alone the very confusing compiler include directive
[-I${top_srcdir}/src/mesa/drivers/dri/i965/] that one would have to use.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Function was made static and moved to another header with an earlier
commit.
Fixes: 760c8a1d95 ("i965: Make mark_surface_used a static inline in brw_compiler.h")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, we were depending on EGL for generating the headers and
providing the protocol symbols. However, since neither Vulkan driver
actually wants to link against EGL, this is kind of pointless. It also
creates a weird build dependency.
v2 [Jason]
- Add missing wsi/ prefix, MKDIR_GEN
v3 [Emil Velikov]
- include BUILT_SOURCES/generation rules outside of conditional
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Unused and we'll rework the way wayland-drm-client-protocol.h is
generated with a later commit.
v2 [Emil]
- Also remove wayland-client.h
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In some cases, we can end up calling WAYLAND_SCANNER even when
there's no binary. Follow the approach set by
AX_PROG_FLEX/BISON and set the variable to :
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Strictly speaking things work as-is, but let's move the file alongside
the artefacts it references. Analogous to all other places in mesa.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Advertise 10bpp support if the driver supports decoding to a P016 surface.
v2: Advertise 10bpp for the decoder as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
We support P010 and P016 as targets for 10bpp video decoding.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
No hardware I know of can actually support P010 natively. But we can easily
support P016 and as long as nobody decodes anything into the lower 6bits it
doesn't make any difference to P010.
v2: allow P016 for post processing as well
v3: fix post processing once more
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
This makes debugging of decoding problems quite a bit easier.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Just use whatever the state tracker allocated.
v2: fix msb mode
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
The firmware expects the value in pixels, not bytes. It didn't make a difference
so far because we only used 8bpp surfaces.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Same layout as NV12, but 16bit per channel instead of 8.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
Less IFETCH latency on misses. Shader code is write once read many,
so GTT doesn't make much sense anyway.
If it turns out to fragment the CPU visible VRAM too much, we can upload with SDMA.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This will help us move u_queue.c here eventually and also provide
string function wrappers for anyone wishing to port disk_cache.c
to windows.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is not used anywhere and Visual Studio looks to have
supported memmove() for a long time if not always.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because we optimistically skip compiling shaders if we have seen them
before we may need to compile them later at link time if they haven't
yet been used in a specific combination to create a program.
Rather than always recompiling we take advantage of the
gl_compile_status enum introduced in the previous patch to only
compile when we have previously skipped compilation.
This helps with regressions in app start-up times on cold cache
runs, compared with no cache.
Deus Ex: Mankind Divided start-up times:
cache disabled: ~3m15s
cold cache master: ~4m23s
cold cache with this patch: ~3m33s
Acked-by: Marek Olšák <marek.olsak@amd.com>
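The link-time decision described above can be sketched roughly as follows.
This is only an illustration: the enum members, the Shader type and the
compile callback are hypothetical stand-ins, not the actual
gl_compile_status values or Mesa functions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CompileStatus(Enum):
    # Hypothetical members standing in for the real enum.
    FAILURE = auto()
    SUCCESS = auto()
    SKIPPED = auto()   # seen in the cache before, so compilation was skipped


@dataclass
class Shader:
    name: str
    status: CompileStatus


def compile_skipped_shaders(shaders, compile_fn):
    """On a program cache miss, compile only the shaders skipped earlier."""
    compiled = []
    for shader in shaders:
        if shader.status is CompileStatus.SKIPPED:
            ok = compile_fn(shader)
            shader.status = CompileStatus.SUCCESS if ok else CompileStatus.FAILURE
            compiled.append(shader.name)
    return compiled


shaders = [Shader("vs", CompileStatus.SUCCESS), Shader("fs", CompileStatus.SKIPPED)]
first_pass = compile_skipped_shaders(shaders, lambda shader: True)
second_pass = compile_skipped_shaders(shaders, lambda shader: True)
```

Shaders that were really compiled keep their status and are not compiled
again, which is where the cold-cache saving comes from.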
This will allow us to tell if a shader really has been compiled or
if the shader cache has just seen it before.
Acked-by: Marek Olšák <marek.olsak@amd.com>
... so that we can avoid threading complications or unnecessary
compaction table initializations (which just consists of setting some
pointers based on devinfo->gen).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We don't use DRYRUN (and no other scripts have one) so just drop it.
This allows us to rework the loop to the more commonly used "git .... |
while read foo; do ... done"
That in itself gets rid of the only remaining bashism and we can toggle
the shebang to /bin/sh.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
All of those should be executed via $PYTHON2/python2 [or equivalent] hence
why they are missing the execute bit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Seemingly there is nothing bash specific in these. Debian's
checkbashisms does not spot anything, and neither does a run in zsh.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The file is used to generate svgadump/svga_dump.c... in theory at least.
Atm. the file is checked in-tree but that is about to change in later
commits.
As we get to that we'll use $PYTHON2 or equivalent as used throughout
the tree.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
All of the scripts are [must be] executed via $PYTHON2 [or equivalent]
hence why they are missing the execute bit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Nearly all the python scripts used in-tree are invoked via $PYTHON2 or
equivalent. As such having the execute bit is not needed and generally
ill-advised.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This makes it easier/clearer as to:
- if the file should have the execute bit set (.py should not)
- do we need the shebang in the first place and if so what it should be
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Unlike stride, there was no previous offset getter, so it can be right
on the first try.
v2: Return EINVAL when plane is greater than total planes to make it
match the similar APIs.
Avoid leak after fromPlanar (Daniel)
Make sure when getting offsets we consider dumb images (Daniel)
v3: Use Jason's recommendation for handling the non-planar case.
v4: Return int64_t so we can get real errors
v5: Add an assertion for dumb BOs (Jason)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
v2: Preserve legacy behavior when plane is 0 (Jason Ekstrand)
EINVAL when input plane is greater than total planes (Jason Ekstrand)
Don't leak the image after fromPlanar (Daniel)
Move bo->image check below plane count preventing bad index succeeding (Daniel)
v3: Fix DRIimage leak (using Jason's recommended change)
Make plane 0 return planar stride. This might break legacy behavior (Jason)
v4: Move bogus hunk for get_handle_for_plane to the right patch (Jason)
Fix error handling path to be cleaner (Jason)
v5: Add assert for dumb BOs to make sure plane == 0 (Jason)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
v2: Make the error return be -1 instead of 0 because I think 0 is
actually valid.
v3: Set errno to EINVAL when the specified plane is above the total
planes. (Jason Ekstrand)
Return the bo's handle if there is no image ie. for dumb images like cursor (Daniel)
v4:
- Add assertions about plane == 0 (Jason)
- Add a comment about new restriction on planar dumb bo which is now an
earlier patch in the series.
- Correctly refactor from v2 in this patch; it ended up rebased into the
wrong patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This will be used by clients that need to know the number of planes
allocated for them on behalf of the GL or other API. The best current
example of this is when an extra "plane" is allocated to store
compression data for the primary plane.
v2: Return 1 for cases where there is no image, ie. dumb bo (Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
As more GBM functionality supporting planes is being evaluated, it becomes
clear that a dumb bo can never actually be planar. It's questionable
whether it was ever feasible to do this, and later functionality will
implicitly assume a dumb BO is non-planar.
v2: Include stdbool.h
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
This adds support for exposing basic Observation Architecture
performance counters on Haswell.
This support is based on the i915 perf kernel interface which is used
to configure the OA unit, allowing Mesa to emit MI_REPORT_PERF_COUNT
commands around queries to collect counter snapshots.
To take into account the small chance that some of the 32bit counters
could wrap around for long queries (~50 milliseconds for a GT3 Haswell @
1.1GHz) the implementation also collects periodic metrics.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
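To illustrate why periodic snapshots help with 32-bit counters that may wrap
during a long query, here is a minimal sketch. The function name and the
sample values are made up for illustration; it assumes the counter advances
by less than 2**32 between any two consecutive snapshots.

```python
COUNTER_MASK = 0xFFFFFFFF  # the counters discussed above are 32 bits wide


def accumulate(snapshots):
    """Sum per-interval deltas of a wrapping 32-bit counter into a wide total."""
    total = 0
    for previous, current in zip(snapshots, snapshots[1:]):
        # Modular subtraction yields the right delta as long as the counter
        # advanced by less than 2**32 between the two snapshots.
        total += (current - previous) & COUNTER_MASK
    return total


true_count = 6_000_000_000           # events that really happened (illustrative)
begin = 0
middle = 3_000_000_000               # periodic snapshot taken mid-query
end = true_count & COUNTER_MASK      # what the wrapped counter reads at the end

without_periodic = accumulate([begin, end])
with_periodic = accumulate([begin, middle, end])
```

With only the begin and end snapshots the wrap is invisible and the total is
too small; the intermediate sample keeps every delta below the wrap point.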
Avoiding lots of error prone boilerplate and easing our ability to add +
maintain support for multiple OA performance counter queries for each
generation:
This adds a python script to generate code for building up
performance_queries from the metric sets and counters described in
brw_oa_hsw.xml as well as functions to normalize each counter based on
the RPN expressions given.
Although the XML file currently only includes a single metric set, the
code generated assumes there could be many sets.
The metrics as described in XML get translated into C structures
which are registered in a brw->perfquery.oa_metrics_table hash table
keyed by the GUID of the metric set in XML.
v2: numerous python style improvements (Dylan)
v3: Makefile.am fixups (Emil)
v4: Pattern rule for codegen + orthogonal .c and .h rules (Robert)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
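For readers unfamiliar with RPN, the kind of expression evaluation the
generated normalization functions perform can be sketched as below. The token
syntax (space-separated tokens, $-prefixed variables, four arithmetic
operators) and the variable names are assumptions for illustration, not the
actual format of brw_oa_hsw.xml or the output of the generator script.

```python
import operator

# Assumed operator set; the real XML may support more or different operators.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, variables):
    """Evaluate a space-separated RPN expression using a value stack."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token.startswith("$"):
            stack.append(variables[token])   # hypothetical $-variable lookup
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# Hypothetical counter: busy cycles as a percentage of elapsed clocks.
busy_percent = eval_rpn("$Busy 100 * $Clocks /", {"$Busy": 50, "$Clocks": 200})
```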
In preparation for generating code from brw_oa_hsw.xml for describing OA
performance counter queries this adds some OA specific members to
brw_perf_query that our generated code will initialize:
- The oa_metric_set_id is the ID we will pass to
DRM_IOCTL_I915_PERF_OPEN, and is an ID obtained via sysfs under:
/sys/class/drm/<card>/metrics/<guid>/id
- The oa_format is the OA report layout we will request from the kernel
- The accumulator offsets determine where the different groups of A, B
and C counters are located within an intermediate 64bit 'accumulator'
buffer.
Additionally brw_perf_query_counter now has 64bit or float _read()
callback members for OA counters.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In preparation for generating code from the XML performance counter meta
data, this makes some additions to brw_context.h for this code to be
able to reference.
It adds a brw->perfquery.oa_metrics_table hash table for indexing built
up query descriptions by the GUID that is expected to be advertised by
the kernel (via sysfs) to be able to use that query.
It adds an 'OA_COUNTERS' brw_query_kind to be assigned to queries built
up by generated code.
It adds a brw->perfquery.sys_vars structure to have a consistent place
to represent the different system variables like $EuCoresTotalCount and
$EuSlicesTotalCount that are referenced by OA counter normalization
equations.
Although extending + referencing gen_device_info for these variables
was considered, these are some of the (mostly minor) reasons for
going with a dedicated structure:
- Currently we only need this info for the performance_query backend
and it might be a bit tedious to go back and initialize the state
for pre-Haswell devinfo structures.
- Considering the $SubsliceMask then the requirement for how multiple
per-slice masks are packed only comes from how the variables are
referenced by availability tests in XML, and might not be a good
general representation for tracking subslice masks if another use
case arises.
- If we used gen_device_info then we'd likely want to avoid making
assumptions about the C types during codegen and adding explicit
casts, while that's not necessary with a dedicated struct with all
members being uint64_t.
- This structure and the code for initializing it is currently shared
(just through copy & paste) with a few other projects dealing with
OA counters, and that's been convenient so far.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In preparation for exposing Gen Observation Architecture performance
counters via INTEL_performance_query this adds an XML description for an
initial 'Render Metrics Basic Gen7.5' query and corresponding counters.
The intention is to auto generate code for building a query from these
counters as well as the code for normalizing the individual counters.
Note that the upstream for this XML data is currently GPU Top:
https://github.com/rib/gputop
The files are maintained under gputop-data/ and they are themselves
derived from files in an internal 'MDAPI XML' schema. There are scripts
under gputop-scripts/ and make rules in gputop-data/Makefile.xml for
maintaining these files.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Function arguments do not have an "origin" instruction, causing a
NULL-pointer dereference without this check.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There is no need to check sampler == 0 twice. This removes now
unused _mesa_lookup_samplerobj_locked().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since blob is intended for serializing data, it's not a good idea to
leave padding holes with uninitialized data, which may leak heap
contents and hurt compression if the blob is later compressed, like
done by shader cache. Clear it.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
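The effect of clearing the padding can be sketched as follows. This is a
hypothetical model, not the real blob API: Python has no uninitialized memory,
so the sketch only shows where the alignment holes are and that they are
written as explicit zeros.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


class Blob:
    """Hypothetical serialization buffer with aligned writes."""

    def __init__(self):
        self.data = bytearray()

    def write(self, payload, alignment=1):
        padding = align_up(len(self.data), alignment) - len(self.data)
        # Fill the alignment hole with zeros instead of leaving whatever
        # happened to be in the buffer.
        self.data += bytes(padding)
        self.data += payload


blob = Blob()
blob.write(b"\x01")
blob.write(b"\xaa\xbb\xcc\xdd", alignment=4)
```

Deterministic zero padding also means two blobs with the same logical content
are byte-identical, which matters when they are hashed or compressed.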
Negating size_t on 32bit produces a 32bit result. This was effectively
adding values close to UINT_MAX to the cache size (the files are usually
small) instead of intended subtraction.
Fixes 'make check' disk_cache failures on 32bit.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
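The arithmetic behind this bug can be reproduced with explicit masks. The
sketch below assumes the cache size is tracked in a 64-bit unsigned counter
and that the negated file size is widened to 64 bits before the addition; the
function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def buggy_shrink(cache_size, file_size):
    # Negation happens in 32-bit size_t, so it wraps to 2**32 - file_size and
    # is then zero-extended: a large positive value gets added.
    negated = (-file_size) & MASK32
    return (cache_size + negated) & MASK64


def fixed_shrink(cache_size, file_size):
    # Negating in 64 bits makes the modular addition act as a subtraction.
    negated = (-file_size) & MASK64
    return (cache_size + negated) & MASK64


buggy = buggy_shrink(10_000, 100)
fixed = fixed_shrink(10_000, 100)
```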
Builtins are created once and allocated using their own private ralloc
context. When reparenting IR that includes builtins, we might steal
bits of builtins. This is problematic because these builtins might now
be freed when the shader that includes them last is disposed. This
might also lead to inconsistent ralloc trees/lists if shaders are
created on multiple threads.
Rather than including builtins directly into a shader's IR, we should
include clones of them in the ralloc context of the shader that
requires them. This fixes double free issues we've been seeing when
running shader-db on a big multicore (72 threads) server.
v2: Also rename _mesa_glsl_find_builtin_function_by_name() to better
reflect how this function is used. (Ken)
v3: Rename ctx to mem_ctx (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was a hook I came up with when trying to do the initial performance
counter work years ago. Nothing's used it for a long time, and the
upcoming performance counter support doesn't want it either.
So, goodbye render ring prelude.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The only way we write CMASK/DCC compressed textures through shaders
is fast clears and CMASK/DCC inits, which have their own flushes.
Hence the CB cache is always up to date.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I think we should only flush right before an action (draw/dispatch etc.),
as otherwise it is too easy to issue redundant flushes.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Without stores, the only writes are fast clears, transfers and metadata
initialization, each of which has the appropriate invalidations already.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The data should always be in memory after a src flush.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason has patches to add validation to this area; this should fix
radv shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This extension was enabled in commit 40dd45d0c6 ("i965: Enable
ARB_shader_atomic_counter_ops") but the commit failed to update the
release notes or features.txt. The release notes ship has sailed, since
the commit was in 13.0.
Math results land in r4, regardless of the condition. To implement them,
we just need to ensure that the results are moved out of r4 (as often
happens anyway, the value is live across another math instruction), so
that we can attach the condition to the MOV.
Fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.93 and a
couple others, that were assertion failing that their conditions hadn't
been handled during the QIR->QPU stage.
This ended up confusing the scheduler for things like fabs (implemented as
fmaxabs x, x) or squaring a number, and it would try to avoid scheduling
them because it appeared more expensive than other instructions.
Fixes failure to register allocate in
dEQP-GLES2.functional.uniform_api.random.3 with almost no shader-db
effects (+.35% max temps)
Currently when running mesa on imx6 the following loader warnings
are seen:
# kmscube -D /dev/dri/card1
MESA-LOADER: device is not located on the PCI bus
MESA-LOADER: device is not located on the PCI bus
MESA-LOADER: device is not located on the PCI bus
Using display 0x1920948 with EGL version 1.4
As this is not an error message, change it to debug level in
order to have a cleaner log output.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Adding libmesa_amd_common dependency and exporting its headers,
avoids the following build error:
external/mesa/src/gallium/drivers/r600/evergreen_compute.c:29:10: fatal error: 'ac_binary.h' file not found
^
1 error generated.
Fixes: 3bbbb63 "automake: r600: radeonsi: correctly manage libamd_common.la linking"
Fixes: 503fb13 "radeon/ac: switch to ac_shader_binary_config_start()"
v2 [Emil Velikov: drop unneeded LOCAL_EXPORT_C_INCLUDE_DIRS]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
An earlier commit added extra tracking and attempted to remove the
vdpau/other folder if empty. V2 of said commit dropped the pipe
to /dev/null and the explicit "true" override.
Sadly both of those are needed since there's no guarantee that the
folder will be empty before we [mesa] make install.
Since we're bringing those two back, there's no need to track if we've
installed anything, and simply do "rm -d foo/ &>/dev/null || true"
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Fixes: 1cd4fde053 ("gallium/targets: don't leave an empty target directory(ies)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This is what the comment above the definition says, and the change fixes an
issue with 32-bit builds, where BLOCK_POOL_MEMFD_SIZE is used as the ftruncate
parameter and the constant currently gets converted from 4294967296 to 0.
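The truncation described above can be reproduced in isolation. This is a minimal sketch, not the driver's code: POOL_SIZE_SKETCH and as_32bit_length() are hypothetical names, and the 32-bit cast merely models a length parameter that is 32 bits wide on a 32-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 4 GiB size; the 64-bit literal keeps all bits. */
#define POOL_SIZE_SKETCH (1ull << 32)

/* Models passing the size through a 32-bit wide length parameter. */
static uint32_t as_32bit_length(uint64_t size)
{
   return (uint32_t)size; /* only the low 32 bits survive */
}
```

With these definitions, as_32bit_length(POOL_SIZE_SKETCH) is 0, which matches the 4294967296 to 0 conversion mentioned in the message.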
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With mesa/drm commit cd2f91e18db087edf93fed828e568ee53b887860
Author: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Date: Fri Jul 31 10:47:50 2015 -0700
intel: Drop aub dumping functionality
the drm_intel_aub routines are mere stubs and do nothing. Likewise
remove our invocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should let Dota 2 run on debug builds though it will spew errors
like mad. Hopefully, Valve will get this fixed sooner rather than
later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Over the course of driver development, we've come up with a number of
different schemes for adding giant blocks of asserts inside the driver.
This one is only being used once in anv_pipeline.c and the way it's
being used actually generates compiler warnings in release builds. This
commit drops the anv_validate macro and just puts the contents of the
one validation function inside of a "#ifdef DEBUG" guard.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Except for a few unimplemented things on gen7, we don't really have
stubs anymore so we should drop this. This commit replaces the few gen7
stub() calls with explicitly labeled finishme's and makes the sparse
binding stuff silently no-op or return a FEATURE_NOT_PRESENT error.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This acts identically to anv_finishme except that it only dumps out
these nice log messages if you run with INTEL_DEBUG=perf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes a GCC warning when compiling with -Wextra:
radv_device.c:463:47: warning: initialized field overwritten [-Woverride-init]
Signed-off-by: Damien Grassart <damien@grassart.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The dynamic_offset_offset in the descriptor set binding layout is
relative to the dynamic_offset_start for the set in the pipeline
layout.
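A small sketch of the relationship described above, assuming both values are simple indices into one array of dynamic offsets. The struct and function names here are hypothetical and only mirror the field names in the message; they are not the driver's definitions.

```c
#include <assert.h>

/* Hypothetical per-binding layout: offset relative to the start of its set. */
struct sketch_binding_layout {
   unsigned dynamic_offset_offset;
};

/* Hypothetical per-set entry in the pipeline layout. */
struct sketch_pipeline_set {
   unsigned dynamic_offset_start;
};

/* Absolute index of the idx-th dynamic offset of a binding. */
static unsigned sketch_dynamic_offset_index(const struct sketch_pipeline_set *set,
                                            const struct sketch_binding_layout *binding,
                                            unsigned idx)
{
   return set->dynamic_offset_start + binding->dynamic_offset_offset + idx;
}
```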
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
A buffer descriptor is 16 bytes, not 16 dwords.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I was already tired of seeing the message
Package libomxil-bellagio was not found in the pkg-config search path.
Perhaps you should add the directory containing `libomxil-bellagio.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libomxil-bellagio' found
on every configure, but I just got a distro bug reported where the user
was confused by this message and thought it indicated a bug.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Available since pkg-config-0.28 and pkgconf-0.8.10.
The removal of the AC_PATH_PROG is intentional. Use pkg-config.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It printed the version of LLVM ($1):
configure: error: 3.6.0 requires libelf when using llvm
instead of the driver name ($2):
configure: error: r600 requires libelf when using llvm
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Still not sure we can support miptrees when sampling from
HTILE enabled textures.
Added the tcCompatible winsys stuff while I'm at it.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes:
dEQP-VK.pipeline.render_to_image.3d.huge.depth.r8g8b8a8_unorm
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Don't fast clear inside the meta loop as things get
confused, fixes a crash in:
dEQP-VK.api.copy_and_blit.resolve_image.whole_array_image.2_bit
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ported from anv:
3d33a23e anv: Properly handle destroying NULL devices and instances
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This uses these in a few places, and fixes one or two
cases which were using da as 32-bit instead of bool.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was made unnecessary with fd33a6bcd7.
This was mostly done with:
find ./src -type f -exec sed -i -- \
's:PIPE_THREAD_ROUTINE(\([^,]*\), \([^)]*\)):int\n\1(void \*\2):g' {} \;
With some small manual tidy ups.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_lock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_destroy() was made unnecessary with fd33a6bcd7.
Replace was done with:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_destroy(\([^)]*\)):mtx_destroy(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_init() was made unnecessary with fd33a6bcd7.
Replace was done using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_init(\([^)]*\)):(void) mtx_init(\&\1, mtx_plain):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We never actually used the resource streamer in any shipping build
of Mesa. We have no plans to do so in the future. We looked into
using it in Vulkan, and concluded that it was unusable. We're not
the only ones to arrive at the conclusion that it's not worth using.
So, drop the last vestiges of resource streamer support and move on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Experimentation shows that without an alignment factor, gcc and clang choose
a factor of 16 even on IA-32, which doesn't match what malloc() uses (8).
The problem is it makes gcc assume the pointer is 16 byte aligned, so
with -O3 it starts using aligned SSE instructions that later fault,
so always specify a suitable alignment factor.
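The idea of pinning the header alignment explicitly can be sketched as follows. This is an illustration under assumptions, not the ralloc code: the struct, the helper names, and the choice of 16 bytes on 64-bit targets are hypothetical; only the value 8 for 32-bit malloc() comes from the message above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed policy for this sketch: 16 bytes on 64-bit targets, 8 on 32-bit. */
#if UINTPTR_MAX > 0xffffffffu
#define HEADER_ALIGN 16
#else
#define HEADER_ALIGN 8
#endif

/* Hypothetical allocation header with an explicit alignment factor. */
struct sketch_header {
   alignas(HEADER_ALIGN) struct sketch_header *parent;
   size_t size;
};

/* Returns the payload that follows the header, or NULL on failure. */
static void *sketch_alloc(size_t size)
{
   struct sketch_header *h = malloc(sizeof(*h) + size);
   if (h == NULL)
      return NULL;
   h->parent = NULL;
   h->size = size;
   return h + 1;
}

/* Frees a payload previously returned by sketch_alloc(). */
static void sketch_free(void *payload)
{
   if (payload != NULL)
      free((struct sketch_header *)payload - 1);
}
```

Because the header size is a multiple of HEADER_ALIGN, the payload inherits whatever alignment malloc() provides up to that factor, and the compiler is never promised more than the allocator delivers.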
Cc: Jonas Pfeil <pfeiljonas@gmx.de>
Fixes: cd2b55e5 "ralloc: Make sure ralloc() allocations match malloc()'s alignment."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100049
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Jonas Pfeil <pfeiljonas@gmx.de>
There are still some distributions trying to support unfortunate people
with old or exotic CPUs that don't have 64bit atomic operations. The
only thing preventing compile of the Intel driver for them seems to be
initialization of a debug variable.
v2: use call_once() instead of unsafe code, as suggested by Matt Turner
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93089
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
If we have any pending flushes on the primary command buffer, these
must be performed before executing the secondary buffer.
This fixes potential corruption when the contents of a subpass which
clears any of its render targets are given in a secondary buffer: the
flushes after a fast clear would not have been performed until the
vkCmdEndRenderPass call.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
_mesa_lookup_samplerobj() returns NULL if sampler is 0.
v2: use _mesa_lookup...(...) != NULL
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the following assertion when the key is 0.
main/hash.c:181: _mesa_HashLookup_unlocked: Assertion `key' failed.
Fixes: 633c959fae ("getteximage: Return correct error value when texure object is not found")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Call site attributes are used since LLVM 4.0.
This also reverts commit b19caecbd6
"radeon/ac: fix intrinsic version check", because this is the correct fix.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Since all output buffers are whole frames, this should always be set.
Technically, setting this flag is optional (see OpenMAX IL section
3.1.2.7.1), but some clients assume that it will be used and
therefore buffer indefinitely thinking that all output buffers are
fragments of the first frame when it is not set.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
From OpenMAX IL section 4.3.5:
"The value of nIndex is the range 0 to N-1, where N is the number of
formats supported by the port. There is no need for the port to
report N, as the caller can determine N by enumerating all the
formats supported by the port. Each port shall support at least one
format. If there are no more formats, OMX_GetParameter returns
OMX_ErrorNoMore (i.e., nIndex is supplied where the value is N or
greater)."
Only one format is supported, so N = 1 and OMX_ErrorNoMore should be
returned if nIndex >= 1. The previous code here would return the
same format for all values of nIndex, resulting in an infinite loop
when a client attempts to enumerate all formats.
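The enumeration contract quoted above can be sketched without the OpenMAX headers. All names below are hypothetical stand-ins (the error enum, the format value, and both functions); the sketch only shows why returning "no more" for nIndex >= N lets a client loop terminate.

```c
#include <assert.h>

/* Hypothetical stand-ins for the OpenMAX error codes. */
enum sketch_error { SKETCH_ERROR_NONE, SKETCH_ERROR_NO_MORE };

#define SKETCH_NUM_FORMATS 1u  /* only one format is supported, so N = 1 */
#define SKETCH_FORMAT_ID   7   /* arbitrary placeholder format value */

/* Port side: report format number `index`, or "no more" once index >= N. */
static enum sketch_error sketch_get_format(unsigned index, int *format)
{
   if (index >= SKETCH_NUM_FORMATS)
      return SKETCH_ERROR_NO_MORE;
   *format = SKETCH_FORMAT_ID;
   return SKETCH_ERROR_NONE;
}

/* Client side: determine N by enumerating until the port says "no more". */
static unsigned sketch_count_formats(void)
{
   unsigned n = 0;
   int format;
   while (sketch_get_format(n, &format) == SKETCH_ERROR_NONE)
      n++;
   return n;
}
```

If sketch_get_format() returned the same format for every index, the loop in sketch_count_formats() would never exit, which is the infinite loop the message describes.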
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
The VAAPI documentation is not very clear here, but the intent
appears to be that a forward reference is forward from a frame in the
past, not forward to a frame in the future (that is, forward as in
forward prediction, not as in a forward reference in source code).
This interpretation is derived from other implementations, in
particular the i965 driver and the gstreamer client.
In order to match those other implementations, this patch swaps the
meaning of forward and backward references as they currently appear
for motion-adaptive deinterlacing.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested with ffmpeg and gst-vaapi. Without this, bits per
frame is set way too low for fractional framerates.
v2: Mark Thompson: simplify calculation.
Use float.
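As a rough illustration of why a float helps here, assume the frame rate arrives as a numerator/denominator pair (for example 30000/1001). The function name and parameters are hypothetical, and the driver's earlier calculation is not shown in this message, so this is only a sketch of the arithmetic.

```c
#include <assert.h>

/* Bits available per frame for a rate given as rate_num / rate_den fps. */
static float sketch_bits_per_frame(unsigned bitrate,
                                   unsigned rate_num, unsigned rate_den)
{
   assert(rate_num != 0 && rate_den != 0);
   float frames_per_second = (float)rate_num / (float)rate_den;
   return (float)bitrate / frames_per_second;
}
```

For a 3 Mbit/s target at 30000/1001 fps this gives roughly 100100 bits per frame, whereas dividing by the numerator alone would give 100.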
Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Mainly to avoid gcc's complaints about uninitialized ptr and offset use
later in that code.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use the same helpers as for other handle<->pointer conversions.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
So that we don't keep allocating BOs for the IBs and upload buffers.
We run some risk of memory increase with e.g. a bimodal size
distribution of command buffers, but I haven't noticed a significant
increase with dota2 and talos.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 0f60c6616e.
Piglit and all games tested so far seem to be working without
issue. This change will allow wide user testing and we can decide
before the next release if we need to turn it off again.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we were deleting the entire cache if a user switched
between 32 and 64 bit applications.
V2: make the check more generic, it should now work with any
platform we are likely to support.
V3: Use suggestion from Emil to make even more generic/fix issue
with __ILP32__ not being declared on gcc for regular 32-bit builds.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Don't flush multiple times if we clear multiple attachments. Also allows
doing the depth clear in parallel with the fast color clears.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
GS implementation uses the masked.{gather,store} intrinsics,
introduced in llvm-3.9.0. swr llvm version requirement in
automake and scons now match (scons already needed >= 3.9).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The OpenGL 4.5 specification's description of TexBuffer says:
"The number of texels in the texture image is then clamped to an
implementation-dependent limit, the value of MAX_TEXTURE_BUFFER_SIZE."
We set GL_MAX_TEXTURE_BUFFER_SIZE to 2^27. For buffers with a byte
element size, this is the maximum possible size we can encode in
SURFACE_STATE. If you bind a buffer object larger than this as a
texture buffer object, we'll exceed that limit and hit an isl assert:
assert(num_elements <= (1ull << 27));
To fix this, clamp the size in bytes to MaxTextureSize / texel_size.
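The effect of the clamp can be sketched in terms of texel counts. This is an assumption-laden illustration rather than the driver's code: the function and macro names are hypothetical, and the sketch clamps the element count directly so that it can never exceed the 2^27 limit checked by the assert quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* The limit quoted above: at most 2^27 elements. */
#define SKETCH_MAX_TEXELS (1ull << 27)

/* Number of texels exposed for a buffer of size_bytes, after clamping. */
static uint64_t sketch_clamped_texels(uint64_t size_bytes, unsigned texel_size)
{
   assert(texel_size != 0);
   uint64_t texels = size_bytes / texel_size;
   return texels < SKETCH_MAX_TEXELS ? texels : SKETCH_MAX_TEXELS;
}
```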
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
An earlier commit was picked from a larger series, but did not consider
that it removed the vulkan <> wayland-drm interdependency.
Rather than reverting everything, temporarily move wayland-drm further
up to resolve the issue. Since it [wayland-drm] does not have any
in-mesa dependencies that's perfectly safe.
Cc: Vedran Miletić <vedran@miletic.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100060
Fixes: e135ce6f08 ("vulkan: Build common Vulkan code earlier")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Javier Jardón <jjardon@gnome.org>
Fixes a series of libz related build errors:
target SharedLib: gallium_dri_32
(out/target/prod...SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so)
external/elfutils/libelf/elf_compress.c:117: error: undefined reference to 'deflateInit_'
...
external/elfutils/libelf/elf_compress.c:244: error: undefined reference to 'inflateEnd'
clang++: error: linker command failed with exit code 1 (use -v to see
invocation)
Fixes: 85a9b1b "util/disk_cache: compress individual cache entries"
See detailed explanation of why this is needed in commit eb60a89bc3.
This spot was missed/overlooked. Basically as a result of the fact
that BEGIN_* ends up calling PUSH_SPACE, which in turn adds an extra 8
to the requested amount, we have to be mindful of that when doing bare
nouveau_pushbuf_space calls.
Reportedly this fixes some crashes when replaying a hitman trace taken
on radeonsi.
Fixes: eb60a89bc3 ("nouveau: take extra push space into account for pushbuf_space calls")
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When binding as textures, the alignment can be 16. However when binding
as an image, the address has to be aligned to 256. (Also when binding as
an RT, but that can't happen with GL or current gallium APIs.)
Reported-by: Roy Spliet <nouveau@spliet.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When files are being generated, the value of the $intermediates variable can be
completely random; this makes sure that outdir is the wanted one.
Fixes: 3f2cb699 ("android: vulkan: add support for libmesa_vulkan_util")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Some drivers do not support certain targets - for example nouveau
doesn't do VAAPI, while freedreno doesn't do any of the video backends.
As such, if we enter vdpau when building freedreno/ilo/etc, a vdpau/
folder will be created, and an empty library will be built and almost
immediately removed, leaving an empty vdpau/ folder around.
There are two ways to fix this.
* add substantial tracking in configure/makefiles so that we never end
up in targets/vdpau
Downsides:
Error prone, as the configure checks and the 'include
gallium/drivers/foo/Automake.inc' can easily get out of sync.
* remove the folder, if empty, alongside the empty library.
Downsides:
In the latter case vdpau/ might be empty before the mesa build has
started, yet we'll remove it either way.
This patch implements the latter option, as the downside isn't that
significant, plus the patch is way shorter ;-)
v2: use has_drivers to track since TARGET_DRIVERS can contain space,
hence neither string comparison nor -n/-z works correctly.
Gentoo Bugzilla: https://bugs.gentoo.org/545230
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The previous implementation was fine for GLSL which doesn't really have
a signed modulus/remainder. They just leave the behavior undefined
whenever either source is negative. However, in SPIR-V, there is a
defined behavior for negative arguments. This commit beefs up the pass
so that it handles both correctly. Tested using a hacked up version of
the Vulkan CTS test to get 64-bit support.
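The difference between the two signed operations can be sketched in plain C. The assumption here is the usual SPIR-V convention that the remainder takes the sign of the dividend while the modulus takes the sign of the divisor; the function names are hypothetical and this is not the lowering pass itself.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder: C's % truncates toward zero, so the sign follows the dividend.
 * Callers must avoid b == 0 and the INT64_MIN % -1 overflow case. */
static int64_t sketch_srem64(int64_t a, int64_t b)
{
   assert(b != 0);
   return a % b;
}

/* Modulus: adjust the remainder so that its sign follows the divisor. */
static int64_t sketch_smod64(int64_t a, int64_t b)
{
   int64_t r = sketch_srem64(a, b);
   if (r != 0 && ((r < 0) != (b < 0)))
      r += b;
   return r;
}
```

For example, with a = -7 and b = 3 the remainder is -1 while the modulus is 2.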
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is a work in progress - some things may still need fixing.
But it should be in pretty decent shape.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Equivalent *TexSubImage* methods generate INVALID_ENUM.
From OpenGL 4.5 spec, section 8.6 Alternate Texture Image
Specification Commands:
"An INVALID_ENUM error is generated by *TexSubImage* if target does
not match the command, as shown in table 8.15."
And:
"An INVALID_OPERATION error is generated by *TextureSubImage* if
the effective target of texture does not match the command, as
shown in table 8.15."
Fixes:
GL45-CTS.direct_state_access.textures_copy_errors
v2: slightly change commit summary (Samuel)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Valgrind reports that the shader cache writes uninitialized data to disk.
Turns out ureg_get_tokens() is returning the count of allocated tokens
instead of how many are actually used, so the cache writes out unused
space at the end. Use the real count instead.
This change should not cause regressions elsewhere because the only
ureg_get_tokens() user that cares about token count is the shader cache.
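The allocated-versus-used distinction can be shown with a toy token buffer. Everything in this sketch is hypothetical (the struct, the fixed capacity, the function names); it is not the ureg implementation, only an illustration of reporting the used count so that a consumer never writes out the unused tail.

```c
#include <assert.h>

#define SKETCH_CAPACITY 8 /* allocated slots, hypothetical fixed size */

struct sketch_tokens {
   unsigned tokens[SKETCH_CAPACITY];
   unsigned count; /* slots actually filled */
};

/* Appends one token; the caller must not exceed the capacity. */
static void sketch_push(struct sketch_tokens *t, unsigned token)
{
   assert(t->count < SKETCH_CAPACITY);
   t->tokens[t->count++] = token;
}

/* Returns the token array and reports the used count, not the capacity. */
static const unsigned *sketch_get_tokens(const struct sketch_tokens *t,
                                         unsigned *nr_tokens)
{
   *nr_tokens = t->count;
   return t->tokens;
}
```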
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
V2:
- when loading from disk cache also binary insert into memory cache.
- check that the binary loaded from disk is the correct size. If not
delete the cache item and skip loading from cache.
V3:
- remove unrequired variable
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reduces the cache size for Deus Ex from ~160M to ~30M for
radeonsi (these numbers differ from Grigori's results below
probably due to different graphics quality settings).
I'm also seeing the following improvements in minimum fps in the
Shadow of Mordor benchmark on an i5-6400 CPU@2.70GHz, with a HDD:
no-cache: ~10fps
with-cache-no-compression: ~15fps
with-cache-and-compression: ~20fps
Note: The with cache results are from the second run after closing
and opening the game to avoid the in-memory cache.
Since we mainly care about decompression I went with
Z_BEST_COMPRESSION as suggested on irc by Steinar H. Gunderson
who has benchmarked decompression speeds.
Grigori Goronzy provided the following stats for Deus Ex: Mankind
Divided start-up times on an Athlon X4 860k with a SSD:
No Cache                            215 sec
Cold Cache zlib BEST_COMPRESSION    285 sec
Warm Cache zlib BEST_COMPRESSION     33 sec
Cold Cache zlib BEST_SPEED          264 sec
Warm Cache zlib BEST_SPEED           33 sec
Cold Cache no compression           266 sec
Warm Cache no compression            34 sec
The total cache size for that game is 48 MiB with BEST_COMPRESSION,
56 MiB with BEST_SPEED and 170 MiB with no compression.
These numbers suggest that it may be ok to go with Z_BEST_SPEED
but we should gather some actual decompression times before doing
so. Another option might be to do the compression in a separate
thread; this might allow us to use a higher-compression algorithm
such as LZMA.
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Previously, when q.subroutine was set to 1, a new subroutine
declaration was added to the AST, while 0 meant a subroutine
definition has been detected by the parser.
Thus, setting the q.subroutine flag in both situations is
obviously wrong because a new type identifier is added instead
of trying to match the declaration. To fix it up, introduce
ast_type_qualifier::is_subroutine_decl() to differentiate
declarations and definitions easily.
This fixes a regression with:
arb_shader_subroutine/compiler/direct-call.vert
Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: be8aa76afd ("glsl: remove unecessary flags.q.subroutine_def")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100026
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Depending on the generated Makefile means that all generated sources are
recreated after ./configure.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
While an input attachment may only take on one of those two layouts,
other depth/stencil attachments that use the same image may have
HiZ-enabled layouts. Improves the average frame rate on a release
candidate of a proprietary Vulkan benchmark by 9.94% over 3 runs on my
SKL GT4.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We'll loop through this array when performing automatic layout
transitions.
v2: Adjust formatting of an assignment (Jason Ekstrand)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We will be using the image layout. Store the full struct directly from
the user.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Due to recent commits, the sampler now bypasses the auxiliary HiZ buffer
when reading from a depth image subresource that is in the general
layout. Remove this unneeded resolve.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used to sample a depth input attachment without having to
pass through the HiZ buffer.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
surf_usage is only useful to image views that may use HiZ buffers.
Storage image views don't use HiZ buffers.
v2: Update commit message and add an assertion.
Fixes: 055ff2ec52 ("anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ")
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Validate the inputs, verify that this image has a depth
buffer, use gen_device_info instead of
v2:
- Add parenthesis (Jason Ekstrand)
- Make parameters const
- Use gen_device_info instead of gen
- Pass aspect to missed function in transition_depth_buffer
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This function supersedes layout_to_hiz_usage().
v2:
- Don't find the optimal buffer for layout transitions (Jason Ekstrand).
- Pass the devinfo instead of the gen (Jason Ekstrand)
- Update the function documentation.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The header of ralloc needs to be aligned, because the compiler assumes
that malloc returns will be aligned to 8/16 bytes depending on the
platform, leading to degraded performance or alignment faults with ralloc.
Fixes SIGBUS on Raspberry Pi at high optimization levels.
This patch is not perfect for MSVC, as maybe in the future the alignment
for the most demanding data type might change to more than 8.
v2: Commit message reword/typo fix, and add a bigger explanation in the
code (by anholt)
Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Recent change to st/mesa state update logic caused major regressions to
swr validation code.
swr uses the same validation logic (swr_update_derived) for both draw
and Clear calls. New st/mesa state update logic results in certain state
objects not being set/bound during Clear. This was causing null ptr
exceptions. Creation of static dummy state objects allows setting these
pointers during Clear validation, without interfering with relevant state
validation.
Once fixed, new logic also highlighted an error in dirty bit checking for
fragment shader and clip validation.
(The alternative is to have a simplified validation routine for Clear.
We may do that at some point.)
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
During the first update of the hw_clear_state atoms, we may not yet
have a current rasterizer state object. So, svga->curr.rast may be
NULL and we crash.
Add a few null pointer checks to work around this. Note that these
are only needed in the state update functions which are called for
'clear' validation.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows us to allocate surface states from the command buffer when
pushing descriptor sets rather than allocating them through a
descriptor set pool.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In validate_DrawElements_common() we need to check for OES_geometry_shader
extension to determine if we should fail if transform feedback is
unpaused. However current code reads ctx->Extensions.OES_geometry_shader
directly, which does not take context version into account. This means
that if the context is GLES 3.0, which makes the OES_geometry_shader
inapplicable, we would not validate the draw properly. To fix it, let's
replace the check with a call to _mesa_has_OES_geometry_shader().
Fixes following dEQP tests on i965 with a GLES 3.0 context:
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_incomplete_primitive
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced
dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced_incomplete_primitive
dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements
dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements_incomplete_primitive
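The difference between reading a raw extension flag and using a version-aware helper can be sketched as follows. The struct, the field names, and the minimum version of 31 (GLES 3.1) are assumptions made for this illustration; the real helper is _mesa_has_OES_geometry_shader(), whose implementation is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context: a raw extension flag plus the context version. */
struct sketch_context {
   bool ext_flag;   /* the driver can support the extension */
   int  es_version; /* e.g. 30 for GLES 3.0, 31 for GLES 3.1 */
};

/* Assumed minimum context version for the extension in this sketch. */
#define SKETCH_GS_MIN_ES_VERSION 31

/* Version-aware check: the flag alone is not enough. */
static bool sketch_has_geometry_shader(const struct sketch_context *ctx)
{
   return ctx->ext_flag && ctx->es_version >= SKETCH_GS_MIN_ES_VERSION;
}
```

On a GLES 3.0 context the raw flag may still be true, but the helper reports false, so validation takes the path intended for contexts without the extension.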
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
One less set of enums. Dropped the #defines from brw_defines.h and ran:
$ for file in *.cpp *.c *.h; do sed -i \
-e 's/BRW_SURFACEFORMAT_/ISL_FORMAT_/g' \
-e 's/ISL_FORMAT_ASTC_[A-Zxs0-9_]*/\U&/g' $file; \
done
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
If we don't have pipelined register access (e.g. Haswell before kernel
v4.2), then we can only implement EXT_transform_feedback by resetting the
SO offsets *between* batches. However, if we do have pipelined access to
the SO registers on gen7, we can simply emit an inline reset of the SO
registers without a full batch flush.
v2 [by Ken]: Simplify after recent kernel feature detection changes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
According to the PRM description of the Depth field:
"This field specifies the total number of levels for a volume texture
or the number of array elements allowed to be accessed starting at the
Minimum Array Element for arrayed surfaces"
However, ISL defines array_len as the length of the range
[base_array_layer, base_array_layer + array_len], so it already represents
a value relative to the base array layer like the hardware expects.
v2: Depth is defined as a U11-1 field, so subtract 1 from
the actual value (Jason)
This fixes a number of new CTS tests that would crash otherwise:
dEQP-VK.pipeline.render_to_image.*
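A minimal sketch of the "minus one" encoding mentioned in v2, assuming that a U11-1 field is an 11-bit unsigned field storing the actual value minus 1. The function name is hypothetical and the packing of the field into the surface state is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes an array length (1..2048) into an 11-bit "value minus one" field. */
static uint32_t sketch_encode_depth(uint32_t array_len)
{
   assert(array_len >= 1 && array_len <= 2048);
   return (array_len - 1) & 0x7ff;
}
```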
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The algorithms used by this pass, especially for division, are heavily
based on the work Ian Romanick did for the similar int64 lowering pass
in the GLSL compiler.
v2: Properly handle vectors
v3: Get rid of log2_denom stuff. Since we're using bcsel, we do all the
calculations anyway and this is just extra instructions.
v4:
- Add back in the log2_denom stuff since it's needed for ensuring that
the shifts don't overflow.
- Rework the looping part of the pass to be easier to expand.
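For readers unfamiliar with software division, the general shape of a looping unsigned divide can be sketched with the textbook shift-and-subtract method. This is explicitly not the algorithm used by the pass (which is based on the GLSL int64 lowering work mentioned above); the function name is hypothetical and the sketch operates on native 64-bit values for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Textbook restoring division: one quotient bit per iteration. d must be != 0. */
static uint64_t sketch_udiv64(uint64_t n, uint64_t d, uint64_t *rem)
{
   assert(d != 0);
   uint64_t q = 0, r = 0;
   for (int i = 63; i >= 0; i--) {
      /* Bring down the next bit of the numerator. */
      r = (r << 1) | ((n >> i) & 1);
      if (r >= d) {
         r -= d;
         q |= 1ull << i;
      }
   }
   *rem = r;
   return q;
}
```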
Reviewed-by: Matt Turner <mattst88@gmail.com>
Each of the pop functions (and push_else) take a control flow parameter as
their second argument. If NULL, it assumes that the builder is in a block
that's a direct child of the control-flow node you want to pop off the
virtual stack. This is what 90% of consumers will want. The SPIR-V pass,
however, is a bit more "creative" about how it walks the CFG and it needs
to be able to pop multiple levels at a time, hence the argument.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is shared between the Vulkan and GL drivers as it's a requirement
of the back-end compiler. However, it doesn't really belong in the
compiler. We rename the file to match the prefix of the other stuff in
common and because libdrm defines an intel_debug.h and this avoids a
pile of possible name conflicts.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This hasn't been used for quite some time now but we never bothered to
get rid of it when we dropped GLSL IR support for vec4.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
One of these days, I'd like to see this function go away altogether
but for now, let's at least put it near the struct it updates.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It does sort-of go with MAX_UBO and friends but MAX_DRAW_BUFFERS is an
actual hardware constant based on the number of things we can blend
rather than an arbitrary "number of things allowed in GL" like some of
the other maximums are.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we're at it, we also change the GEN6 binding macro to be a start
index that gets added to the binding. This makes things a bit more
explicit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's currently in brw_util.c but that's the only bit of brw_util.c
that's shared between the compiler and the rest of the GL driver.
It's just a fairly obvious table so the duplication isn't bad. It's
certainly less pain than trying to figure out how to share the code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Vulkan doesn't respect MAX_SURFACES so this assert isn't valid in that
case. It should, however, assert that it isn't insanely large.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a relic of when we wired up meta to be able to use RECTLIST
primitives. It's no longer needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't used by Vulkan and is specific to the way the GL driver
works. There's no reason to have it in common compiler code. Also, it
relies on BRW_MAX_* defines which are defined in brw_context.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The zy swizzle gives us one component of quotient and one component of
remainder. What we wanted was zw for the remainder.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're about to use the build-id as the starting point for another SHA1
hash in the Intel Vulkan driver, and returning a pointer is far more
convenient.
Reviewed-by: Chad Versace <chadversary@chromium.org>
The queryid_valid() function asserts that an ID given by an application
isn't zero since the spec explicitly reserves an ID of zero as invalid.
The implementation was written as if the ID was a signed integer and
based on the assumption that queryid_to_index() is simply subtracting
one from the ID. It was broken because in fact the ID was stored in an
unsigned int and testing for an index >= 0 would always succeed.
This adds a spec quote to clarify why zero is considered invalid and
checks for zero before even passing the ID to queryid_to_index(), and
only then checks the upper bound.
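The check order described above can be sketched roughly as follows. This is only an illustration, not the actual Mesa code: EX_NUM_QUERIES and the ex_* helpers are hypothetical stand-ins for the backend's query count and for queryid_to_index()/queryid_valid().

```c
#include <stdbool.h>

/* Hypothetical number of queries the backend exposes. */
#define EX_NUM_QUERIES 4u

/* Query IDs are 1-based, so the table index is ID - 1.
 * For ID 0 the unsigned subtraction wraps around to UINT_MAX. */
static unsigned ex_queryid_to_index(unsigned queryid)
{
    return queryid - 1u;
}

/* Reject the reserved ID 0 first, then check the upper bound.
 * Testing "index >= 0" on an unsigned index would always be true,
 * so it cannot be used to detect the zero ID. */
static bool ex_queryid_valid(unsigned queryid)
{
    if (queryid == 0)
        return false;
    return ex_queryid_to_index(queryid) < EX_NUM_QUERIES;
}
```

With this ordering the zero ID is rejected explicitly rather than by relying on the wrapped index happening to exceed the upper bound.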
This is a v2 of a patch originally posted by Juha-Pekka (thanks)
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Fix usage of ac_add_function_attr() and make it known!
common/ac_nir_to_llvm.c: In function 'create_llvm_function':
common/ac_nir_to_llvm.c:265:4: error: implicit declaration of function
'ac_add_function_attr' [-Werror=implicit-function-declaration]
ac_add_function_attr(main_function, i + 1, AC_FUNC_ATTR_BYVAL);
^~~~~~~~~~~~~~~~~~~~
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The wl_drm interface (akin to X11's DRI2) uses the standard set of DRM
FourCC format codes. wl_shm copies this, except for ARGB8888/XRGB8888,
which use their own definitions.
Make sure we only use wl_shm format codes when we're working with
wl_shm. Otherwise, using swrast with 32bpp formats would fail with an
error.
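As a rough sketch of the format mismatch, assuming the usual FourCC packing and the two special wl_shm values (0 for ARGB8888, 1 for XRGB8888): the EX_* macros below are local copies defined only for this example and the helper name is hypothetical, not the function used in the driver.

```c
#include <stdint.h>

/* Local copy of the FourCC packing used by DRM format codes. */
#define EX_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define EX_DRM_FORMAT_ARGB8888 EX_FOURCC('A', 'R', '2', '4')
#define EX_DRM_FORMAT_XRGB8888 EX_FOURCC('X', 'R', '2', '4')
#define EX_DRM_FORMAT_RGB565   EX_FOURCC('R', 'G', '1', '6')

/* wl_shm uses its own values for these two formats only. */
#define EX_WL_SHM_FORMAT_ARGB8888 0u
#define EX_WL_SHM_FORMAT_XRGB8888 1u

/* Hypothetical helper: translate a DRM FourCC into the code wl_shm expects.
 * Every format other than the two 32bpp special cases passes through. */
static uint32_t ex_drm_to_wl_shm_format(uint32_t drm_format)
{
    switch (drm_format) {
    case EX_DRM_FORMAT_ARGB8888:
        return EX_WL_SHM_FORMAT_ARGB8888;
    case EX_DRM_FORMAT_XRGB8888:
        return EX_WL_SHM_FORMAT_XRGB8888;
    default:
        return drm_format;
    }
}
```

Passing the untranslated FourCC to wl_shm for a 32bpp format is the kind of mismatch that would make the swrast path fail.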
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Daniel Stone <daniels@collabora.com> (v1)
Fixes: cb5e799448 ("egl/wayland: unify dri2_wl_create_surface implementations")
v2: [Emil Velikov: move to dri2_wl_create_window_surface]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (IRC)
In the past, we used this on Gen4-5 to transform non-normalized texture
coordinates (for sampler2DRect) to normalized ones. We also used it on
Gen6-7.5 for sampler2DRect with GL_CLAMP.
Jason dropped this code in 6c8ba59cff
in favor of using nir_lower_tex(), which just does a textureSize()
call. But we were still setting up these state references for
useless uniform data.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
They can vary at call sites if the intrinsic is NOT a legacy SI intrinsic.
We need this to force readnone or inaccessiblememonly on some amdgcn
intrinsics.
This is only used with LLVM 4.0 and later. Intrinsics only used with
LLVM <= 3.9 don't need the LEGACY flag.
gallivm and ac code is in the same patch, because splitting would be
more complicated with all the LEGACY uses all over the place.
v2: don't change the prototype of lp_add_function_attr.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Even though compute shaders cannot access the framebuffer, there is a
synchronization issue when a compute dispatch accesses a texture that
was previously bound and drawn to as a framebuffer.
Section 9.3 (Feedback Loops Between Textures and the Framebuffer) of
the OpenGL 4.5 spec rather implicitly clarifies that undefined behavior
results if the texture is still attached to the currently bound
framebuffer. However, the feedback loop is broken when the application
changes the framebuffer binding before a compute dispatch, and the
state tracker needs to let the driver know about this.
Fixes GL45-CTS.compute_shader.pipeline-post-fs on SI family Radeons.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
exec_node::get_prev() does not guard against going past the beginning
of the list, so we need to add explicit checks here.
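A minimal C sketch of such a guard, assuming a sentinel-based doubly linked list similar in spirit to exec_list. The ex_* types and functions are hypothetical and written for this example only; the real code is C++ and differs in detail.

```c
#include <stddef.h>

struct ex_node {
    struct ex_node *next;
    struct ex_node *prev;
};

/* List with embedded head and tail sentinels. */
struct ex_list {
    struct ex_node head;
    struct ex_node tail;
};

static void ex_list_init(struct ex_list *list)
{
    list->head.prev = NULL;
    list->head.next = &list->tail;
    list->tail.prev = &list->head;
    list->tail.next = NULL;
}

static void ex_push_tail(struct ex_list *list, struct ex_node *n)
{
    n->next = &list->tail;
    n->prev = list->tail.prev;
    list->tail.prev->next = n;
    list->tail.prev = n;
}

/* Guarded step backwards: returns NULL instead of handing out the head
 * sentinel (whose own prev is NULL) when n is the first real node. */
static struct ex_node *ex_prev_or_null(struct ex_node *n)
{
    struct ex_node *p = n->prev;
    if (p == NULL || p->prev == NULL)
        return NULL;
    return p;
}
```

Callers that walk backwards then test for NULL rather than dereferencing whatever the unguarded previous pointer returns.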
Found by ASAN in piglit arb_shader_storage_buffer_object-rendering.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
radeon_llvm_check and friends were never called in the no-opencl case,
which ended up with an empty llvm module list. As --enable-opencl always
requires --enable-llvm, we can use the latter as the guard.
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
[Emil Velikov: commit message polish]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This bit is definitely not necessary because subroutine_list
can be used instead. This frees one more bit in the flags.q
struct which is nice because arb_bindless_texture will need
4 bits for the new layout qualifiers.
No piglit regressions found (including compiler tests) with
"-t subroutine".
v2: set the subroutine flag for validating illegal flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Do not hardcode the file in the python script, but pass it via the build
system(s). The latter is the only one that should know about the file
location/tree structure.
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The following changes are implemented:
Add src/vulkan/Android.mk to build libmesa_vulkan_util
Android.mk: add src/vulkan to SUBDIR to build new module
intel/vulkan: fix libmesa_vulkan_util,vk_enum_to_str.h dependencies
Add -o OUTPUT_PATH option in src/vulkan/util/gen_enum_to_str.py script
Use -o OUTPUT_PATH option in automake generation rules for vk_enum_to_str.{c,h}
Fixes: e9dcb17 "vulkan/util: Add generator for enum_to_str functions"
Fixes: 8e03250 "vulkan: Combine wsi and util makefiles"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov]
- Move parser within main()
- Use --outdir instead of -o
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Since both r600 and radeonsi use code from libamd_common they need to
static link it. At the same time, adding a common library to LIB_DEPS is
fragile [can lead to multiple symbol definitions] and non-obvious - I
had to do a double-take how things work atm.
So follow the libradeon.la approach and put common libraries in
TARGET_RADEON_COMMON
Fixes: 936f5407a7 ("gallium/radeon: Add libamd_common.a to TARGET_LIB_DEPS also for r600")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Otherwise we'll fail to find the header and `make distcheck` will bail.
Fixes: e9dcb17962 ("vulkan/util: Add generator for enum_to_str functions")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
When generating the MOV INDIRECT instruction, the source type is ignored
and it is set to destination's type. However, this is going to change in a
later patch, so we need to explicitly set the proper source type.
brw_vec8_grf() creates a float-typed fs_reg by default, when the
ICP handle is actually unsigned. This patch fixes these cases before
applying the aforementioned patch.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The lowered BSW/BXT indirect move instructions had incorrect
source types, which luckily wasn't causing incorrect assembly to be
generated due to the bug fixed in the next patch, but would have
confused the remaining back-end IR infrastructure due to the mismatch
between the IR source types and the emitted machine code.
v2:
- Improve commit log (Curro)
- Fix read_size (Curro)
- Fix DF uniform array detection in assign_constant_locations() when
it is accessed with 32-bit MOV_INDIRECTs in BSW/BXT.
v3:
- Move changes in assign_constant_locations() to other patch.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, if we had accesses with different sizes to the same uniform, we might not
push it aligned with the bigger one. This is a problem in BSW/BXT when we access
an array of DF uniform with both direct and indirect addressing because for the latter
we use 32-bit MOV INDIRECT instructions. However this problem can happen with other
generations and bitsizes.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
So we don't need to know about radv_sampler in ac_nir_to_llvm.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes build failure with --enable-opencl --enable-xvmc:
make[4]: Entering directory '/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/targets/xvmc'
CXXLD libXvMCgallium.la
../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `evergreen_create_compute_state':
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:254: undefined reference to `ac_elf_read'
../../../../src/gallium/drivers/r600/.libs/libr600.a(evergreen_compute.o): In function `r600_shader_binary_read_config':
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start'
/home/daenzer/src/mesa-git/mesa/build-amd64/src/gallium/drivers/r600/../../../../../src/gallium/drivers/r600/evergreen_compute.c:189: undefined reference to `ac_shader_binary_config_start'
collect2: error: ld returned 1 exit status
Makefile:760: recipe for target 'libXvMCgallium.la' failed
Fixes: dc4c551a34 ("radeon/ac: switch from radeon_elf_read() to ac_elf_read()")
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
I have no idea why these were part of the compiler files. They're
miptree related code, and the compiler doesn't appear to use them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
For radeonsi we could probably switch to
ac_shader_binary_read_config(). However the functions have
diverged so just share this helper for now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The read config functions are different for r600 and radeonsi so
we can't just share the one in amd common. So just share this
instead.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There was exactly one user of this, and I just removed it.
It also accessed an implicit global context, with no locking. This
meant that it was only safe if all callers of ralloc_autofree_context()
held the same lock...which is a pretty terrible thing for a utility
library to impose.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of using ralloc_autofree_context() to install an atexit()
handler to ralloc_free(glsl_type::mem_ctx), we can simply free them
from _mesa_glsl_release_types().
This is effectively the same, because _mesa_glsl_release_types() is
called from _mesa_destroy_shader_compiler(), which is called from Mesa's
one_time_fini() function, which Mesa installs as an atexit() handler.
The one advantage here is that it ensures the built-in functions are
destroyed before the types.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This allows passing the generated files directly to llc or bugpoint.
v2: add atomic counter ID
v3: remove extra scope operator, constify
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
For blitting we need to use the depth or stencil format, never
the combined.
This fixes:
dEQP-VK.texture.shadow.2d.nearest.less_or_equal_d32_sfloat_s8_uint
and a few others.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These formats are used by some CTS tests, may as well fill them in.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is similar to what we do in the texture error codepath.
While we are at it, update the specification comment with
latest GL 4.5 spec.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This improves consistency with image variables and atomic
counters which are already rejected the same way.
Note that opaque variables can't be treated as l-values, which
means only the 'in' function parameter is allowed.
v2: rewrite commit message
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
The main idea behind this is to free some bits in the flags.q
struct because currently all 64-bits are used and we can't
add more layout qualifiers without reaching a static assert.
In order to do that (mainly for ARB_bindless_texture), use an
enumeration for the AMD_conservative_depth layout qualifiers
because it's forbidden to declare more than one depth qualifier
for gl_FragDepth.
Note that ast_type_qualifier::merge_qualifier() will prevent
using duplicate layout qualifiers by returning a compile-time
error.
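The bit saving can be illustrated with a small sketch: four mutually exclusive one-bit flags are replaced by a single 3-bit enumerated field. All names below (ex_depth_type, ex_qualifier_flags, ex_merge_depth) are hypothetical and only approximate the idea; they are not the actual ast_type_qualifier layout.

```c
#include <stdbool.h>

/* One value per depth qualifier, plus "none". Five values fit in 3 bits,
 * whereas four independent flag bits would need 4. */
enum ex_depth_type {
    EX_DEPTH_NONE = 0,
    EX_DEPTH_ANY,
    EX_DEPTH_GREATER,
    EX_DEPTH_LESS,
    EX_DEPTH_UNCHANGED
};

struct ex_qualifier_flags {
    unsigned depth_type : 3; /* holds an enum ex_depth_type value */
    unsigned other_flags : 29; /* placeholder for unrelated qualifiers */
};

/* Merge a newly parsed depth qualifier into dst. Returns false when a
 * depth qualifier is already set, mirroring the "only one allowed" rule. */
static bool ex_merge_depth(struct ex_qualifier_flags *dst,
                           enum ex_depth_type type)
{
    if (type == EX_DEPTH_NONE)
        return true;
    if (dst->depth_type != EX_DEPTH_NONE)
        return false;
    dst->depth_type = type;
    return true;
}
```

Because only one depth qualifier may be declared, nothing is lost by storing a single enumerated value instead of independent bits.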
No piglit regressions found (including compiler tests) with
RX480 on RadeonSI.
v2: use a switch case
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com> (v1)
Preliminary work for ARB_bindless_texture which can interact
with ARB_shader_image_load_store.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This game uses GLSL 430 but the interpolation qualifiers in
some shaders don't match, which ends up in a link error. GLSL
440 spec removed this restriction, force it.
This fixes the following link error, as well as serious
rendering problems.
error: vertex shader output `out_TEXCOORD1' specifies noperspective
interpolation qualifier, but fragment shader input specifies no
interpolation qualifier
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If the i-th thread could not be created, it means we have i threads,
not i+1, because we start from 0.
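A small sketch of the counting, with thread creation simulated by a hypothetical ex_create_thread() that fails at a chosen index (no real threads are started here):

```c
#include <stdbool.h>

/* Simulated thread creation: fails for the thread with index fail_at. */
static bool ex_create_thread(unsigned index, unsigned fail_at)
{
    return index != fail_at;
}

/* Returns how many worker threads actually exist after start-up. */
static unsigned ex_start_threads(unsigned requested, unsigned fail_at)
{
    unsigned num_threads = requested;

    for (unsigned i = 0; i < requested; i++) {
        if (!ex_create_thread(i, fail_at)) {
            /* Threads 0..i-1 were created, so the count is i, not i+1. */
            num_threads = i;
            break;
        }
    }
    return num_threads;
}
```

For example, if creation fails at index 2 out of 4 requested threads, only threads 0 and 1 exist and the count is 2.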
Fixes: 404d0d5 "gallium/u_queue: add an option to have multiple worker threads"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Commit 4aea8fe ("gallium/u_queue: fix random crashes when the app calls
exit()") added an atexit handler which calls
util_queue_killall_and_wait() for each queue to stop the threads.
However the app is also free to use atexit handlers to clean up things,
leading to util_queue_destroy() call which will also call
util_queue_killall_and_wait() for the same queue again, causing threads
to be joined twice, which is undefined. This happens with libglut,
for example. A simple fix is to just set num_threads to 0 as there are
no more valid threads after util_queue_killall_and_wait() returns.
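The idea can be sketched as below. The join is only simulated with a counter so the effect is observable; struct ex_queue and ex_killall_and_wait() are hypothetical simplifications, not the real util_queue code.

```c
/* Simplified queue: only the fields needed to show the idea. */
struct ex_queue {
    unsigned num_threads;
    unsigned joins_performed; /* counts simulated thread joins */
};

/* Stop and join all worker threads. Resetting num_threads afterwards
 * makes a second call a no-op, so no thread is ever joined twice. */
static void ex_killall_and_wait(struct ex_queue *queue)
{
    for (unsigned i = 0; i < queue->num_threads; i++)
        queue->joins_performed++; /* stands in for joining thread i */

    queue->num_threads = 0;
}
```

Calling it once from an atexit handler and again from the destroy path then joins each thread exactly once.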
Fixes: 4aea8fe "gallium/u_queue: fix random crashes when the app calls exit()"
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Per spec, VK_QUERY_RESULT_64_BIT specifies the integer size and the
availability flag is an integer. We apparently handled this correctly
already for the copy to buffer case.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
PKT3_OCCLUSION_QUERY hangs when used in a nested IB. This only
calls it when in a primary command buffer and we change
GetQueryPoolResults to not need it. CmdCopyQueryPoolResults
still needs it so we break that behavior for secondary command buffers.
However, that would hang already and using an uninitialized value is
better than a hang.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
This adds initial support for NV_dedicated_allocation, then
uses it for the wsi image/memory allocation paths internally
in the driver.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This bo->fd wasn't setting some stuff correctly that could
lead to crashes for anything using this path later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a complete rewrite of my previous rfc patches.
This adds the ability to present to a different GPU than the one doing the
rendering, using a driver-side operation that can copy from the tiled
image to a linear shared image.
This does prime support completely in the swapchain present code,
and each queue has a precreated command buffer for each image
and for each queue family. This means presenting should work
on graphics and compute queues and transfer in the future.
v1.1: initialise needs_linear_copy in swapchain.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, only the last error code was returned.
Using `set -e` makes the script quit on any unhandled error.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
This fixes 4a883966c1 where the
PIPE_CAP was removed.
Now that USER_INDEX_BUFFERS are always enabled, remove the check and only
check for csmt_active directly.
v2: Axel pointed out the code was still needed when csmt was inactive,
Rebase on master too
v3: Drop struct member user_ibufs also && fixup shortlog (Edward).
v4: Fix negation
v5: Use the right variable name csmt != cmst
Fixes: 4a883966c1 ("gallium: remove PIPE_CAP_USER_INDEX_BUFFERS")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99953
Reported-and-tested-by: Vinson Lee <vlee@freedesktop.org> (v1)
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Make use of common uploaders that landed recently to Mesa
v2: fixed formatting, broken due to thunderbird configuration
v3: per Axel comment: added a comment into NineDevice9_DrawPrimitiveUP
v4: per Axel comment: changed style of the comment
This reduces register pressure in both types of shaders, by reordering the
input loads from the var->data.driver_location order to whatever order
they appear first in the NIR shader. These instructions aren't
reorderable at our QIR scheduling level because the FS takes two in
lockstep to do an interpolation, and the VS takes multiple read
instructions in a row to get a whole vec4-level attribute read.
shader-db impact:
total instructions in shared programs: 76666 -> 76590 (-0.10%)
instructions in affected programs: 42945 -> 42869 (-0.18%)
total max temps in shared programs: 9395 -> 9208 (-1.99%)
max temps in affected programs: 2951 -> 2764 (-6.34%)
Some programs get their max temps hurt, depending on the order that the
load_input intrinsics appear, because we end up being unable to copy
propagate an older VPM read into its only use.
We need to be paying attention to optimization's impact on this -- even if
we reduce instruction count, increasing max temps in general is likely to
cause us to fail to register allocate on some shaders, which means that
those won't run at all.
CXX glsl/ast_to_hir.lo
glsl/ast_to_hir.cpp: In member function 'virtual ir_rvalue* ast_declarator_list::hir(exec_list*, _mesa_glsl_parse_state*)':
glsl/ast_to_hir.cpp:4846:42: warning: missing braces around initializer for 'unsigned int [16]' [-Wmissing-braces]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
If a VAO isn't bound and u_vbuf isn't enabled because of the Core profile,
we'll get user vertex buffers in drivers if we update vertex buffers
in glClear. So don't do that.
This fixes a regression since disabling u_vbuf for Core profiles.
Reviewed-by: Brian Paul <brianp@vmware.com>
The clip state is updated before VS, so it can be NULL for the first draw
call. Just remove the unnecessary dependency on st->vp.
Reviewed-by: Brian Paul <brianp@vmware.com>
Not needed. ddebug does the same thing. The limitation is that drivers
can only use pipe_resource::screen through pipe_resource_reference.
This unbreaks trace, because pipe_context uploaders aren't wrapped,
so trace doesn't understand buffers returned by them.
Reviewed-by: Brian Paul <brianp@vmware.com>
v3: split from the etnaviv patch; fix new_ib.buffer leak
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com> (VMware driver only)
The format of the rt can be different from that of the texture, so we must
propagate the format explicitly to the helper. Broken since
3f9c5d6244 (but unused by st/mesa).
This changes the way radv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need. There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.
RADV implements VK_AMD_draw_indirect_count, so add that to the list.
Port of 114c281e70
"anv/entrypoints: Only generate entrypoints for supported features"
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Ideally would have caught these when adding the interface but this just
switches a few return types for the INTEL_performance_query backend
interface to bool instead of GLboolean.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Starting with the next commit, badly sorting this list will break the
eglGetProcAddress().
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will allow us to make sure the list is always sorted in the next
commit.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Let's make that comment true.
It will also be necessary in a couple of commits (using bsearch).
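To illustrate why the sort order matters for a bsearch()-based lookup, here is a minimal sketch. The table contents, the stub functions and the ex_* names are hypothetical; only bsearch() and strcmp() are real library calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void (*ex_proc)(void);

struct ex_entrypoint {
    const char *name;
    ex_proc function;
};

/* Stub entrypoints, present only so the table has something to return. */
static void ex_bind_api(void) {}
static void ex_choose_config(void) {}
static void ex_create_context(void) {}
static void ex_swap_buffers(void) {}

/* Must stay sorted by strcmp() order, or bsearch() may miss entries. */
static const struct ex_entrypoint ex_table[] = {
    { "eglBindAPI", ex_bind_api },
    { "eglChooseConfig", ex_choose_config },
    { "eglCreateContext", ex_create_context },
    { "eglSwapBuffers", ex_swap_buffers },
};
#define EX_TABLE_LEN (sizeof(ex_table) / sizeof(ex_table[0]))

/* Sanity check that the table is strictly sorted. */
static bool ex_table_is_sorted(void)
{
    for (size_t i = 1; i < EX_TABLE_LEN; i++) {
        if (strcmp(ex_table[i - 1].name, ex_table[i].name) >= 0)
            return false;
    }
    return true;
}

/* bsearch() comparator: the key is the name string itself. */
static int ex_compare(const void *key, const void *elem)
{
    const struct ex_entrypoint *entry = elem;
    return strcmp(key, entry->name);
}

static ex_proc ex_get_proc_address(const char *name)
{
    const struct ex_entrypoint *entry =
        bsearch(name, ex_table, EX_TABLE_LEN, sizeof(ex_table[0]),
                ex_compare);
    return entry ? entry->function : NULL;
}
```

A check like ex_table_is_sorted() is one way to catch a badly sorted list early, since a binary search over an unsorted table fails silently for some names.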
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use the utility u_copy_nv12_from_yv12 to implement this similarly to
how it's been done in the VDPAU state tracker. The old code mixed up
planes and fields and didn't correctly handle video surfaces in
interlaced format.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
mplayer likes putting YV12 data, and if there is a buffer format mismatch,
the vdpau state tracker would try to reallocate the video surface as a
YV12 surface. A virtual driver doesn't like reallocating and doesn't like YV12
surfaces, so if we can't support YV12, try a YV12 to NV12 conversion
instead.
Also advertise that we actually can do the getBits and putBits conversion.
v2: A previous version of this patch prioritized conversion before
reallocating. This has been changed to prioritize reallocating in this version.
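For reference, a YV12 to NV12 conversion can be sketched as below. This is a simplified illustration that assumes tightly packed buffers with even width and height and no per-plane strides; ex_yv12_to_nv12() is a hypothetical helper, not the utility the state tracker uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* YV12 layout: full-size Y plane, then a quarter-size V plane, then a
 * quarter-size U plane. NV12 layout: the same Y plane, followed by one
 * plane of interleaved U,V byte pairs. */
static void ex_yv12_to_nv12(const uint8_t *src, uint8_t *dst,
                            size_t width, size_t height)
{
    const size_t y_size = width * height;
    const size_t chroma_size = (width / 2) * (height / 2);
    const uint8_t *v_plane = src + y_size;
    const uint8_t *u_plane = v_plane + chroma_size;
    uint8_t *uv_plane = dst + y_size;

    /* Luma is identical in both formats. */
    memcpy(dst, src, y_size);

    /* Interleave chroma: NV12 stores U first, then V. */
    for (size_t i = 0; i < chroma_size; i++) {
        uv_plane[2 * i] = u_plane[i];
        uv_plane[2 * i + 1] = v_plane[i];
    }
}
```

Both buffers hold width * height * 3 / 2 bytes, so the conversion needs no extra allocation beyond the destination surface.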
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
v3: have util_clear_texture mirror the pipe function (Roland Scheidegger)
v2: rework util clear functions such that they operate on a resource
instead of a surface (Roland Scheidegger)
Creates a util_clear_texture function for implementing the GL_ARB_clear_texture
in softpipe and llvmpipe.
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This adds support to write to sample mask from the fragment shader.
We can optimise this later like radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This refactors out the sample index fixup between
txf and image load.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This follows the txf_ms code, I can't figure out why amdgpu-pro
doesn't do this in their shaders, they must know something we don't.
This fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The code was interpolating at the offset from the sample,
not the offset from the center. Also, for persample interpolation
modes we should force the pixel center to be at the sample.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix issue with index buffers that do not contain a 0 index. 0 index
can be a non-valid index if the (copied) vertex buffers are a subset of the
user's (which happens because we only copy the range between min & max).
Core will use an index passed in from the driver to replace invalid indices.
Only do this for calls that contain non-zero indices, to minimize performance
cost.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
For now, the cache key is all of FETCH_COMPILE_STATE.
Use new/delete for swr_vertex_element_state, since we have to call the
constructors/destructors of the struct elements.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
If a thread doesn't load GLSL IR from cache but does load TGSI
from cache (that was created by another thread) then it will
crash due to expecting gl_program_parameter_list to have been
restored from the GLSL IR cache and not be null.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This just enables basic MSAA compression (no fast clears) for all
multisampled surfaces. This improves the framerate of the Sascha
"multisampling" demo by 76% on my Sky Lake laptop. Running Talos on
medium settings with 8x MSAA, this improves the framerate in the
benchmark by 80%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Not all clear colors are valid. In particular, on Broadwell and
earlier, only 0/1 colors are allowed in surface state. No CTS tests are
affected outright by this because, apparently, the CTS coverage for
different clear colors is pretty terrible. However, when multisample
compression is enabled, we do hit it with CTS tests and this commit
prevents regressions when enabling MCS on Broadwell and earlier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
v2: Instead of having the same block in isl_gen7,8,9.c add it
once into isl.c::isl_choose_image_alignment_el() instead.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The isl_surf_init call that each of these helpers make can, in theory,
fail. We should propagate that up to the caller rather than just
silently ignoring it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
OpenGL allows the TCS to be missing and supplies an implicit passthrough
shader, but OpenGL ES does not (see section 7.3 of the ES 3.2 spec,
cited above in the code).
One open question is how to handle this for ARB_ES3_2_compatibility.
This patch raises the link error for all ES shading language programs,
but it might make sense to base it on the API. The approach taken in
this patch is more restrictive, but should still allow any valid ES
programs to work in GL.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andres Gomez <agomez@igalia.com>
From IVB PRM, SURFACE_STATE::Height:
"For typed buffer and structured buffer surfaces, the number of
entries in the buffer ranges from 1 to 2^27. For raw buffer
surfaces, the number of entries in the buffer is the number of bytes
which can range from 1 to 2^30."
The minimum value is 1, according to the spec. The spec quote
was already added into the code by 028f6d8317.
Fixes crashing tests under:
dEQP-VK.robustness.buffer_access.*
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From ARB_post_depth_coverage:
"This extension allows the fragment shader to control whether values in
gl_SampleMaskIn[] reflect the coverage after application of the early
depth and stencil tests. This feature can be enabled with the following
layout qualifier in the fragment shader:
layout(post_depth_coverage) in;
Use of this feature implicitly enables early fragment tests."
And a bit later it also adds:
"early_fragment_tests" requests that fragment tests be performed before
fragment shader execution, as described in section 15.2.4 "Early Fragment
Tests" of the OpenGL Specification. If neither this nor post_depth_coverage
are declared, per-fragment tests will be performed after fragment shader
execution."
Fixes:
GL45-CTS.post_depth_coverage_tests.PostDepthSampleMask
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It will return the current variable ('var') or the earlier declaration ('earlier') in
case of redeclaration of that variable.
To distinguish between the two, the 'is_redeclaration' boolean indicates which
case applies.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The get_variable_being_redeclared() function can free 'var' because
a re-declaration of an unsized array variable can establish the size, so
we set the array type to the 'earlier' declaration and free 'var' as it is
not needed anymore.
However, the same 'var' is referenced later in ast_declarator_list::hir().
This patch fixes it by picking the ir_variable_mode from the proper
ir_variable.
This error was detected by Address Sanitizer.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99677
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Before releasing a shared context, flush the context
with ST_FLUSH_WAIT to make sure all commands are executed.
This ensures that rendering to any shared resources is completed
before they will be referenced by another context.
Fixes an intermittent flickering with Photoshop. (VMware bug# 1779340)
Reviewed-by: Brian Paul <brianp@vmware.com>
When st_context_flush() is called with ST_FLUSH_WAIT,
the function will return after the fence is completed.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes up the clip distance passing between the geometry
shader and the copy shader. It packs the clip and cull distances
into one or two consecutive slots, which avoids wasting space and
makes sure the gs output and copy shader input agree on where
things are stored.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This works out the geometry shader clip/cull inputs separately
to the outputs, and uses that information to read from the ES->GS
ring buffer. It stores the clip/cull distances packed into one
or two slots. It fixes the es output emission and gs input
reading to match.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As geom shaders can have different ones on entry and exit.
Also move to uint8_t as these are never that big.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For prime support I need to access this, so move it in advance.
[airlied: fix int->uint32_t]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In some configurations the util directory is created when building out
of tree, but not others. This patch ensures that it's created.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-and-Tested-by: Mike Lothian <mike@fireburn.co.uk>
For gpu generations that use LLVM we create a timestamp string
containing both the LLVM and Mesa build times, otherwise we just
use the Mesa build time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will be used to share the sha1 computed by the tgsi load
function with the tgsi write function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We want to use this in the new tgsi shader cache so we move it here
and make it available externally.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If there was more than a single directory in the .cache/mesa dir
then it would only remove one (or none) of the directories.
Apparently Valgrind was also reporting:
Conditional jump or move depends on uninitialised value
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This adds a python generator to produce enum_to_str functions for
Vulkan from the vk.xml API description. It supports extensions as well
as core API features, and the generator works with both python2 and
python3.
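As a rough sketch of what such a generator could look like (not the script added by this commit), the following parses a tiny inline stand-in for vk.xml and emits one C function per enum type. The sample XML, the `generate_enum_to_str` name and the emitted default case are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for vk.xml; the real registry is far larger.
SAMPLE_XML = """
<registry>
  <enums name="VkResult" type="enum">
    <enum value="0" name="VK_SUCCESS"/>
    <enum value="1" name="VK_NOT_READY"/>
  </enums>
</registry>
"""


def generate_enum_to_str(xml_text):
    """Return C source with one *_to_str function per <enums type="enum">."""
    root = ET.fromstring(xml_text)
    out = []
    for enums in root.findall("./enums[@type='enum']"):
        type_name = enums.get("name")
        out.append("const char *")
        # Strip the leading "Vk" to form the function name.
        out.append("vk_%s_to_str(%s input)" % (type_name[2:], type_name))
        out.append("{")
        out.append("    switch (input) {")
        for enum in enums.findall("enum"):
            name = enum.get("name")
            out.append("    case %s:" % name)
            out.append('        return "%s";' % name)
        out.append("    default:")
        out.append('        return "Unknown";')
        out.append("    }")
        out.append("}")
    return "\n".join(out)
```

Only `%` string formatting and the standard library are used, so the sketch should behave the same under python2 and python3.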
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
sizeof(struct pipe_draw_info) = 104 -> 88
Also, vertices_per_patch is switched to ubyte, because it can't be more
than 32.
Seemed-reasonable-to: Roland Scheidegger
This fixes:
vdpauinfo: ../lib/CodeGen/TargetPassConfig.cpp:579: virtual void
llvm::TargetPassConfig::addMachinePasses(): Assertion `TPI && IPI &&
"Pass ID not registered!"' failed.
v2: use list_head, switch the call order in destroy
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds a bare-bones backend for the INTEL_performance_query extension
that exposes pipeline statistics.
Although this could be considered redundant given that the same
statistics are already available via query objects, they are a simple
starting point for this extension and it's expected to be convenient for
tools wanting to have a single go-to API to introspect what performance
counters are available, along with names, descriptions and semantic/data
types.
This code is derived from Kenneth Graunke's work, temporarily removed
while the frontend and backend interface were reworked.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of using the same backend interface as AMD_performance_monitor
this defines a dedicated INTEL_performance_query interface that is
modelled more on the ARB_query_buffer_object interface (considering the
similarity of the extensions) with the addition of vfuncs for
initializing and enumerating query and counter info.
Compared to the previous backend, some notable differences are:
- The backend is free to represent counters using whatever data
structures are optimal/convenient since queries and counters are
enumerated via an iterator api instead of declaring them using
structures directly shared with the frontend.
This is also done to help us support the full range of data and
semantic types available with INTEL_performance_query which is awkward
while using a structure shared with the AMD_performance_monitor
backend since neither extension's types are a subset of the other.
- The backend must support waiting for a query instead of the frontend
simply using glFinish().
- Objects go through 'Active' and 'Ready' states consistent with the
query object backend (hopefully making them more familiar). There is
no 'Ended' state (which used to show that a query has ended at least
once for a given object). There is a new 'Used' state, set when a
query is first begun which implies that we are expecting to get
results back for the object at some point. There's no equivalent to
the 'EverBound' state since the spec doesn't require there to be a
limbo state between generating IDs and associating them with an object
on query Begin.
The INTEL_performance_query and AMD_performance_monitor extensions are
now completely orthogonal within Mesa main (though a driver could
optionally choose to implement both extensions within a unified backend
if that were convenient for the sake of sharing state/code).
v2: (Samuel Pitoiset)
- init PerfQuery.NumQueries in frontend
- s/return_string/output_clipped_string/
- s/backed/backend/ typo
- remove redundant *bytesWritten = 0
v3:
- Add InitPerfQueryInfo for lazy probing of available queries
v4:
- Clean up some internal usage of GL typedefs (Ken)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To allow the backend interfaces for AMD_performance_monitor and
INTEL_performance_query to evolve independently based on the more
specific requirements of each extension this starts by separating
the frontends of these extensions.
Even though there wasn't much tying these frontends together, this
separation intentionally copies the few helpers/utilities that were
shared between the two extensions, avoiding any re-factoring specific to
INTEL_performance_query so that the evolution will be easier to follow
later.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like it was partly copied from the median filter fragment shader
and unnecessarily saved a lot of temporary values.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The vdpau state tracker allows multiple threads access to the same gallium
context simultaneously. We can fix this either by locking the same mutex
each time the context is used or by using a different gallium context for
each mutex domain. Here we do the latter, although I'm not sure that's really
the best option.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When looking at the full range matrices, it becomes obvious that the difference
between the standard matrices and the full range matrices is that the full
range matrices are multiplied by 1.164. Together with offsetting the y value
with -16/255, this will scale and offset RGB with the desired quantities.
However, the standard SMPTE 240M matrix seems to differ a bit since the
U and V coefficients are only multiplied with 1.138 to get the full range
matrix. This would actually alter the color somewhat so I figure that's an
error. The full range matrix is consistent with Nvidia's VDPAU implementation.
We can also incorporate the ybias in the brightness simplifying the
calculation somewhat.
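The relationship described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the example coefficients are BT.601-style values chosen for the demo rather than copied from the code, and chroma inputs are assumed to be already centred on zero.

```python
YBIAS = -16.0 / 255.0   # offset applied to the y value
RANGE_SCALE = 1.164     # roughly 255/219

# Illustrative "standard" matrix (rows: R, G, B; columns: Y, Cb, Cr).
EXAMPLE_STANDARD = [
    [1.0, 0.0, 1.371],
    [1.0, -0.336, -0.698],
    [1.0, 1.732, 0.0],
]


def to_full_range(standard):
    """Scale every coefficient of the standard matrix by 1.164."""
    return [[RANGE_SCALE * c for c in row] for row in standard]


def convert(matrix, y, cb, cr):
    """Apply the y bias, then the matrix. y in [0, 1]; cb, cr centred on 0."""
    y = y + YBIAS
    return tuple(row[0] * y + row[1] * cb + row[2] * cr for row in matrix)
```

With the scaled matrix, a video-range black level (y = 16/255) maps to zero and a video-range white level (y = 235/255) maps to approximately one.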
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The brightness matrix doesn't actually match the procamp matrix and
what's calculated in vl_csc_get_matrix.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
It will cause multiple simultaneous maps of the same vertex buffer and
flushed-while-mapped warnings.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Windows doesn't have dlfcn.h. Protect the code in question
with #if ENABLE_SHADER_CACHE test. And fix indentation.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This extension adds new query types which can be used to detect overflow
of transform feedback buffers. The new query types are also accepted by
conditional rendering commands.
v3:
- s/gen7+/gen6+/ in the relnotes (Jordan Justen)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable the use of a transform feedback overflow query with
glBeginConditionalRender. The render commands will only execute if the
query is true (i.e. if there was an overflow).
Use ARB_conditional_render_inverted to change this behavior.
v4:
- reuse MI_MATH calcs from hsw_queryob (Kenneth)
- fall back to software conditional rendering when MI_MATH is not
available (Kenneth)
v5:
- check query->Target (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable getting the results of a transform feedback overflow query with a
buffer object.
v4:
- make hsw_overflow_result_to_gpr0 a public function, so it can be used
by conditional render. (Kenneth)
- fix typo grp0/gpr0 (Kenneth)
- rename load_gen_written_data_to_regs to
load_overflow_data_to_cs_gprs (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When querying for transform feedback overflow on one or all of the
streams, store information about number of generated and written
primitives. Then check whether generated == written.
v2:
- use only SO_PRIM_STORAGE_NEEDED, do not fall back to
CL_INVOCATION_COUNT. (Kenneth)
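The comparison can be sketched on the CPU side as below. This is only a model: the function name and the snapshot layout (one `(generated, written)` pair per stream at query begin and end) are assumptions for illustration, not the driver's data structures.

```python
def xfb_overflowed(begin, end):
    """Return True if any stream generated more primitives than it wrote.

    begin/end are per-stream lists of (generated, written) counter
    snapshots taken when the query begins and ends.
    """
    for (gen0, wr0), (gen1, wr1) in zip(begin, end):
        generated = gen1 - gen0
        written = wr1 - wr0
        # No overflow means everything generated was also written.
        if generated != written:
            return True
    return False
```

Passing a single-element list models the per-stream query; passing all streams models the "any stream" variant.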
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add some basic types and storage for the queries of this extension.
v2:
- update date of extension (Kenneth)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
WARNING: sphinx.ext.pngmath has been deprecated. Please use
sphinx.ext.imgmath instead.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
src/gallium/docs/source/tgsi.rst:3488: WARNING: Title underline too short.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Without these, mathjax considers these as the continuation of the
previous line.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
src/gallium/docs/source/context.rst:95: ERROR: Unexpected indentation.
Sub lists need to be surrounded by a blank line.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The only feature over and above ES 3.0 is DrawTransformFeedback().
We already have to do the whole SOL_NUM_PRIMS_WRITTEN counter dance in
order to compute the SVBI value for ResumeTransformFeedback(), at which
point our existing GetTransformFeedbackVertexCount() implementation will
do the trick (though with a stall to CPU map the buffer).
Someday, we could probably implement DrawTransformFeedback() more
efficiently, using the "Load Internal Vertex Count" feature of
3DSTATE_SVB_INDEX and the 3DPRIMITIVE indirect vertex count bit.
Rumor has it this allows people to use WebGL 2.0 on Sandybridge.
Note that we don't need pipelined register writes like Gen7+ because
we use the 3DSTATE_SVB_INDEX command rather than MI_LOAD_REGISTER_MEM.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99842
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes Piglit's ARB_transform_feedback2/change-objects-while-paused
GLES 3.0 test. When resuming the transform feedback object, we need to
reset the SVBI counters so we continue writing at the correct point in
the buffer.
Instead of SO_WRITE_OFFSET counters (with a DWord offset), we have the
Streamed Vertex Buffer Index (SVBI) counters, which contain a count of
vertices emitted.
Unfortunately, there's no straightforward way to store the current SVBI
counter values to a buffer. They're not available in a register. You
can use a bit in the 3DSTATE_SVB_INDEX packet to copy them to another
internal counter which 3DPRIMITIVE can use...but there's no good way to
extract that either.
So, once again, we use SO_NUM_PRIMS_WRITTEN to calculate the vertex
numbers. Thankfully, we can reuse most of the existing Gen7+ code.
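As a hedged illustration of the arithmetic involved (not the driver code), a vertex count can be derived from a primitives-written delta and the number of vertices per output primitive. The table and function name below are hypothetical.

```python
# Vertices written per primitive for each transform feedback output mode.
VERTS_PER_PRIM = {"points": 1, "lines": 2, "triangles": 3}


def svbi_from_prims_written(prims_start, prims_now, mode):
    """Vertices emitted so far, from two primitives-written snapshots."""
    return (prims_now - prims_start) * VERTS_PER_PRIM[mode]
```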
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I'm going to need this in a new Resume hook shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Sandybridge and earlier only have a single counter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This way on Sandybridge we'll only do 1 stream worth of math, since
we only have one SO_NUM_PRIMS_WRITTEN counter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I plan to use these functions on Sandybridge soon. I changed the prefix
on a couple of functions to "brw" instead of "gen7" as in theory they
should be usable all the way back to G45.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These driver hooks are not used when MI_MATH and MI_LOAD_REGISTER_REG
are supported, which Gen8+ can always do. So this code is dead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
R600_DEBUG=mono has had no effect since:
commit 1fabb29717
Author: Marek Olšák <marek.olsak@amd.com>
Date: Tue Feb 14 22:08:32 2017 +0100
radeonsi: have separate LS and ES main shader parts in the shader selector
Also, this assertion was failing:
si_state_shaders.c:1307: si_shader_select_with_key: Assertion
`!shader->is_optimized' failed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We were unconditionally storing these outputs, sometimes even one component
at a time, but apps never read them in TES.
Move the TESSINNER/OUTER buffer stores into the TCS epilog where we can
easily disable them on demand.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This removes a lot of useless LDS stores.
A few games read TESSINNER/OUTER, but not any other outputs. Most games
don't read any outputs.
The only app doing LDS output reads is UE4 Lightsroom Interior.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows the helper to check for llc instead of having to do it
manually at all the call sites.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All this cache line address calculation stuff is tricky. Let's not
duplicate it more places than we have to.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's a bit shorter and easier to work with. Also, we're about to add a
helper called clflush which does the clflush but without any memory
fencing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This validation was added before the etnaviv drm driver landed in
the linux kernel. Due to some pre-merge API changes we had to fix up
this value, but with a mainline kernel this is not a problem anymore.
Let's remove that validation, which also gets rid of a problem caught
by Coverity, reported to me by imirkin.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Used only within the generated source file.
Fixes: 12301c5418 ("radv: drop the RADV_CALL macro.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Otherwise symbols won't be annotated with C linkage and we'll fail at
link time.
Currently this is worked around by wrapping the header inclusion itself.
The latter is in itself fragile and not recommended.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It's a problem waiting to happen. Individual headers should be annotated
if needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Namely, after the include directives. The headers are properly annotated
so keeping things as-is is only asking for trouble.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
compiler.h defines a few mesa specific macros which are not C specific.
This allows us to avoid buggy extern C { #include $system_header }
constructs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
i.e. add extern C {} in program/symbol_table.h
It will allow us to remove a workaround we have elsewhere in the code.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The same PS epilog workaround as for 8-bit integer formats is required,
since the CB doesn't do clamping.
Fixes GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels*.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also handle the GL_ARB_indirect_parameters case where the count itself
is in a buffer.
Use transfers rather than mapping the buffers directly. This anticipates
the possibility that the buffers are sparse (once ARB_sparse_buffer is
implemented), in which case they cannot be mapped directly.
Fixes GL45-CTS.gtf43.GL3Tests.multi_draw_indirect.multi_draw_indirect_type
on <= CIK.
v2:
- unmap the indirect buffer correctly
- handle the corner case where we have indirect draws, but all of them
have count 0.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Allocating huge buffers in VRAM is not a problem, but when those buffers
start being migrated, the kernel runs into errors because it cannot split
those buffers up for moving through GTT.
This should fix intermittent failures of
GL45-CTS.texture_buffer.texture_buffer_max_size
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The preamble flushes now and the rest is the responsibility of the app.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This splits out the cache flush bit setting code
dependent on the src/dest access flags.
It then calls it from the subpass barrier code.
It also marks a TODO to remove the aggressive CS/PS
flushes at some point.
This fixes a bunch of the
dEQP-VK.renderpass.attachment_allocation.input_output.*
tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This assert might have made sense before but we no longer use
gl_linked_shader here. Unless the caller has really done something
crazy this assert is fairly useless.
We also do some small tidy ups in this change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The version tag used to nominate has bitten even experienced mesa
developers. Not to mention that it deviates from the one used in the
kernel leading to further confusion.
Simplify things and omit it altogether.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reword the section to focus on what is allowed, using a more brief, yet
descriptive wording.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reenable the PPC64LE Vector-Scalar Extension for LLVM versions >= 3.8.1,
now that LLVM bug 26775 and its corollary, 25503, are fixed.
Amendment: remove extraneous spaces in macro def & invocations.
We would prefer a runtime check, e.g. via an LLVMQueryString
(analogous to glGetString, eglQueryString) or LLVMGetVersion API,
but no such API exists at this time.
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
[Emil Velikov: remove LLVM_VERSION macro]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
If llvm::sys::getHostCPUName() returns "generic", override
it with "pwr8" (on PPC64LE).
This is a work-around for a bug in LLVM: a table entry for "POWER8NVL"
is missing, resulting in (big-endian) "generic" being returned on
little-endian Power8NVL systems. The result is that code that
attempts to load the least significant 32 bits of a 64-bit quantity in
memory loads the wrong half.
This omission should be fixed in the next version of LLVM (4.0),
but this work-around should be left in place in case some
future version of POWER<n> also ends up unrepresented in LLVM's table.
This workaround fixes failures in the Piglit arb_gpu_shader_fp64 conversion
tests on POWER8NVL processors.
(V4: add similar comment in the code.)
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Cc: 12.0 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Define ElfW() and NT_GNU_BUILD_ID if needed as these defines are not
present on at least OpenBSD and FreeBSD. Fixes the build on OpenBSD.
Fixes: d4fa083e11 ("util: Add utility build-id code.")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
generated-sources-dir-for macro replaces intermediates-dir-for
and LOCAL_MODULE_CLASS is defined as required by new macro,
in order to avoid the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_debug.c:29:10: fatal error: 'sid_tables.h' file not found
^
1 error generated.
Fixes: 730574c58e ("android: ac/debug: move sid_tables.h generation and
IB decode to amd/common")
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This adds support to radv_GetPhysicalDeviceXlibPresentationSupportKHR
and radv_GetPhysicalDeviceXcbPresentationSupportKHR to check if the
local device file descriptor is compatible with the descriptor
retrieved from the X server via DRI3.
This will stop radv binding to an X server until we have prime
support in place. Hopefully apps use this API before trying
to render things.
v2: drop unneeded function, don't leak memory. (jekstrand)
v3: also check in surface_get_support callback.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just keeps popping up minor problems and regressions we should
revisit in a more sustainable manner later.
This also reverts:
Revert "radv: query cmds should mark a cmd buffer as having draws."
Revert "radv: also fixup event emission to not get culled."
This reverts commit d1640e7932.
This reverts commit 8b47b97215.
This reverts commit b4b19afebe.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
start can only be non-zero with MultiDrawElements, which is unlikely
to occur with UNSIGNED_BYTE indices.
v2: Also fix the util_shorten_ubyte_elts_to_userptr call.
Tested with the new piglit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This iterates the fast clear flush across the layers in the
specified range.
It also moves the compute resolve flush into the function
and builds the range in there.
This fixes:
dEQP-VK.geometry.layered.* regressions since fast clears.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes:
dEQP-VK.renderpass.formats.a2b10g10r10_unorm_pack32*
regressions.
Fixes:
f22836dbdd radv: Add CPU color packing for VK_FORMAT_A2B10G10R10_UNORM_PACK32.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Even though the preferred stance is not to fix incorrect applications
via the driver, this prevents some nasty GPU hangs.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This lowers lgkm wait cycles by 30% on VI and normal conditions.
There might be a measurable improvement when CE is disabled (radeon)
or under L2 thrashing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
So that we can disable u_vbuf for GL core profiles.
This is a v2 of the previous VI-only patch.
It requires SH_MEM_CONFIG.ALIGNMENT_MODE = UNALIGNED on CIK-VI.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If an unsized declared array is not the last member in an SSBO
and an implicit size cannot be determined at link time,
the linker should raise an error instead of hitting
an assertion on GL.
This reverts part of commit 3da08e1664
getting back to the behavior of commit 5b2675093e
The original patch was correct for GLES that should produce
a compile-time error but the linker error is still necessary
in desktop GL.
Fixes the following piglit tests:
tests/spec/arb_shader_storage_buffer_object/non_integral_size_array_member.shader_test
tests/spec/arb_shader_storage_buffer_object/unsized_array_member.shader_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
This one only keeps allocated memory in the list, and list nodes
in the descriptor sets. This doesn't need messing around with
max_sets, and we get automatic merging of free regions.
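A toy model of this scheme, assuming a pool that tracks only allocated ranges sorted by offset. The class and method names are hypothetical, and in the driver the list node would live in the descriptor set itself rather than in a Python tuple.

```python
class DescriptorPool(object):
    """Tracks only allocated ranges; free space is whatever lies between them."""

    def __init__(self, size):
        self.size = size
        self.entries = []  # allocated (offset, size) pairs, sorted by offset

    def alloc(self, size):
        offset = 0
        for index, (start, length) in enumerate(self.entries):
            if start - offset >= size:
                break  # the gap in front of this entry is large enough
            offset = start + length
        else:
            # No gap found between entries; try the tail of the pool.
            index = len(self.entries)
            if self.size - offset < size:
                return None
        self.entries.insert(index, (offset, size))
        return offset

    def free(self, offset):
        # Dropping the entry is enough: adjacent gaps merge implicitly.
        self.entries = [e for e in self.entries if e[0] != offset]
```

Because free regions are never stored, two neighbouring frees leave one larger gap without any explicit merge step.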
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We only use the freed ones after all free space has been used. If
the app only allocates small descriptor sets, we might go over
max_sets before the memory is full.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Fixes: f4e499ec79
The optimization in unpack_64 is clearly subsumed with the opt_algebraic
optimizations in the previous commit. The pack optimization may not be
quite handled by opt_algebraic but opt_algebraic should get the really
bad cases. Also, it's been broken since it was merged and we've never
noticed so it must not be doing anything.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR is a typeless IR and the two opcodes, when considered bitwise, do
exactly the same thing. There's no reason to have two versions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to avoid costly fallback recompiles when cache items are
created with an old version of Mesa or for a different gpu on the
same system we want to create directories that look like this:
./{TIMESTAMP}_{LLVM_TIMESTAMP}/{GPU_ID}
Note: The disk cache util will take a single timestamp string, it is
up to the backend to concatenate the llvm string with the mesa string
if applicable.
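A small sketch of the layout described above. The function names and the example values are hypothetical; the split mirrors the note that the backend builds the single timestamp string and the cache utility only consumes it.

```python
import os


def backend_timestamp(mesa_timestamp, llvm_timestamp=None):
    """Backend side: join the Mesa and LLVM build timestamps if LLVM is used."""
    if llvm_timestamp is None:
        return mesa_timestamp
    return "%s_%s" % (mesa_timestamp, llvm_timestamp)


def cache_path(root, timestamp, gpu_id):
    """Cache side: one directory per timestamp string, one per GPU below it."""
    return os.path.join(root, timestamp, gpu_id)
```

For example, `cache_path(".", backend_timestamp("1488000000", "1487000000"), "POLARIS10")` yields a path of the form `./1488000000_1487000000/POLARIS10` on POSIX systems.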
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Here we skip the recreation of uniform storage if we are relinking
after a cache miss. This is important because uniform values may
have already been set by the application and we don't want to reset
them.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow us to skip certain things when falling back to
a full recompile on a cache miss such as avoiding reinitialising
uniforms.
In this change we use it to avoid reading the program metadata
from the cache and skipping linking during a fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
These may be lowered constant arrays or uniform values that we set before linking
so we need to cache the actual uniform values.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For now this disables the shader cache when transform feedback is
enabled via the GL API as we don't currently allow for it when
generating the sha for the shader.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2: don't store pointers; use an enum instead to flag what should be
restored. Also do the work in a helper that we will later use for
the subroutine remap table.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The three additional tables are AttributeBindings, FragDataBindings,
and FragDataIndexBindings.
The first table (AttributeBindings) was identified as missing by
trying to test the shader cache with a program that called
glGetAttribLocation.
Many thanks to Tapani Pälli <tapani.palli@intel.com>, as it was review
of related work that he had done previously that pointed me to the
necessity to also save and restore FragDataBindings and
FragDataIndexBindings.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The scenario is:
glShaderSource
glCompileShader <-- deferred due to cache hit of shader
glShaderSource <-- with new source code
glAttachShader
glLinkProgram <-- no cache hit for program
At this point we need to compile the original source when we
fallback.
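The scenario can be modelled with a toy shader object. All names here are hypothetical and the class only illustrates the bookkeeping: a deferred compile has to remember the source it was asked to compile, because the shader's current source may change before the link.

```python
class Shader(object):
    def __init__(self):
        self.source = None           # latest text from glShaderSource
        self.fallback_source = None  # text a deferred compile referred to
        self.compiled_source = None  # text that was actually compiled

    def shader_source(self, text):
        self.source = text

    def compile(self, cache_hit):
        if cache_hit:
            # Defer the work, but keep the source this request was for.
            self.fallback_source = self.source
        else:
            self.compiled_source = self.source

    def fallback_compile(self):
        # Program cache miss at link time: compile what the deferred
        # compile saw, not whatever source was set afterwards.
        if self.fallback_source is not None:
            self.compiled_source = self.fallback_source
            self.fallback_source = None
```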
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The hash key for glsl metadata is a hash of the hashes of each GLSL
source string.
This commit uses the put_key/get_key support in the cache to put the SHA-1
hash of the source string for each successfully compiled shader into the
cache. This allows for early, optimistic returns from glCompileShader
(if the identical source string had been successfully compiled in the past),
in the hope that the final, linked shader will be found in the cache.
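A minimal sketch of the idea, assuming an in-memory set as a stand-in for the cache's key support. `KeyCache`, `compile_shader` and `program_key` are hypothetical names used only for this illustration.

```python
import hashlib


def source_sha1(source):
    """SHA-1 of one GLSL source string, as a hex digest."""
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


def program_key(sources):
    """Hash of the per-source hashes, in attachment order."""
    outer = hashlib.sha1()
    for src in sources:
        outer.update(source_sha1(src).encode("ascii"))
    return outer.hexdigest()


class KeyCache(object):
    """Toy stand-in for the cache's put_key/get_key support."""

    def __init__(self):
        self._keys = set()

    def put_key(self, key):
        self._keys.add(key)

    def get_key(self, key):
        return key in self._keys


def compile_shader(cache, source, real_compile):
    key = source_sha1(source)
    if cache.get_key(key):
        return "deferred"      # seen before: return early, optimistically
    if real_compile(source):
        cache.put_key(key)     # only successful compiles are recorded
    return "compiled"
```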
This is based on the initial patch by Carl.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This uses disk_cache.c to write out a serialization of various
state that's required in order to successfully load and use a
binary written out by a driver's backend. This state is referred to as
"metadata" throughout the implementation.
This initial version is intended to work with all stages besides
compute.
This patch is based on the initial work done by Carl.
V2: extend the file's doxygen comment to cover some of the
design decisions.
V3:
- skip cache for fixed function shaders
- add int64 support
- fix glsl IR program parameter caching/restore and cache the
parameter values which are used by gallium backends.
- use new link status enum
V4:
- add compute program support
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Remove local definition of RADEON_INFO_TILE_CONFIG and use the correct
macro provided by libdrm_radeon RADEON_INFO_TILING_CONFIG.
Latter was present as of libdrm 2.4.22, circa 2010.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Such that we can remove all the local fall-back definitions and use the
official UABI ones.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The currently used range HEAD..origin/master is far too broad. It looks
for nominations within the already_landed list (branchpoint..HEAD).
Similarly we look for already_landed within the [possible] nominations
range branchpoint..origin/master.
Improve things by limiting the look ups to the branch point.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently we loop (git log --grep) to check if the fix has landed. We
can simplify and make things faster by storing the already_picked list
and grepping through it.
Slim down the message while we're here.
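The already_picked lookup can be illustrated with a short Python sketch. The real script is shell and works on git output; the function name and the SHA strings below are made up, and only the "build the list once, then look things up in it" idea is shown.

```python
def split_nominations(nominated_shas, already_picked):
    """Split nominated fixes into (pending, landed).

    already_picked is built once up front instead of running one
    history search per nominated commit.
    """
    picked = set(already_picked)  # one-time cost, cheap lookups afterwards
    pending = [sha for sha in nominated_shas if sha not in picked]
    landed = [sha for sha in nominated_shas if sha in picked]
    return pending, landed
```

For example, split_nominations(["aaa111", "bbb222"], ["bbb222"]) yields (["aaa111"], ["bbb222"]).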
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Mention the generic channels (PPA, Corp, other) as well as give a couple
of examples. Even if the latter became out of date the former should be a
good guide.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Print only the information needed. Namely:
*info: the DRI module picked and the vendor/renderer strings
*gears: everything but the "...configuration file..." line(s)
v2: (Eric) Use "2>&1 |" over "|&", properly escape &.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
We had multiple cases in the past where files used only by the
Scons/MinGW/Windows build were missing.
Avoid such instances and add a step to catch them early.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These are enough for the spir-v generator to handle UConvert
and SConvert operations, and fix the 4 tests in CTS.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the support at the spirv->nir level for the Int64
cap.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is used in DOOM, so provide the fast clear path for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes regressions from c505d6d852.
Switching from using gl_shader_program to gl_program for the pipeline
object's CurrentProgram array meant we were freeing gl_shader_programs
immediately after glDeleteProgram was called, but the spec states
the program should only get deleted once it is no longer in use.
To work around this we add a new ReferencedPrograms array to track
gl_shader_programs in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the buffer has been freed by the kernel under memory pressure, it is
invalid to try and access the backing storage for that buffer in the
future - the backing storage is not recreated automatically. As such we
need to mark the GL object as being freed for unretained buffers and so
recreate the object on next use.
Furthermore, from GL_APPLE_object_purgeable:
"In contrast, by calling ObjectUnpurgeableAPPLE with an <option> of
UNDEFINED_APPLE, the application is indicating that it intends to
recreate the contents of the storage from scratch. Further, the
application is is stating that it would like the GL to do only the
minimal amount of work set PURGEABLE_APPLE to FALSE. If
ObjectUnpurgeableAPPLE is called with the <option> set to
UNDEFINED_APPLE, then ObjectUnpurgeableAPPLE will return the value
UNDEFINED_APPLE."
we must always report GL_UNDEFINED_APPLE when called with
glObjectUnpurgeable(GL_UNDEFINED_APPLE).
Testcase: piglit/object_purgeable-api-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This partially reverts commit 97217a40f9.
It leaves ES 2.0 support in place per Ian's suggestion, because ES 2.0
is designed to work on hardware like i915.
Chrome only uses the GPU if you have GL >= 2.0, and using i915 (and
prog_execute) actually hurt performance compared with the software
paths.
The --build-id=... ld flag has been present since binutils-2.18,
released 28 Aug 2007.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Provides the ability to read the .note.gnu.build-id section of ELF
binaries, which is inserted by the --build-id=... flag to ld.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
glGetTextureSubImage() and glGetCompressedTextureSubImage() are currently
returning INVALID_OPERATION error when the passed texture argument does not
correspond to an existing texture object. However, the error should be
INVALID_VALUE instead. From OpenGL 4.5 spec PDF, section '8.11. Texture
Queries', page 236:
"An INVALID_VALUE error is generated if texture is not the name of
an existing texture object."
Same wording applies to the compressed version.
The INVALID_OPERATION error is coming from the call to
_mesa_lookup_texture_err(). This patch uses _mesa_lookup_texture() instead
and emits the correct error in the caller.
Fixes: GL45-CTS.get_texture_sub_image.errors_test
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mesa currently doesn't allow creating 3.1+ compatibility profiles
mainly because various features are unimplemented and bugs can
happen.
However, some buggy apps request a compat profile without using
any old features unimplemented in mesa, and they fail to start.
This option should help some games to run but it's not enough
for all (eg. Dying Light).
v2: - s/force_compat_profile/allow_higher_compat_version
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
vkQueuePresentKHR() takes a VkPresentInfoKHR pointer, which includes a
pResults field that must hold the results of all the images
requested to be presented. Currently we're not filling this field.
Also as a side effect we probably want to go through all the images
rather than stopping on the first error.
This commit also makes the QueuePresentKHR() implementation return the
first error encountered.
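The intended control flow can be sketched in Python. This is an illustration under assumptions, not the driver's C code: queue_present and present_one are hypothetical names, and for simplicity any non-success code is treated as an error.

```python
VK_SUCCESS = 0
VK_ERROR_SURFACE_LOST_KHR = -1000000000
VK_ERROR_OUT_OF_DATE_KHR = -1000001004


def queue_present(present_one, swapchains, results=None):
    """Present every swapchain, record each result, return the first error.

    present_one is a hypothetical callback that presents one swapchain
    and returns a result code. results plays the role of the optional
    pResults array.
    """
    first_error = VK_SUCCESS
    for i, swapchain in enumerate(swapchains):
        result = present_one(swapchain)
        if results is not None:
            results[i] = result  # per-swapchain result
        # Keep going after a failure, but remember the first one.
        if result != VK_SUCCESS and first_error == VK_SUCCESS:
            first_error = result
    return first_error
```

The loop never breaks early, so every entry of results is filled in even when an earlier swapchain failed.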
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Commit 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
fixed the sorting of the array initializers in g_glxglvnddispatchfuncs.c
because FindGLXFunction's binary search needs these to be sorted
alphabetically.
That commit also mostly fixed the sorting of the DI_foo defines in
g_glxglvnddispatchindices.h, which is what actually matters as the
arrays are initialized using "[DI_foo] = glXfoo," but a small error
crept in which at least causes glXGetVisualFromFBConfigSGIX to not
resolve, breaking games such as "The Binding of Isaac: Rebirth" and
"Crypt of the NecroDancer" from Steam and possibly causing other
problems too.
This commit fixes the last of the sorting errors, fixing these mentioned
games not working.
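Why a single misplaced entry is enough to make a lookup fail can be shown with a small Python sketch. find_function below is a generic binary search, not the FindGLXFunction source, and the table ordering is made up for illustration; it is not the actual ordering error that was fixed.

```python
def find_function(names, wanted):
    """Plain binary search; only correct if names is sorted."""
    lo, hi = 0, len(names) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if names[mid] == wanted:
            return mid
        if names[mid] < wanted:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found


sorted_table = [
    "glXBindTexImageEXT",
    "glXGetFBConfigs",
    "glXGetVisualFromFBConfigSGIX",
    "glXQueryContext",
    "glXWaitGL",
]

# Same entries with two of them swapped (hypothetical mistake).
broken_table = list(sorted_table)
broken_table[1], broken_table[3] = broken_table[3], broken_table[1]
```

With the sorted table every name resolves; with the swapped table, looking up "glXGetFBConfigs" returns -1 even though the entry is present.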
Fixes: 8bca8d89ef ("glx/glvnd: Fix dispatch function names and indices")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Adam Jackson <ajax@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This is possibly a bad idea, I might have to consider a better one.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a regression in queries caused by the removal of non-draw cmd
buffers.
Fixes: 8b47b97215 radv: detect command buffers that do no work and drop them (v2)
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GS input arrays, we may turn a packed_type of ivec4 into an
array of ivec4s. We still want flat qualification.
Found by inspection. Not known to help anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Unfortunately, this doesn't substantially improve the performance of any
known apps. With Dota 2 on my Sky Lake gt4, it seems to help by somewhere
between 0% and 1% but there's enough noise that it's hard to get a clear
picture.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
It's a bit hard to measure because it almost gets lost in the noise,
but this seemed to help Dota 2 by a percent or two on my Broadwell
GT3e desktop.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Vulkan doesn't have a stencilWriteEnable bit like it does for depth.
Instead, you have a stencil mask. Since the stencil mask is handled as
dynamic state, we have to handle it later during command buffer
construction. This, combined with a later commit, seems to help Dota2
on my Broadwell GT3e desktop by a couple percent because it allows the
hardware to move the depth and stencil writes to early in more cases.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This changes the way anv_entrypoints_gen.py works from generating a
table containing every single entrypoint in the XML to just the ones
that we actually need. There's no reason for us to burn entrypoint
table space on a bunch of NV extensions we never plan to implement.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Even though we supported both coherent and non-coherent memory types, we
effectively forced apps to use the coherent types by accident. Found by
inspection, only compile tested.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
I think this only affects radeonsi - VI, because all other drivers using
u_vbuf probably don't support GL_DOUBLE, so they won't be affected by this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Notes:
- make sure the default size is large enough to handle all state trackers
- pipe wrappers don't receive transfer calls from stream_uploader, because
pipe_context::stream_uploader points directly to the underlying driver's
stream_uploader (to keep it simple for now)
v2: add error handling to nv50, nvc0, noop
v3: set const_uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Tested-by: Charmaine Lee <charmainel@vmware.com>
For lower memory usage and more efficient updates of the buffer residency
list. (e.g. if drivers keep seeing the same buffer for many consecutive
"add" calls, the calls can be turned into no-ops trivially)
v2: add const_uploader, add documentation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Tested-by: Charmaine Lee <charmainel@vmware.com>
This ports the remains of the workarounds from radeonsi for
the non-TESS cases. It should provide equivalent workarounds
for hawaii and bonaire.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This allows shaders to write to storage images declared with unknown
format if they are decorated with NonReadable ("writeonly" in GLSL).
Previously an image view would always use a lowered format for its
surface state, however when a shader declares a write-only image, we
should use the real format. Since we don't know at view creation time
whether it will be used with only write-only images in shaders, create
two surface states using both the original format and the lowered
format. When emitting the binding table, choose between the states
based on whether the image is declared write-only in the shader.
Tested on both Sascha Willems' computeshader sample (with the original
shaders and ones modified to declare images writeonly and omit their
format qualifiers) and on our own shaders for which we need support
for this.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Allow that capability if the driver indicates that it is supported, and
flag whether images are read-only/write-only in the nir_variable (based
on the NonReadable and NonWritable decorations), which drivers may need
to implement this.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As soon as we support shaderStorageImageWriteWithoutFormat we can see
write-only images (sampled == 2) that don't have a format specified.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes our driver robust to changes in spirv_to_nir which would set
this flag on the variable. Right now, our driver relies on spirv_to_nir
*not* setting var->data.image.write_only for correctness. Any patch
which implements the shaderStorageImageWriteWithoutFormat will need to
effectively revert this commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This adds two columns to the format table as well as two helpers for
determining whether or not a given format is supported for typed reads
and writes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Passes the newly added piglit test for this extension on i965.
V2: Fix comments by Ilia.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This uses the common fs interp code to use the new
llvm intrinsics so llvm can drop the old ones.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This puts the common gfx state for the device into an
indirect buffer, and just calls out to it, on CIK and above.
This is taken from what radeonsi does.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for the following patch to use
a common gfx init indirect buffer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a buffer is just full of flushes we flush things on command
buffer submission, so don't bother submitting these.
This will reduce some CPU overhead on dota2, which submits a fair
few command streams that don't end up drawing anything.
v2: reorganise loop to count first then malloc,
rename some vars (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
BLORP is now smart enough to handle any swizzle (even those that contain
ZERO or ONE) in a reasonable manner. Just let BLORP handle it. This
fixes the following Vulkan CTS tests on Haswell:
- dEQP-VK.api.image_clearing.clear_color_image.1d_b4g4r4a4_unorm_pack16
- dEQP-VK.api.image_clearing.clear_color_image.2d_b4g4r4a4_unorm_pack16
- dEQP-VK.api.image_clearing.clear_color_image.3d_b4g4r4a4_unorm_pack16
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
It's trivial to swizzle clear colors on the CPU, easily deals with the
hardware restrictions for render target swizzles, and makes swizzled
clears work on all hardware as opposed to just HSW+.
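Swizzling a clear color on the CPU amounts to a per-channel table lookup. The Python sketch below is illustrative only: swizzle_clear_color and the string encoding of the swizzle are assumptions, it applies the swizzle in the forward direction, and it assumes a float format so that ONE is 1.0.

```python
def swizzle_clear_color(color, swizzle):
    """Apply a 4-channel swizzle to an (r, g, b, a) clear color.

    swizzle is a 4-character string made of 'R', 'G', 'B', 'A',
    '0' (ZERO) and '1' (ONE), a hypothetical encoding for this sketch.
    """
    channels = {
        "R": color[0],
        "G": color[1],
        "B": color[2],
        "A": color[3],
        "0": 0.0,  # ZERO
        "1": 1.0,  # ONE, assuming a float format
    }
    return tuple(channels[sel] for sel in swizzle)
```

For example, swizzle_clear_color((0.1, 0.2, 0.3, 0.4), "BGRA") gives (0.3, 0.2, 0.1, 0.4). Because the lookup happens before the clear is issued, the render target itself can keep an identity swizzle.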
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Released back in 2007 so it should not be an issue for anyone building
from git.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
DRI2DriverPrimeShift was added in dri2proto-2.8, which we now require
as of the previous commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
DRI2DriverPrimeShift was added in dri2proto-2.8, which we now require as
of the previous commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Noticed while skimming through, although admittedly there's many other
dependencies that are not tracked by the scons build.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
dri2proto 2.8 was released 4+ years ago, so it must be of no surprise
for anyone building mesa from git.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The two symbols referenced were introduced with v2.2 and 2.3 of
the dri2proto package and we require dri2proto >= 2.6.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replace with AS_HELP_STRING and AC_MSG_ERROR respectively, as spotted by
autoupdate.
Note that the suggested AC_CANONICAL_SYSTEM > AC_CANONICAL_TARGET change
is not addressed here since that requires very extensive testing.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
I forgot to error check stat() and also I wasn't using the subdir in
is_two_character_sub_directory().
Fixes: d7b3707c61 "util/disk_cache: use stat() to check if entry is a directory"
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Drop all -m*, -W*, -O*, -g* and -f* flags, with the exception of
-fno-rtti, which must be used if it's part of the llvm-config --cxxflags
output. We don't want LLVM to dictate the flags we use, and it can even
cause build failures, e.g. if LLVM and Mesa are built with different
compilers.
While we're at it, eat any whitespace preceding dropped flags as well.
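The filtering rule can be sketched in Python. The build system does this in configure with shell tools; the function name and the sample flag string here are made up, and splitting and rejoining on whitespace stands in for eating the whitespace before dropped flags.

```python
DROPPED_PREFIXES = ("-m", "-W", "-O", "-g", "-f")


def strip_llvm_cxxflags(flags):
    """Drop -m*, -W*, -O*, -g* and -f* flags, but always keep -fno-rtti."""
    kept = []
    for flag in flags.split():
        if flag == "-fno-rtti" or not flag.startswith(DROPPED_PREFIXES):
            kept.append(flag)
    # Joining with single spaces leaves no stray whitespace behind.
    return " ".join(kept)
```

Given "-I/usr/include -march=native -O2 -g -fno-rtti -fPIC -Wall -DNDEBUG", the function keeps "-I/usr/include -fno-rtti -DNDEBUG".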
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
TCS and TES inputs without an array size are implicitly sized to
gl_MaxPatchVertices. But TCS outputs are apparently not:
"If no size is specified, it will be taken from the output patch size
(gl_VerticesOut) declared in the shader."
Fixes dEQP-GLES31.functional.program_interface_query.program_output.
array_size.separable_tess_ctrl.var.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We were already unwrapping types when the producer was a non-array
stage and the consumer was an arrayed-stage...but we ought to unwrap
both ends for TCS -> TES matching too.
This will allow us to drop the "resize to gl_MaxPatchVertices" check
shortly, which breaks some things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
OpenGL ES actually has spec text to prohibit this. It's just OpenGL
that's confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
ES 3.x requires both TCS and TES to be present. We already checked
the TCS && !TES case above, so we just have to check !TCS && TES here.
Note that this is allowed in OpenGL, just not ES.
This fixes a subcase of:
dEQP-GLES31.functional.debug.negative_coverage.*.tessellation.single_tessellation_stage
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Now that we have OES_tessellation_shader, the same situation can occur
in ES too, not just GL core profile.
Having a TCS but no TES may confuse drivers - i965 crashes, for example.
This prevents regressions in
ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
with some SSO pipeline validation changes I'm making.
v2: Add an ES spec citation (suggested by Alejandro)
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The blob uses these, and it fixes a bunch of dEQP stencil sampling tests
involving border colors. Probably the Z-based samplers work somehow
differently wrt border colors when using the stencil swizzle.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The structs have different sizes, so the arrays have different strides.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
- Use the same instruction area on GC3000 as the Vivante driver.
This allows the same number of instructions on GC3000 as GC2000
instead of half.
- Makes sure that the "PE to FE" stall before updating the shader code
or constants is hit (which is conditional on vs_offset > 0x4000). This
is necessary on GC3000 too, it increases stability.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update from etnaviv repository rnndb. This adds some newly
discovered state for GC3000 (and some GC2000) features.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Just noticed we do a fair bit of unneeded searching here.
Since we know that the buffers in a CS are unique already,
the first time we get any buffers, we can just memcpy those into
place, and when we are searching for subsequent CSes, we only
have to search up until where the previous unique buffers were.
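The reduced search can be sketched as follows. This is a Python illustration with hypothetical names, not the winsys C code; it relies on the stated assumption that the buffers inside any one CS are already unique.

```python
def merge_cs_buffers(command_streams):
    """Merge per-CS buffer lists into one list without duplicates.

    Each inner list is assumed to be free of duplicates already.
    """
    unique = []
    for cs_buffers in command_streams:
        if not unique:
            # Nothing collected yet: a plain copy, no searching at all.
            unique = list(cs_buffers)
            continue
        known = len(unique)  # entries contributed by earlier CSes
        for buf in cs_buffers:
            # Buffers appended from this CS cannot match a later buffer
            # of the same CS, so only the first `known` entries matter.
            if not any(unique[i] == buf for i in range(known)):
                unique.append(buf)
    return unique
```

For instance, merge_cs_buffers([["a", "b"], ["b", "c"], ["c", "a", "d"]]) returns ["a", "b", "c", "d"].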
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On SM35 there does not appear to be a way to emit an ATOM.EXCH with a
null destination. This should be functionally equivalent to a plain
store however, so just do that.
Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The former logic just plain didn't work at all. We need to write the
subsequent dword to the next buffer location.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We have logic to short-circuit such retrievals to zero. However "zero"
was an immediate, and some logic expected to get registers (to later be
propagated). Fix this by using loadImm.
Fixes GL45-CTS.gpu_shader5.images_array_indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
"If an array is declared as the last member of a shader storage block
and the size is not specified at compile-time, it is sized at run-time.
In all other cases, arrays are sized only at compile-time."
In desktop GLSL it is allowed to have unsized-arrays that are
not last, as long as we can determine that they are implicitly
sized, which is detected at link-time.
With this patch Mesa reports a compilation error as glslang does with
the following shader:
buffer SSBO { vec4 data[]; vec4 moreData;};
void main (void)
{
}
Fixes:
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gallium's blitter expects that it can set the sample mask even when the
rasterizer doesn't have the flag on.
Between this and the previous test, 10 new ext_framebuffer_multisample
tests start passing.
gallium's quad-based blitter for copying MSAA depth textures expects to be
able to do 4 passes updating a sample at a time using glSampleMask, and
there's no color buffer bound when it's doing that.
In the hardware we only get to declare 8 vertex elements (GLES2's
minimum), so we should be exposing that number here. Fixes an assertion
failure in piglit texrect-many, at the expense of various GL 2.0-ish
minmax tests now complaining that our count is too low.
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp is rare enough that it's not worth
making the kernel let us.
Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a temporary.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Currently we have extra (somewhat questionable) modularity, such that
one could build some parts with LLVM while others w/o.
That is extremely fragile, error prone and requires quite a noticeable
amount of code throughout.
Thus let's deprecate the gallium toggle in favour of the generic one.
The former will throw a warning when set, and it will be overwritten by
the latter. This will allow gradual transition w/o breaking people's
scripts.
v2: Rebase, document in release notes.
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de> (v1)
The extra function brings no added benefit as of earlier commit which
made llvm_require_version (as called by radeon_llvm_check) require LLVM
(--enable-gallium-llvm).
Fixes: 5f966a96af7 "configure.ac: Mandate --enable-gallium-llvm when
checking LLVM version"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Earlier refactoring commits changed from one, dare I say it, broken
behaviour to another. Namely:
Before, as you explicitly --enable-gallium-llvm your selection was
ignored when llvm-config was not present/detected.
Today, the "auto" heuristics enables gallium llvm regardless if you have
llvm/llvm-config available or not.
Rework the auto-detection to account for llvm's presence.
v2: Set enable_gallium_llvm=no when LLVM is not found.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Already implicitly handled throughout, but keep it clear and disable
gallium-llvm. This change should be a no-op.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Earlier refactoring commits started setting the above regardless if LLVM
is used or not. Move them to the respective section to restore the
original functionality.
Since we require the preprocessor flags (includes in particular) for the
header version parsing keep those as-is. They are not used outside of
configure.ac thus should not cause any side-effects.
As-is, adding the C/CXXFLAGS can lead to build issues when
cross-compiling.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Although it works, it's not the correct thing to do.
v2: Rebase
v3: Rebase
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de> (v1)
LLVM_BINDIR is completely unused while others such as LLVM_LIBDIR are
used only internally. In the latter case there's no need to AC_SUBST it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Set FOUND_LLVM only when LLVM is present (checking for exact version/etc
is deferred) and use enable-gallium-llvm to indicate the global LLVM
status.
Renaming the latter is not appropriate for stable patches, so we'll
address it with a later commit.
Loosely based on work by Tobias.
v2: Check FOUND_LLVM if enable_gallium_llvm is set.
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
... to where it's applicable.
Since we effectively made --enable-gallium-llvm mean --enable-llvm with
earlier commits, we need to move the requirement to guard the components
added for the LLVM draw.
Otherwise we'll error (as below) when building RADV w/o gallium drivers.
configure: error: --enable-gallium-llvm is required when building radv
v2: Don't remove but move the dependency (Tobias).
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
With this change we effectively require --enable-gallium-llvm when
building RADV. This should be perfectly safe since the gallium radeonsi
driver already explicitly requires it.
The "gallium" part in --enable-gallium-llvm is about to be removed soon
(not in stable), but until then make sure that things can build.
To reflect the requirement (as opposed to check previously) we rename
llvm_check_version_for to llvm_require_version
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
Drop the gallium prefix since we're about to use it throughout the
configure.
Note we do want to keep the enable_gallium_llvm check since (as
explicitly requested) the toggle should mean --enable-llvm. The latter
is to be resolved with later patches.
Cc: Dave Airlie <airlied@redhat.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tobias Droste <tdroste@gmx.de>
This is actually not needed because the version is checked later.
Around line 2380
if test "x$enable_gallium_llvm" == "xyes"; then
llvm_check_version_for $LLVM_REQUIRED_GALLIUM "gallium"
llvm_add_default_components "gallium"
fi
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Cc: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
v2: [Emil Velikov: rebase/respin series order]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise we would fail with "implicit declaration of function" geteuid
and getenv respectively.
To trigger (re)move the libdrm.pc file and use the following:
$ ./autogen.sh --disable-egl --disable-gbm --disable-dri \
--with-dri-drivers=swrast --with-gallium-drivers=swrast
$ make
Cc: Vinson Lee <vlee@freedesktop.org>
Fixes: 3f462050c ("loader: Add an environment variable to override driver name choice.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99701
v2: [Emil: handle stdlib.h add commit message]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Not much point in the const qualifier since we provide a copy to the
user. Resolves the following -Wignored-qualifiers warning.
src/intel/blorp/blorp_blit.c:1857:8: warning: 'const' type qualifier on
return type has no effect [-Wignored-qualifiers]
v2: keep const qualifier of local variable.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want cached GTT for all non-persistent read mappings.
Set level = 0 on purpose.
Use dma_copy, because resource_copy_region causes a failure in the PBO
read of piglit/getteximage-luminance.
If Rocket League used the READ flag, it should get cached GTT.
v2: mask out UNSYNCHRONIZED
Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
UMR is our new debugging tool. It must have +s set for Mesa to use it
without root privileges:
sudo chmod +s .../umr
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The instruction has an associated label when Instruction.Label == 1,
as can be seen in ureg_emit_label() or tgsi_build_full_instruction().
This fixes dump generating extra :0 labels on conditionals, and virgl
parsing more than the expected tokens and eventually reaching "Illegal
command buffer" (when parsing more than a safety margin of 10 we
currently have).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's legal to submit just semaphores with no command streams.
This patch fixes this case by emitting the empty cs; it also
handles the fence emission for this case better.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Many many many compute shaders only define a 1- or 2-dimensional block,
but then continue to use system values that take the full 3d into
account (like gl_LocalInvocationIndex, etc). So for the special case
that a dimension is exactly 1, we know that the thread id along that
axis will always be 0, so return it as such and allow constant folding
to fix things up.
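The idea can be sketched as follows. The helper names are hypothetical; the index formula is the standard GLSL definition of gl_LocalInvocationIndex.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the thread id along one axis. When the block size on
 * that axis is exactly 1 the id is always 0, so return the constant and let
 * later folding simplify every expression that uses it. */
static uint32_t
thread_id_axis(uint32_t block_size, uint32_t hw_thread_id)
{
   assert(block_size >= 1);
   if (block_size == 1)
      return 0;
   return hw_thread_id;
}

/* gl_LocalInvocationIndex = z * size_x * size_y + y * size_x + x */
static uint32_t
local_invocation_index(const uint32_t size[3], const uint32_t id[3])
{
   uint32_t x = thread_id_axis(size[0], id[0]);
   uint32_t y = thread_id_axis(size[1], id[1]);
   uint32_t z = thread_id_axis(size[2], id[2]);
   return z * size[0] * size[1] + y * size[0] + x;
}
```

For a 64x1x1 block the y and z terms multiply a constant zero, so the whole expression reduces to x.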
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.
Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.
Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
There's all kinds of logic that doesn't like there being holes in defs
or srcs lists. Avoid them. This also fixes the sched logic for maxwell.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have
to do a bit more work to get the job done.
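Assuming the job in question is a 64-bit shift (the message does not spell this out), the extra work without a funnel-shift instruction looks roughly like the sketch below, which builds a 64-bit left shift from 32-bit operations. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit left shift of the pair (hi:lo) using only 32-bit shifts and ORs. */
static void
shl64_via_32(uint32_t lo, uint32_t hi, unsigned shift,
             uint32_t *out_lo, uint32_t *out_hi)
{
   assert(shift < 64);
   if (shift == 0) {
      *out_lo = lo;
      *out_hi = hi;
   } else if (shift < 32) {
      *out_lo = lo << shift;
      /* bits leaving the low word enter the bottom of the high word */
      *out_hi = (hi << shift) | (lo >> (32 - shift));
   } else {
      *out_lo = 0;
      *out_hi = lo << (shift - 32);
   }
}
```

A funnel shift would produce the high word in one instruction; here it takes two shifts and an OR, plus the case split on the shift amount.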
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A few thoughts:
- Some of that LegalizeSSA logic should really live much earlier and be
subject to the likes of DCE and other useful passes
- Some of the "lowering" done in from_tgsi should be done later so that
proper optimization might be done.
However this all works and the above can be improved upon later.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hardware does not support 64-bit integer MAD and MUL operations, so we need
to transform them into 32-bit operations.
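A rough sketch of such a transformation in portable C. The function names are hypothetical, and mul_hi_u32 is a stand-in for a 32-bit "multiply high" operation that is assumed to be available; this is not the pass itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32x32 "multiply high" operation. The 64-bit temporary is
 * only used to model it in portable C. */
static uint32_t
mul_hi_u32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* 64-bit multiply (modulo 2^64) built from 32-bit pieces. */
static uint64_t
mul64_via_32(uint64_t a, uint64_t b)
{
   uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
   uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

   uint32_t lo = a_lo * b_lo;                    /* low 32 bits */
   uint32_t hi = mul_hi_u32(a_lo, b_lo)          /* carry out of the low product */
               + a_lo * b_hi + a_hi * b_lo;      /* cross terms, low 32 bits only */
   return ((uint64_t)hi << 32) | lo;
}

/* 64-bit multiply-add: add c to the product, propagating the carry by hand. */
static uint64_t
mad64_via_32(uint64_t a, uint64_t b, uint64_t c)
{
   uint64_t p = mul64_via_32(a, b);
   uint32_t lo = (uint32_t)p + (uint32_t)c;
   uint32_t carry = lo < (uint32_t)p;            /* wrapped around? */
   uint32_t hi = (uint32_t)(p >> 32) + (uint32_t)(c >> 32) + carry;
   return ((uint64_t)hi << 32) | lo;
}

static void
mul64_example(void)
{
   assert(mul64_via_32(0x100000000ull, 3) == 0x300000000ull);
   assert(mad64_via_32(0xffffffffull, 1, 1) == 0x100000000ull);
}
```

The a_hi * b_hi term is dropped because it only affects bits 64 and above.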
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
We were never emitting a .X flag for consuming condition code on SET,
and weren't emitting a signed type for SLCT comparison. Discovered while
working on int64 logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
These operations allow you to compute min/max on arbitrary-width
integers, 32 bits at a time.
Note that the low/med ops implicitly set the condition code, and the
med/high ops implicitly consume it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nouveau does not currently have logic to implement this as a library
function. Even though such a library could be written, there's no big
advantage to do it that way for now given that int64 is a very uncommon
use-case. Allow a driver to expose INT64 without supporting division and
modulo operations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously if you used MESA_GL_VERSION_OVERRIDE=3.3COMPAT, Mesa exposed
an OpenGL 3.3 compatibility profile context (with various unimplemented
features and bugs), but still refused to compile shaders with
#version 330 compatibility
This patch simply adds a small bit of plumbing to let that through.
Of course the same caveats apply: compatibility profile is still not
supported (and will not be supported), so there are no guarantees that
anything will work.
Tested-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of them already redirected to https anyway, so we might as well
avoid the redirection and the security implications by linking directly
to the right protocol.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be used to remove cache items created with old versions
of Mesa or other invalid cache items from the cache.
V2: rename stub function (cache_* functions were renamed disk_cache_*
in master).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For the on-disk shader cache we want to be able to differentiate
between a program that was linked and one that was loaded from cache.
V2:
- don't return the new enum directly to the application when queried,
instead return GL_TRUE or GL_FALSE as required. Fixes google-chrome
corruptions when using cache.
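A minimal sketch of the V2 behaviour. The enum and function names are hypothetical; only the GL_TRUE/GL_FALSE values are taken from OpenGL.

```c
#include <assert.h>

#define GL_FALSE 0   /* same values as in the OpenGL headers */
#define GL_TRUE  1

/* Hypothetical internal status: tells a freshly linked program apart from
 * one that was loaded from the on-disk cache. */
enum link_status {
   LINK_FAILURE = 0,
   LINK_SUCCESS,
   LINK_LOADED_FROM_CACHE
};

/* What the application sees when it queries the link status: never the
 * internal enum, only GL_TRUE or GL_FALSE. */
static int
link_status_for_query(enum link_status status)
{
   assert(status <= LINK_LOADED_FROM_CACHE);
   return status == LINK_FAILURE ? GL_FALSE : GL_TRUE;
}
```

Returning the raw enum would hand an application a value such as 2, which a strict `== GL_TRUE` comparison treats as failure.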
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For allowing fast color clears in the main render targets of dota2.
[airlied: fix clear_vals[1] as suggested by Andres.]
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Removed temporary scaffolding in PA, widened the PA_STATE interface
for SIMD16, and implemented PA_STATE_CUT and PA_TESS for SIMD16.
PA_STATE_CUT and PA_TESS now work in SIMD16.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Make all SimdVectors in LLVM represented as simdscalar[4] rather
than a struct.
Fixes issues with promotion of values from i32 to i64 to match
register width.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
SIMD16 Primitive Assembly (PA) only supports TriList and RectList.
CUT_AWARE_PA, TESS, GS, and SO disabled in the SIMD16 front end.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
File names were wrong, file formats were wrong, bunzip command was
wrong...
I also removed all but the simplest example; people who use pipes already
know how to untar, so let's simplify and remove potential confusion for
non-tech-savvy users.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Release candidates haven't been in a 'beta' subdir in a long time, so let's
replace the dead link with an explanation instead.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to initialize dcc like we do in the subpass path.
v2: fix initial/final layouts
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not wired up (not referenced in any SUBDIR), leading to `make distcheck'
failure.
Fixes: d77fa310ed "ilo: EOL drop unmaintained gallium drv from buildsys"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Properly annotate <li> and keep the note analogous to all the previous
ones - OpenVG, st/egl, etc.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
There were two "libglvnd configuration" sections in the squashed commit
that added libglvnd support, while only one in the original libglvnd
branch. A following commit moves one of them downwards. Now remove the
upper "older" one and move GL_LIB name decision downwards after the new
libglvnd configuration section.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
The instance offers 2 cores, so use them to speed things up.
v2: Set MAKEFLAGS instead [Eric]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Note: we need the explicit --enable-freedreno for libdrm since the
latter is 'smart' and disables it if building on !arm platforms.
The radeonsi and swr are explicitly left out since they require
'too-recent' LLVM - 3.6
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The current regex was tracking only the libdrm_foo packages, while with
recent changes we bumped only (and rightfully so) libdrm.
Fix the regex to track any libdrm package.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's unlikely that any of the additions come as a surprise to anyone:
i915, nouveau, radeon, r200. Regardless, state clearly what's
available.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
As per the spec -
"The functions memoryBarrierShared() and groupMemoryBarrier() are
available only in compute shaders; the other functions are available
in all shader types."
Conform to this by adding another delegate to check for compute
shader support instead of only whether the current stage is compute.
This allows some fragment shaders in Dirt Rally to compile.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When parsing a texture instruction, the loop doesn't stop while
'cur' is ','; the loop variable 'i' keeps being increased
and is used to index the 'inst.TexOffsets' array. This can lead
to an out-of-bounds access. This patch avoids that.
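The shape of such a fix can be sketched as below. The parser, the array capacity and the names are hypothetical and much simpler than the real text parser; the point is the bounds check before the index is used.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TEX_OFFSETS 4   /* hypothetical capacity of the offsets array */

/* Parse ",<n>,<n>..." into offsets[]. Returns the number of entries stored,
 * or -1 when the input holds more entries than the array can take. */
static int
parse_offsets(const char *cur, int offsets[MAX_TEX_OFFSETS])
{
   assert(cur != NULL && offsets != NULL);
   int i = 0;
   while (*cur == ',') {
      if (i >= MAX_TEX_OFFSETS)
         return -1;          /* reject instead of writing out of bounds */
      cur++;                 /* skip ',' */
      offsets[i] = 0;
      while (*cur >= '0' && *cur <= '9')
         offsets[i] = offsets[i] * 10 + (*cur++ - '0');
      i++;
   }
   return i;
}
```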
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Li Qiang <liq3ea@gmail.com>
This reverts commit 0bac2551e4.
Now that we position the guardband correctly (applying translations
in addition to scaling) and made it as large (or larger) than the
render target, this shouldn't be necessary.
Now we leave guardband clipping enabled 100% of the time, like the
Windows driver does.
Fixes GL45-CTS.gtf21.GL2FixedTests.clip.clip. It tries to draw a
16384x64 rectangle, and it appears that some kind of numerical
imprecisions in the clipper result in some edge pixels going missing.
The Windows driver passes this test because of guardband clipping.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously we disabled the guardband when the viewport was smaller than
the framebuffer on Gen6-7.5, to prevent portions of primitives from
being drawn outside of the viewport. On Gen8+, we relied on the viewport
extents test to effectively scissor this away for us.
We can simply always enable scissoring instead. We already include the
viewport in the scissor rectangle, so this will effectively do the
viewport extents test for us. (The only difference is that the scissor
rectangle doesn't support sub-pixel values. I think that's okay.)
Given that the viewport extents test is essentially a second scissor,
and is enabled for basically all 3D drawing on Gen8+, it stands to
reason that scissoring is cheap. Enabling the guardband reduces the
cost of clipping, which is expensive.
The Windows driver appears to never disable guardband clipping, and
appears to use scissoring in this case. I don't know if they leave
it on universally though.
This fixes misrendering in Blender, where the "floor plane" grid lines
started rendering at wrong angles after I disabled XY clipping of line
primitives. Enabling the guardband seems to solve the issue.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(Patch co-authored by Jason and Ken.)
We scaled the guardband based on the viewport size, but failed to
take into account the translation portion of the viewport transform.
This meant the guardband was always centered around the origin.
We want it to be centered around the screen-space drawing area,
which is the intersection of the viewport and the render target.
At best, getting this wrong would reduce the guardband's effectiveness
in some cases. At worst, it might break things - objects outside of the
guardband are trivially rejected, so getting the guardband in the wrong
place and leaving guardband clipping enabled could cause problems.
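To illustrate why the translation matters, the sketch below computes the screen-space drawing area as the intersection of viewport and render target, then its center. The types and names are hypothetical, and the actual guardband extents are not computed here.

```c
#include <assert.h>

struct rect { float x0, y0, x1, y1; };

static float maxf(float a, float b) { return a > b ? a : b; }
static float minf(float a, float b) { return a < b ? a : b; }

/* Screen-space drawing area: intersection of viewport and render target. */
static struct rect
drawing_area(struct rect viewport, struct rect target)
{
   assert(viewport.x0 <= viewport.x1 && viewport.y0 <= viewport.y1);
   struct rect r = {
      maxf(viewport.x0, target.x0), maxf(viewport.y0, target.y0),
      minf(viewport.x1, target.x1), minf(viewport.y1, target.y1),
   };
   return r;
}

/* The point the guardband should be centered on. */
static void
area_center(struct rect r, float *cx, float *cy)
{
   *cx = 0.5f * (r.x0 + r.x1);
   *cy = 0.5f * (r.y0 + r.y1);
}
```

For a viewport of (100,50)-(500,350) on a 400x300 target the center is (250,175), far from the origin that a scale-only computation implies.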
v2: drop clamping of positive maximums.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch will make the guardband calculation dependent on the
transformation matrix. Instead of computing it in both atoms, just
combine them into a single atom.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
CMASK alignment can be greater than image data alignment, so pass
it to the app so that it knows what alignment the backing memory
should have.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Now that there's MESA_LOADER_DRIVER_OVERRIDE for choosing the driver name
we load, we don't need this any more.
v2: Get the junk out of pipe_loader_drm.c, too.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v2)
My vc4 simulator has been implemented so far by having an entrypoint
claiming to be i965, which was a bit gross. The simulator would be a lot
less special if we entered through the vc4 entrypoint like normal, so add
a loader environment variable to allow the i965 fd to probe as vc4.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All the replicated prototypes/function bodies obfuscated the interesting
logic of the file: the mapping from driver enable macros to entrypoints we
expose, and the way that the swrast entrypoints are special compared to
the DRM entrypoints.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The GLX specification says about glXDestroyPixmap:
"The storage for the GLX pixmap will be freed when it is not current
to any client."
We're not really following this language to the letter: some of the storage
is freed immediately (in particular, the dri3_drawable, which contains both
GLXDRIdrawable and loader_dri3_drawable). So we NULL out the pointers to
that freed storage; the previous patches added the corresponding NULL-pointer
checks.
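The pattern can be sketched with hypothetical stand-in types (these are not the real GLX or DRI3 structures): free the storage, clear the pointer, and have later entry points check for NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct drawable { int swap_interval; };          /* hypothetical */
struct glx_pixmap { struct drawable *drawable; }; /* hypothetical */

/* Free part of the storage right away and NULL the pointer to it. */
static void
pixmap_destroy_storage(struct glx_pixmap *pix)
{
   assert(pix != NULL);
   free(pix->drawable);
   pix->drawable = NULL;   /* later users can detect the freed storage */
}

/* A later call on the still-bound pixmap: skip it when the drawable is gone. */
static int
pixmap_set_swap_interval(struct glx_pixmap *pix, int interval)
{
   assert(pix != NULL);
   if (pix->drawable == NULL)
      return 0;            /* nothing to do, and nothing to corrupt */
   pix->drawable->swap_interval = interval;
   return 1;
}
```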
This fixes memory corruption in piglit
./bin/glx-visuals-depth/stencil -pixmap -auto
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The GLX specification says about glXDestroyPixmap:
"The storage for the GLX pixmap will be freed when it is not current
to any client."
So arguably, functions like glXSwapIntervalMESA can be called after
glXDestroyPixmap has been called for the currently bound GLXPixmap.
In that case, the GLXDRIDrawable no longer exists, and so we just skip
those calls.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
With a subsequent patch, we might see NULL loaderPrivates, e.g. when
a DRIdrawable is flushed whose corresponding GLXDRIdrawable was destroyed.
This resulted in a crash, since the loader vs. DRI3 drawable structures
have a non-zero offset.
Fixes glx-visuals-{depth,stencil} -pixmap
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Set 3DSTATE_WM/ThreadDispatchEnable bit on/off based on the same
conditions as used in the GL version.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before 4.5, the default framebuffer was not allowed for
GetFramebufferParameter, so it should return INVALID_OPERATION for any
call using the default framebuffer.
4.5 included new pnames, and some of them are allowed for the default
framebuffer. For the rest, INVALID_OPERATION. From OpenGL 4.5 spec,
section 9.2.3 "Framebuffer Object Queries":
"An INVALID_OPERATION error is generated by GetFramebufferParameteriv
if the default framebuffer is bound to target and pname is not one
of the accepted values from table 23.73, other than
SAMPLE_POSITION."
Fixes:
GL45-CTS.direct_state_access.framebuffers_get_parameter_errors
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
4.5 added new pnames allowed for GetFramebufferParameter, and
GetNamedFramebufferParameter.
From OpenGL 4.5 spec, section 9.2.3 "Framebuffer Object Queries" (quoting
the paragraph with only the new pnames, not all the supported):
"pname may also be one of DOUBLEBUFFER,
IMPLEMENTATION_COLOR_READ_FORMAT, IMPLEMENTATION_COLOR_READ_TYPE,
SAMPLES, SAMPLE_BUFFERS, or STEREO, indicating the corresponding
framebuffer-dependent state from table 23.73. Values of
framebuffer-dependent state are identical to those that would be
obtained were the framebuffer object bound and queried using the
simple state queries in that table. These values may be queried
from either a framebuffer object or a default framebuffer."
Fixes:
GL45-CTS.direct_state_access.framebuffers_get_parameters
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The current implementation returns the value for the currently bound read
framebuffer. GetNamedFramebufferParameteriv allows querying it for any
given framebuffer, and GetFramebufferParameteriv also needs that
method.
Refactor it to accept a given framebuffer. If NULL is passed, it uses
the currently bound framebuffer.
This also adds a call to _mesa_update_state. When used only by
GetIntegerv, this one was called as part of the extra checks defined
at get_hash. But now that the method is used by more callers, and the
update is needed, it makes sense (and it is safer) to just call it in
the method itself, instead of relying on the caller.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If we have an indirect index here we need to scale it by attribute slots
e.g. if this is vec2[256] then we get an indir_index in the 0..255 range
but the vec2 are aligned inside vec4 slots. So scale the indir index,
then extract the channels.
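One way to read this, sketched with hypothetical names and a flat channel layout that is an assumption: every array element starts on a vec4 slot boundary, so the index is scaled by whole slots before the channel is added.

```c
#include <assert.h>

#define CHANNELS_PER_SLOT 4   /* a vec4 attribute slot */

/* Channel offset of element 'indir_index', channel 'channel', when every
 * array element occupies 'slots_per_element' whole vec4 slots. */
static unsigned
slot_channel_offset(unsigned indir_index, unsigned slots_per_element,
                    unsigned channel)
{
   assert(channel < CHANNELS_PER_SLOT);
   assert(slots_per_element >= 1);
   return indir_index * slots_per_element * CHANNELS_PER_SLOT + channel;
}
```

For vec2[256], element 255 channel 1 lands at offset 1021, not at 255 * 2 + 1 = 511 as a tightly packed layout would give.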
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Compressing a render target and decompressing it in the same
single-subpass render pass may waste bandwidth. While this may be
beneficial in some circumstances, it does not help in all. Reclaims
about 1.95% FPS for Dota 2 on some configurations.
v2 (Jason Ekstrand):
- Provide a more thorough comment
- Enable CCS_D for input attachments
v3 (Jason Ekstrand):
- Provide performance numbers
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
dEQP-EGL.functional.create_context.no_config tries to create a context
with no config, then immediately destroys it. The drawbuffer is never
set up, so we can't dereference it asking if it's double buffered, or
we'll crash on a null pointer dereference.
Just bail early.
Applications using EGL_KHR_no_config_context could hit this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
cs can be NULL when it comes from r600_buffer_map_sync_with_rings()
to avoid doing the same checks. It was checked for write mappings
but not for read mappings.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Earlier changes introduced is_ycrcb flag which checks the component
order of u and v components. The condition for setting the flag was
incorrect: with ycrcb we are supposed to have cr before cb.
This patch (together with a fix in our gralloc) fixes corrupted
rendering from 'test-opengl-gl2_yuvtex' native test and corrupted
gallery thumbnail in application switcher on Android-IA.
Fixes: 51727b1cf5
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
The intent of the libdrm_$driver version limits has always been to not
burden the "other" drivers with updating their libdrm unless really
necessary. Unfortunately the configure script erroneously only checked
the driver-specific bit and not the generic bit of libdrm as well. Fix
this.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes
GL45-CTS.tessellation_shader.tessellation_shader_tessellation.max_in_out_attributes
on nouveau. We only support 30 patch varyings (as 2 vec4 slots end up
being used for tess level settings), but were getting 32 exposed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
OpenGL 4.5 spec, section "8.11.4 Texture Image Queries", page 233 of
the PDF states:
"An INVALID_OPERATION error is generated if texture is the name of a buffer
or multisample texture."
This is currently not being checked and e.g. a multisample texture image can
be passed down to the driver hook. On i965, it is crashing the driver with an
assertion:
intel_mipmap_tree.c:3125: intel_miptree_map: Assertion `mt->num_samples <= 1' failed.
v2: (Ilia Mirkin) Move the check from gettextimage_error_check() to
GetTextureSubImage() and use the texObj target.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Although this might come from somewhere else, require it explicitly.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I enabled CCS for storage images in the Vulkan driver and ran it through
the CTS. It didn't result in any hangs but it demonstrated that the data
port cannot handle CCS.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing uses this yet but it serves as a nice bit of documentation
that's relatively easy to find.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The term "lossless compression" could potentially mean multisample
color compression, single-sample color compression or HiZ because they
are all lossless. The term CCS_E, however, has a very precise meaning
in ISL and is only used to refer to single-sample color compression.
It's also much shorter which is nice.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Commit 968ffd6c86 stored the last subpass
index of all the attachments but that of the depth-stencil attachment.
This could cause depth buffers used in multiple subpasses not to be in
the requested final layout. Fix this error.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Make the cap consistent with PIPE_CAP_INT64.
Aside from the hypothetical case of using draw for vertex shaders (and
actually caring about doubles...), every implementation supports doubles
either nowhere or everywhere.
Also, st/mesa didn't even check the cap correctly in all supported
shader stages.
While at it, add a missing LLVM version check for 64-bit integers in
radeonsi. This is conservative: judging by the log, LLVM 3.8 might be
sufficient, but there are probably bugs that have been fixed since then.
v2: fix clover (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since we already have the functionality in place and games
like Game of Thrones seem to depend on this extension, I
think it makes sense to enable it by making it part of
the extension string even though it's still a draft:
https://www.khronos.org/registry/gles/extensions/EXT/EXT_compressed_ETC1_RGB8_sub_texture.txt
Note: OES_compressed_ETC1_RGB8_sub_texture seems to be listed
in gl2ext.h, but there's no documentation for it in the KHR
registry
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Up to now on Gen8+ we only allocated a vertex element for
gl_InstanceIndex or gl_VertexIndex when a vertex shader uses
gl_BaseInstanceARB or gl_BaseVertexARB. This is because we would
configure the VF_SGVS packet to make the VF unit write the
gl_InstanceIndex & gl_VertexIndex values right behind the values
computed from the vertex buffers.
In the next commit we will also write the gl_DrawIDARB value. Our
backend expects to pull the gl_DrawIDARB value from the element
following the element containing gl_InstanceIndex, gl_VertexIndex,
gl_BaseInstanceARB and gl_BaseVertexARB (see
vec4_vs_visitor::setup_attributes). Therefore we need to allocate an
element for the SGVS elements as long as at least one of the SGVS
elements is read by the shader. Otherwise our shader will use a
gl_DrawIDARB value pulled from the URB one element too far (most
likely garbage).
v2: Fix my english (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For RGB formats in Vulkan, we use the corresponding RGBA format with a
swizzle of RGB1. While this swizzle is exactly what we want for
texturing, it's not allowed for rendering according to the docs. While
we haven't been getting hangs or anything, we should probably obey the
docs. This commit just sanitizes all render swizzles so that the alpha
channel maps to ALPHA.
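A minimal sketch of that sanitization, using hypothetical enum and struct names rather than the driver's real swizzle types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel selectors. */
enum chan_select {
   CHAN_ZERO, CHAN_ONE, CHAN_RED, CHAN_GREEN, CHAN_BLUE, CHAN_ALPHA
};

struct swizzle { enum chan_select r, g, b, a; };

/* For rendering, force the alpha channel to map to ALPHA and leave the
 * color channels alone. */
static void
sanitize_render_swizzle(struct swizzle *s)
{
   assert(s != NULL);
   s->a = CHAN_ALPHA;
}
```

The texturing swizzle RGB1, i.e. { RED, GREEN, BLUE, ONE }, becomes { RED, GREEN, BLUE, ALPHA } for the render target view.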
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The CTS tests at least are using this, and we were totally
ignoring it.
This hopefully fixes the bouncing multisample CTS tests.
v2: get family mask in ignored case from command buffer.
v3: only change things in one place, use logic from Bas.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise we were writing these as 4 components, and things went bad.
Fixes (the remaining):
dEQP-VK.clipping.user_defined.*.vert_geom.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes clip distance fetches as they are single item loads
with a const_index like float[1].
Fixes:
dEQP-VK.clipping.user_defined.*.vert_geom.[0-6]
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This code should be used in radv, so move it to a shared location
in advance of doing that.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the past I've gotten this function confused with the one in
ir_to_mesa.cpp of the same name. Now that the affected flag setting
has moved into a helper it makes sense just to inline this remaining
code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add assert checking that num_sources is never larger than 3.
This prevents Coverity from concluding that the unhandled
cases of num_sources not being 0-3 are relevant.
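The pattern looks roughly like this. The function is a hypothetical stand-in; only the assert on num_sources mirrors the message.

```c
#include <assert.h>

/* Hypothetical stand-in: combine up to three source values. The assert
 * documents (and lets static analysis assume) that num_sources is 0-3, so
 * the default case is not treated as a reachable path. */
static int
sum_sources(const int *src, unsigned num_sources)
{
   assert(num_sources <= 3);
   switch (num_sources) {
   case 3:
      return src[0] + src[1] + src[2];
   case 2:
      return src[0] + src[1];
   case 1:
      return src[0];
   default:
      return 0;
   }
}
```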
Coverity-Id: 1399480-1399489
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
fixes to issues spotted by Emil Velikov:
- set ANV_TIMESTAMP correctly
- fix typo with VULKAN_GEM_FILES
v2: update to use Makefile.sources under vulkan
instead of having own
v3: update to changes to generate from vk.xml
(commit c7fc310)
v4: remove 'hw' relative path
cleanups, remove unnecessary cruft
review from Emil Velikov:
- move to vulkan folder
- remove timestamp gen, no longer necessary
- more cleanups
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
All the other calls to retrieve the attachment have been covered except
this one - return the proper error for attachment points that are valid
enums but out of bound for the driver.
Fixes GL45-CTS.geometry_shader.layered_fbo.fb_texture_invalid_attachment
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS
actually is. We know from experimentation that we need to flush the
render cache prior to emitting STATE_BASE_ADDRESS and invalidate the
texture cache afterwards. The only thing the PRM says is that, on gen8+
we're supposed to invalidate the state cache after STATE_BASE_ADDRESS
but experimentation has indicated that doing so does nothing whatsoever.
Since we don't really know, let's do just a bit more flushing in the
hopes that this won't be a problem again. In particular:
1) Do a CS stall before we emit STATE_BASE_ADDRESS since we don't
really know whether or not it's pipelined.
2) Do a data cache flush in case what runs before STATE_BASE_ADDRESS
is a compute shader.
3) Invalidate the state and constant caches after STATE_BASE_ADDRESS
because the state may be getting cached there (we don't really know).
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We had no good reason for *not* doing this on gen7 before but we didn't
know it was needed. Recently, when trying to update to Vulkan CTS version
1.0.2 in our CI system, Mark discovered GPU hangs on Haswell that appear
to be STATE_BASE_ADDRESS related. This commit fixes them.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
No f16 support as I'm not quite sure about alignment yet.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The addition of Neon assembly breaks on arm64 builds because the assembly
syntax is different. For now, restrict Neon to ARMv7 builds.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
clang throws an error on "%r2" and similar. I couldn't find any
documentation on what "%r?" is supposed to mean and I've never seen any
use like that as far as I remember. The parameter is supposed to be
cpu_stride and just %2/%3 should be sufficient.
There's no need for trailing ";" either, so remove those, too.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This prevents LLVM from using sext instructions for local memory offsets
and allows the backend to fold immediate offsets into the instruction.
This also prevents some incorrect code generation for ptrtoint and
inttoptr instructions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
See "glsl: Rewrite atan2 implementation to fix accuracy and handling
of zero/infinity." for the rationale, but note that the instruction
count benefit discussed there is somewhat less important for the SPIRV
implementation, because the current code already emitted no control
flow instructions -- Still this saves us one hardware instruction per
scalar component on Intel SKL hardware.
Fixes the following Vulkan CTS tests on Intel hardware:
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.scalar
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec3
dEQP-VK.glsl.builtin.precision.atan2.highp_compute.vec4
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec2
dEQP-VK.glsl.builtin.precision.atan2.mediump_compute.vec4
Note that most of the test-cases above expect IEEE-compliant handling
of atan2(±∞, ±∞), which this patch doesn't explicitly handle, so
except for the last two the test-cases above weren't expected to pass
yet. The reason they do is that the i965 back-end implementation of
the NIR fmin and fmax instructions is not quite GLSL-compliant (it
complies with IEEE 754 recommendations though), because fmin/fmax of a
NaN and a non-NaN argument currently always return the non-NaN
argument, which causes atan() to flush NaN to one and return the
expected value. The front-end should probably not be relying on this
behavior for correctness though because other back-ends are likely to
behave differently -- A follow-up patch will handle the atan2(±∞, ±∞)
corner cases explicitly.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This addresses several issues of the current atan2 implementation:
- Negative zero (and negative denorms which end up getting flushed to
zero) isn't handled correctly by the current implementation. The
reason is that it does 'y >= 0' and 'x < 0' comparisons to decide
on which side of the branch cut the argument is, which causes us to
return incorrect results (off by up to 2π) for very small negative
values.
- There is a serious precision problem for x values of large enough
magnitude introduced by the floating point division operation being
implemented as a mul+rcp sequence. This can lead to the quotient
getting flushed to zero in some cases introducing an error of over
8e6 ULP in the result -- Or in the most catastrophic case will
cause us to return NaN instead of the correct value ±π/2 for y=±∞
and x very large. We can fix this easily by scaling down both
arguments when the absolute value of the denominator goes above
a certain threshold. The error of this atan2 implementation remains
below 25 ULP in most of its domain except for a neighborhood of y=0
where it reaches a maximum error of about 180 ULP.
- It emits a bunch of instructions including no less than three
if-else branches per scalar component that don't seem to get
optimized out later on. This implementation uses about 13% fewer
instructions on Intel SKL hardware and doesn't emit any control
flow instructions.
v2: Fix up argument scaling to take into account the range and
precision of exotic FP24 hardware. Flip coordinate system for
arguments along the vertical line as if they were on the left
half-plane in order to avoid division by zero which may give
unspecified results on non-GLSL 4.1-capable hardware. Sprinkle in
some more comments.
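The negative-zero issue from the first bullet can be shown in isolation. The sketch below is not the GLSL built-in code; the two helper names are hypothetical and only contrast a comparison-based side test with one that looks at the sign bit.

```c
#include <math.h>
#include <stdbool.h>

/* Comparison-based test: -0.0f >= 0.0f is true, so negative zero is
 * classified as lying on the non-negative side of the branch cut. */
static bool nonnegative_by_compare(float y)
{
    return y >= 0.0f;
}

/* Sign-bit-based test: distinguishes -0.0f from +0.0f. */
static bool nonnegative_by_signbit(float y)
{
    return !signbit(y);
}
```

For y = -0.0 and x < 0 the two tests disagree, which is where a comparison-based implementation picks the wrong side of the cut.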
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This does point at the front-end emitting silly code that could have
been optimized out, but the current fsign implementation would emit
bogus IR if abs was set for the argument (because it would apply the
abs modifier on an unsigned integer type), and we shouldn't rely on
the upper layer's optimization passes for correctness.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Will avoid a regression in a future commit that introduces some
additional rcp operations. According to the GLSL 4.10 specification:
"Dividing by 0 results in the appropriately signed IEEE Inf."
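As a CPU-side reference for the quoted rule, and assuming IEEE 754 float arithmetic (C Annex F), a plain reciprocal already behaves this way. `rcp_reference` and `is_inf_with_sign_of` are hypothetical helper names used only for this sketch.

```c
#include <math.h>
#include <stdbool.h>

/* Reference reciprocal; relies on IEEE 754 division semantics. */
static float rcp_reference(float x)
{
    return 1.0f / x;
}

/* True when r is an infinity whose sign matches the sign of the divisor. */
static bool is_inf_with_sign_of(float r, float divisor)
{
    return isinf(r) && ((signbit(r) != 0) == (signbit(divisor) != 0));
}
```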
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This will be used internally by the GLSL front-end in order to
implement some built-in functions. Plumb it through MESA IR for
back-ends that rely on this translation pass.
v2: Add comment.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This fixes rendering of full-screen quads (and other screen-filling
geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op
on other hardware.
- It looks like SE_CLIP registers were not set at all.
I'm amazed that rendering worked without them. Emit them to
avoid issues on gc3000.
- Define constants
ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119)
ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111)
ETNA_SE_CLIP_MARGIN_RIGHT (0xffff)
ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff)
These demarcate the margin (fixp16) between the computed sizes and the
value sent to the chip. I have set these to the numbers used by the
Vivante driver for gc2000. I am not sure whether any old hardware was
relying on the old numbers, or whether those were just a guess. But if
so, these need to be moved to the _specs structure.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Shaders using sin/cos instructions were not working on GC3000.
The reason for this turns out to be that these chips implement sin/cos
in a different way (but using the same opcodes):
- Need their input scaled by 1/pi instead of 2/pi.
- Output an x and y component, which need to be multiplied to
get the result.
- tex_amode needs to be set to 1.
Add a new bit to the compiler specs and generate these instructions
as necessary.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Commit 2852efcda4 moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.
Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Exposing rb swapped (or other swizzled) formats for rendering would
involve swizzling in the pixel shader. This is not the case at the
moment, so reject requests for creating such surfaces.
(GPUs that need an extra resolve step anyway due to multiple pixel
pipes, such as gc2000, might also do this swap in the resolve operation.
But this would be tricky to keep track of)
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Use of unsigned loop control variable with '>= 0' would lead
to infinite loop.
Reported by clang:
etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression
>= 0 is always true [-Wtautological-compare]
for (unsigned sp = c->frame_sp; sp >= 0; sp--)
~~ ^ ~
v2: Simply use the same datatype as c->frame_sp is using.
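The pitfall can be reproduced in isolation. In the sketch below, `count_frames_down` is a hypothetical stand-in rather than code from etnaviv_compiler.c; it uses a signed counter so that the `sp >= 0` condition can actually become false.

```c
/* Hypothetical stand-in for walking a frame stack from the top down.
 * With `unsigned sp`, `sp >= 0` is always true and `sp--` wraps around
 * at zero, so the loop never ends. A signed counter terminates. */
static int count_frames_down(int frame_sp)
{
    int visited = 0;

    for (int sp = frame_sp; sp >= 0; sp--)
        visited++;

    return visited;
}
```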
CC: <mesa-stable@lists.freedesktop.org>
Reported-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
There are some corner cases where you end up with an esgs ring, but no
gsvs ring, test for both before dereferencing.
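A minimal sketch of the pattern, assuming each ring is tracked through a pointer that may be NULL. `struct ring` and `rings_total_size` are hypothetical names, not the driver's data structures.

```c
#include <stddef.h>

/* Hypothetical ring descriptor. */
struct ring {
    unsigned size;
};

/* Checks each ring independently instead of assuming that the presence
 * of one implies the presence of the other. */
static unsigned rings_total_size(const struct ring *esgs,
                                 const struct ring *gsvs)
{
    unsigned total = 0;

    if (esgs != NULL)
        total += esgs->size;
    if (gsvs != NULL)
        total += gsvs->size;

    return total;
}
```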
Fixes:
dEQP-VK.geometry.emit.points_emit_0_end_0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This emits the compiled geometry shader and other state registers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This uses the scratch infrastructure to handle the esgs
and gsvs rings.
(this replaces the old code that did this with patching).
v2: fix correct ring sizes, reset sizes (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds gs copy shader support to the pipeline cache, and a few
geometry-related changes.
v2: rebase for spill changes.
v2.1: fix incorrect pipeline destruction.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles geometry shader inputs written by the vertex (es) shader
to the esgs ring.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles emitting things to the gsvs ring, and sending the
correct GS msgs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just places the flag into the shader info so we can use it from
the driver after we create the shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This enables the paths for setting up user ptrs to vs/es and gs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets up the rings and adds the variables
needed to make them work.
v2: rework for sharing ring and scratch
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Prevent Coverity from seeing potential errors when src is
not initialized in the switch case.
Coverity-Id: 1396397
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We also add a flag for detecting shaders written to shader cache.
V2: don't leak cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:
MESA_GLSL_CACHE_ENABLE=1
In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.
The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).
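The opt-in policy described above can be sketched as a small predicate. This is an assumption-laden illustration: `flag_is_set`, `cache_allowed` and `cache_allowed_from_env` are hypothetical helpers, and treating only the literal "1" as set is a simplification that may differ from how Mesa parses boolean environment variables.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: only the literal "1" counts as set. */
static bool flag_is_set(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Opt-in policy: off unless explicitly enabled; DISABLE always wins. */
static bool cache_allowed(const char *enable, const char *disable)
{
    if (flag_is_set(disable))
        return false;
    return flag_is_set(enable);
}

/* Convenience wrapper reading the two variables named in the message. */
static bool cache_allowed_from_env(void)
{
    return cache_allowed(getenv("MESA_GLSL_CACHE_ENABLE"),
                         getenv("MESA_GLSL_CACHE_DISABLE"));
}
```

Keeping the policy separate from `getenv()` makes it easy to check all four combinations without touching the process environment.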
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a bunch of buffer related:
dEQP-VK.memory.pipeline_barrier.*
tests, that were crashing in LLVM due to this being missing.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Should be r600_common_screen instead of r600_screen.
Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes a bug uncovered by the 17-part patch series, specifically:
"gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"
If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The update frequency is very low.
Difference: Only account for the size when allocating a new one and when
starting a new IB, and check for NULL. (v3)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Commit 7b5878ee04 increased number of
outputs to 64, but left output array intact. This caused stack overflow
when the number of outputs is bigger than 32. Found by ASAN.
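One way to keep a limit and its backing array from drifting apart is to size the array from the same constant that bounds the count. The sketch below is illustrative only; `MAX_OUTPUTS`, `struct output_table` and the helpers are hypothetical and do not mirror the affected driver code.

```c
#include <stdbool.h>

#define MAX_OUTPUTS 64  /* hypothetical limit, standing in for the real one */

struct output_table {
    unsigned count;
    int slots[MAX_OUTPUTS];  /* sized from the same constant as the limit */
};

/* Refuses to write past the end of the table. */
static bool record_output(struct output_table *t, int slot)
{
    if (t->count >= MAX_OUTPUTS)
        return false;
    t->slots[t->count++] = slot;
    return true;
}

/* Tries to record `requested` outputs and returns how many fit. */
static unsigned fill_outputs(unsigned requested)
{
    struct output_table t = { .count = 0 };

    for (unsigned i = 0; i < requested; i++) {
        if (!record_output(&t, (int)i))
            break;
    }
    return t.count;
}
```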
Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There are even more counters in the CP_STAT register but I think
these ones are enough for now.
v2: only read (and expose) CP_STAT on VI+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For simplicity, GPU-sdma-busy will return 0 on previous gens.
v2: only read SRBM_STATUS2 on Evergreen+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously the z offset of the destination image was being ignored. It
should be taken into account when copying into a 3d target.
Also, img_extent_el.depth was being incorrectly clamped to 1 due to the
source image being VK_IMAGE_TYPE_2D. This would result in the blit
failing to iterate over all the 3d slices. Instead we clamp to the
destination image type.
Fixes failures in CTS tests:
dEQP-VK.api.copy_and_blit.image_to_image.3d_images.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
There is a new error code in Maintenance1 that is more specific to the
situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR
Fixes CTS test case:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is part of the spec and fixes CTS tests:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
At this point, the pitch is in bytes. We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This
means the bit 15 trick won't work.
The caller now has signed integers anyway, so just pass those through
and do the obvious check.
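To illustrate why a byte pitch breaks the trick, the sketch below assumes the trick amounts to treating bit 15 as the sign bit of a 16-bit field; that reading is an assumption, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed form of the "bit 15 trick": treat the pitch as a 16-bit
 * two's-complement field and test its sign bit. */
static bool is_negative_by_bit15(int32_t pitch)
{
    return ((uint32_t)pitch & (UINT32_C(1) << 15)) != 0;
}

/* The obvious check on a signed integer. */
static bool is_negative_signed(int32_t pitch)
{
    return pitch < 0;
}
```

With a pitch of 40000 bytes the bit test reports a negative pitch, and with -65536 it reports a non-negative one; the signed comparison is right in both cases.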
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since LLVM revision 293359, DumpModule is only implemented in debug
builds or when LLVM_ENABLE_DUMP is set.
This patch adds a direct replacement for the function for radv and
radeonsi. However, as I don't know a good place to put common LLVM
code for all three, I inlined the implementation for LLVMPipe.
v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.
total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs: 5338 -> 5213 (-2.34%)
shader-db results:
total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs: 27417 -> 26570 (-3.09%)
The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
This has almost no effect on shader-db:
total instructions in shared programs: 92572 -> 92611 (0.04%)
instructions in affected programs: 4486 -> 4525 (0.87%)
Looking at 2 of the 7 different shaders that were hurt (all of which were
in mupen64), they all appear to be just differences in order of
instructions at the NIR level.
The advantage is that this should significantly reduce time in the compiler.
Applications may delete a shader program, create a new one, and bind it
before the next draw. With terrible luck, malloc may randomly return a
chunk of memory for the new gl_program that happened to be the exact
same pointer as our previously bound gl_program. In this case, our
logic to detect new programs in brw_upload_pipeline_state() would break:
   if (brw->vertex_program != ctx->VertexProgram._Current) {
      brw->vertex_program = ctx->VertexProgram._Current;
      brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;
   }
Because the pointer is the same, we'd think it was the same program.
But it could be wildly different - a different stage altogether,
different sets of resources, and so on. This causes utter chaos.
As unlikely as this seems, I believe I hit this when running a subset
of the CTS in a loop, in a group of tests that churns through simple
programs, deleting and rebuilding them. Presumably malloc uses a
bucketing cache of sorts, and so freeing up a gl_program and allocating
a new one fairly quickly causes it to reuse that memory.
The result was that brw->vertex_program->info.num_ssbos claimed the
program had SSBOs, while brw->vs.base.prog_data.binding_table claimed
that there were none. This was crazy, because the binding table is
calculated from info.num_ssbos - the shader info appeared to change
between shader compile time and draw time. Careful use of watchpoints
revealed that it was being clobbered by rzalloc's memset when building
an entirely different program...
Fortunately, our 0xd0d0d0d0 canary for unused binding table entries
caused us to crash out of bounds when trying to upload SSBOs, or we
may have never discovered this heisenbug.
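The hazard can be simulated deterministically without relying on malloc. In the sketch below all names are hypothetical, and dropping the cached pointer on deletion is shown only as one possible mitigation, not necessarily what this commit does.

```c
#include <stdbool.h>
#include <stddef.h>

struct program {
    unsigned num_ssbos;
};

struct context {
    const struct program *bound;   /* what the application bound */
    const struct program *cached;  /* what the driver last saw */
    bool dirty;
};

/* Pointer-comparison change detection, as in the snippet above. */
static void detect_new_program(struct context *ctx)
{
    if (ctx->cached != ctx->bound) {
        ctx->cached = ctx->bound;
        ctx->dirty = true;
    }
}

/* Hypothetical mitigation: forget the cached pointer on deletion. */
static void on_program_deleted(struct context *ctx, const struct program *p)
{
    if (ctx->cached == p)
        ctx->cached = NULL;
}

/* Simulates delete + create landing at the same address and returns
 * whether the driver noticed the new program. */
static bool reuse_is_detected(bool unbind_on_delete)
{
    struct program slot = { .num_ssbos = 0 };
    struct context ctx = { .bound = &slot, .cached = NULL, .dirty = false };

    detect_new_program(&ctx);      /* first bind is noticed */
    ctx.dirty = false;

    if (unbind_on_delete)
        on_program_deleted(&ctx, &slot);
    slot.num_ssbos = 2;            /* "new" program, same address */
    ctx.bound = &slot;

    detect_new_program(&ctx);
    return ctx.dirty;
}
```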
Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked
cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5
at 64x64 in a loop with 100 iterations.
Cc: "17.0 13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension was not correctly supported, and it conflicts with the
VK_KHR_MAINTENANCE1 spec.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes deferred shadows with geom shaders enabled,
but I think this fix is fine by itself.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch implements a new type of struct brw_fence, one that is based
on struct sync_file.
This completes support for EGL_ANDROID_native_fence_sync.
* Background
Linux 4.7 added a new file type, struct sync_file. See
commit 460bfc41fd52959311ed0328163f785e023857af
Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date: Thu Apr 28 10:46:57 2016 -0300
Subject: dma-buf/sync_file: de-stage sync_file headers
A sync file is a cross-driver explicit synchronization primitive. In a
sense, sync_file's relation to synchronization is similar to dma_buf's
relation to memory: both are primitives that can be imported and
exported across drivers (at least in theory).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Rename to brw_fence_insert_locked(). This is correct because the fence's
mutex is effectively locked, as all callers are also *creators* of the
fence, and have not yet returned the new fence.
This reduces noise in the next patch, which defines and uses
brw_fence_insert(), an unlocked variant.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Pre-patch, brw_sync.c ignored the return value of
intel_batchbuffer_flush().
When intel_batchbuffer_flush() fails during eglCreateSync
(brw_dri_create_fence), we now give up, cleanup, and return NULL.
When it fails during glFenceSync, however, we blindly continue and hope
for the best because there is not yet a way to tell core GL that
sync creation failed.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is a refactor patch; no change in behavior is expected.
Add `enum brw_fence_type` and brw_fence::type. There is only one type
currently, BRW_FENCE_TYPE_BO_WAIT. This patch reduces a lot of noise in
the next, which adds new type BRW_FENCE_TYPE_SYNC_FD.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Required to implement EGL_ANDROID_native_fence_sync on i965.
Specifically, i965 needs drm_intel_gem_bo_exec_fence(),
I915_PARAM_HAS_EXEC_FENCE, and libsync.h.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Analogous to previous commit(s), with a minor detail - here we set the
macros when building both C and C++ sources.
Resolving that is a more challenging task that we'll sort out another
day.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The variable replacement was unused when building w/o
ENABLE_SHADER_CACHE. Since we can mix variable declarations and code,
move it to where it's used.
Fixes: 9f8dc3bf03 "utils: build sha1/disk cache only with
Android/Autoconf"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Using return foo() is incorrect even if foo itself returns void.
Spotted by AppVeyor, as below:
teximage.c(3653) : warning C4098: 'copyteximage' : 'void' function returning a value
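A minimal sketch of the portable form, with hypothetical function names: in C, a `return` statement with an expression may not appear in a function returning void, so the call and the return are written separately.

```c
static int copy_calls;

/* Hypothetical void helper. */
static void do_copy(void)
{
    copy_calls++;
}

/* Writing `return do_copy();` here is what MSVC flags with C4098.
 * Call the helper, then return without a value. */
static void copy_wrapper(void)
{
    do_copy();
    return;
}
```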
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Does not match the function definition or how it's used. Triggers the
following warning in AppVeyor
svga_cmd_vgpu10.c(1301) : warning C4028: formal parameter 2 different from declaration
Cc: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
MSVC warns about different const qualifiers. Add the extra const to
silence it.
nir_phi_builder.c(244) : warning C4090: 'initializing' : different 'const' qualifiers
nir_phi_builder.c(245) : warning C4090: 'initializing' : different 'const' qualifiers
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
MSVC warns about implicit conversion as below. Annotate the literal
appropriately to silence the warning.
nir_gather_info.c(249) : warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)
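The warning is about the width of the shifted literal. A short sketch, where `bit64` is a hypothetical helper rather than the code in nir_gather_info.c:

```c
#include <stdint.h>

/* `1 << bit` is evaluated as a 32-bit int and only then widened, which
 * is undefined for bit >= 32. Typing the literal as 64-bit makes the
 * shift itself happen in 64 bits. */
static uint64_t bit64(unsigned bit)
{
    return UINT64_C(1) << bit;
}
```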
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
b3119a3 introduced a strict LLVM requirement for r300 on all
architectures and thus configure fails on architectures where LLVM is
not available or buggy.
r300 doesn't strictly require LLVM, but for performance reasons we
highly recommend LLVM usage. So require it at least on x86 and x86_64
architectures as we have done before b3119a3.
Fixes: b3119a3 ("configure.ac: Check gallium LLVM version in gallium_require_llvm")
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
All of these have had support for the TGSI opcodes since before most of
the glsl compiler work landed.
Also update the docs accordingly, including the missing note about i965.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: add conversion opcodes.
v3 (idr): Rebase on replacement of TGSI_OPCODE_I2U64 with
TGSI_OPCODE_I2I64.
v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
v5 (nha): add clarifying comment about a subtle assumption
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v1.1: move to using a normal CAP. (Marek)
v2: fill in the cap everywhere
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
which is not applicable for "all slices at each lod". Current
logic makes one to believe it has some purpose. When miptree
layout is calculated brw_miptree_layout_texture_array() sets
the qpitch unconditionally but later on ignores it altogether
for ALL_SLICES_AT_EACH_LOD.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
As the comment for intel_miptree_hiz_buffer::mt states, hiz_mt
only exists for gen6. In addition, intel_hiz_miptree_buf_create()
uses MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD unconditionally.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. The same goes for
intel_miptree_aux_buffer::pitch/qpitch.
This will make following patches simpler to read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. Also intel_miptree_aux_buffer::offset
is initialised to zero (calloc()).
This will make following patches significantly simpler to read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Only caller, brw_workaround_depthstencil_alignment(), returns
early for gen6+.
While at it, reduce scope for brw_get_depthstencil_tile_masks() as
well.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The exact same check exists earlier in brw_miptree_layout(), which
intel_miptree_create_layout() in turn calls unconditionally.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In addition, let intel_miptree_create_layout() release the
miptree - it is the allocator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
# Install python wheels, necessary to install SCons via pip
- python -m pip install wheel
# Install SCons
- python -m pip install --egg scons==2.4.1
- python -m pip install scons==2.5.1
- scons --version
# Install flex/bison
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"
@@ -213,7 +213,7 @@ If you don't already have GLUT installed, you should grab
<h2>2.4 Where is the GLw library?</h2>
<p>
GLw (OpenGL widget library) is now available from a separate <a href="http://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
GLw (OpenGL widget library) is now available from a separate <a href="https://cgit.freedesktop.org/mesa/glw/">git repository</a>. Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.
</p>
@@ -276,7 +276,7 @@ If you're using a hardware accelerated driver you want <code>direct rendering: Y
</p>
<p>
If your DRI-based driver isn't working, go to the
<a href="http://dri.freedesktop.org/">DRI website</a> for trouble-shooting information.
<a href="https://dri.freedesktop.org/">DRI website</a> for trouble-shooting information.
</p>
@@ -284,7 +284,7 @@ If your DRI-based driver isn't working, go to the
<p>
Make sure the ratio of the far to near clipping planes isn't too great.
<a href="relnotes/17.2.2.html">Mesa 17.2.2</a> is released.
This is a bug-fix release.
</p>
<h2>September 25, 2017</h2>
<p>
<a href="relnotes/17.1.10.html">Mesa 17.1.10</a> is released.
This is a bug-fix release.
</p>
<h2>September 17, 2017</h2>
<p>
<a href="relnotes/17.2.1.html">Mesa 17.2.1</a> is released.
This is a bug-fix release.
</p>
<h2>September 8, 2017</h2>
<p>
<a href="relnotes/17.1.9.html">Mesa 17.1.9</a> is released.
This is a bug-fix release.
</p>
<h2>September 4, 2017</h2>
<p>
<a href="relnotes/17.2.0.html">Mesa 17.2.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>August 28, 2017</h2>
<p>
<a href="relnotes/17.1.8.html">Mesa 17.1.8</a> is released.
This is a bug-fix release.
</p>
<h2>August 21, 2017</h2>
<p>
<a href="relnotes/17.1.7.html">Mesa 17.1.7</a> is released.
This is a bug-fix release.
</p>
<h2>August 7, 2017</h2>
<p>
<a href="relnotes/17.1.6.html">Mesa 17.1.6</a> is released.
This is a bug-fix release.
</p>
<h2>July 14, 2017</h2>
<p>
<a href="relnotes/17.1.5.html">Mesa 17.1.5</a> is released.
This is a bug-fix release.
</p>
<h2>June 30, 2017</h2>
<p>
<a href="relnotes/17.1.4.html">Mesa 17.1.4</a> is released.
This is a bug-fix release.
</p>
<h2>June 19, 2017</h2>
<p>
<a href="relnotes/17.1.3.html">Mesa 17.1.3</a> is released.
This is a bug-fix release.
</p>
<h2>June 5, 2017</h2>
<p>
<a href="relnotes/17.1.2.html">Mesa 17.1.2</a> is released.
This is a bug-fix release.
</p>
<h2>June 1, 2017</h2>
<p>
<a href="relnotes/17.0.7.html">Mesa 17.0.7</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 17.0.7 will be the final release in the 17.0
series. Users of 17.0 are encouraged to migrate to the 17.1 series in order
to obtain future fixes.
</p>
<h2>May 25, 2017</h2>
<p>
<a href="relnotes/17.1.1.html">Mesa 17.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>May 12, 2017</h2>
<p>
<a href="relnotes/17.0.6.html">Mesa 17.0.6</a> is released.
This is a bug-fix release.
</p>
<h2>May 10, 2017</h2>
<p>
<a href="relnotes/17.1.0.html">Mesa 17.1.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>April 28, 2017</h2>
<p>
<ahref="relnotes/17.0.5.html">Mesa 17.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>April 17, 2017</h2>
<p>
<ahref="relnotes/17.0.4.html">Mesa 17.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>April 1, 2017</h2>
<p>
<ahref="relnotes/17.0.3.html">Mesa 17.0.3</a> is released.
This is a bug-fix release.
</p>
<h2>March 20, 2017</h2>
<p>
<ahref="relnotes/13.0.6.html">Mesa 13.0.6</a> and
<ahref="relnotes/17.0.2.html">Mesa 17.0.2</a> are released.
These are bug-fix releases from the 13.0 and 17.0 branches, respectively.
<br>
NOTE: It is anticipated that 13.0.6 will be the final release in the 13.0
series. Users of 13.0 are encouraged to migrate to the 17.0 series in order
to obtain future fixes.
</p>
<h2>March 4, 2017</h2>
<p>
<ahref="relnotes/17.0.1.html">Mesa 17.0.1</a> is released.
This is a bug-fix release.
</p>
<h2>February 20, 2017</h2>
<p>
<ahref="relnotes/13.0.5.html">Mesa 13.0.5</a> is released.
This is a bug-fix release.
</p>
<h2>February 13, 2017</h2>
<p>
<ahref="relnotes/17.0.0.html">Mesa 17.0.0</a> is released. This is a
new development release. See the release notes for more information
about the release.
</p>
<h2>February 1, 2017</h2>
<p>
<ahref="relnotes/13.0.4.html">Mesa 13.0.4</a> is released.
This is a bug-fix release.
</p>
<h2>January 23, 2017</h2>
<p>
<ahref="relnotes/12.0.6.html">Mesa 12.0.6</a> is released.
@@ -162,7 +319,7 @@ This is a bug-fix release.
</p>
<p>
Mesa demos 8.3.0 is also released.
See the <a href="http://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
See the <a href="https://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/">ftp.freedesktop.org/pub/mesa/demos/8.3.0/</a>.
</p>
@@ -477,7 +634,7 @@ This is a bug-fix release.
<p>
Mesa demos 8.2.0 is released.
See the <a href="http://lists.freedesktop.org/archives/mesa-announce/2014-July/000100.html">announcement</a> for more information about the release.
See the <a href="https://lists.freedesktop.org/archives/mesa-announce/2014-July/000100.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.2.0/">ftp.freedesktop.org/pub/mesa/demos/8.2.0/</a>.
</p>
@@ -656,7 +813,7 @@ This is a bug fix release.
<p>
Mesa demos 8.1.0 is released.
See the <a href="http://lists.freedesktop.org/archives/mesa-dev/2013-February/035180.html">announcement</a> for more information about the release.
See the <a href="https://lists.freedesktop.org/archives/mesa-dev/2013-February/035180.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.1.0/">ftp.freedesktop.org/pub/mesa/demos/8.1.0/</a>.
</p>
@@ -1352,7 +1509,7 @@ and primarily just incorporates bug fixes.
<h2>December 28, 2003</h2>
<p>
The Mesa CVS server has been moved to <a href="http://www.freedesktop.org">
The Mesa CVS server has been moved to <a href="https://www.freedesktop.org">
freedesktop.org</a> because of problems with SourceForge's anonymous
CVS service.
</p>
@@ -1924,7 +2081,7 @@ Here's what's new:</p>
</pre>
<h2>March 23, 2000</h2>
<p>I've just uploaded the Mesa 3.2 beta 1 files to SourceForge at <a href="http://sourceforge.net/project/showfiles.php?group_id=3">http://sourceforge.net/project/filelist.php?group_id=3</a></p>
<p>I've just uploaded the Mesa 3.2 beta 1 files to SourceForge at <a href="https://sourceforge.net/project/showfiles.php?group_id=3">https://sourceforge.net/project/filelist.php?group_id=3</a></p>
<p>3.2 (note even number) is a stabilization release of Mesa 3.1 meaning it's mainly
just bug fixes.</p>
<p>Here's what's changed:</p>
@@ -1972,7 +2129,7 @@ After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>
<h2>December 17, 1999</h2>
<p>A Slashdot interview with Brian about Mesa (questions submitted by Slashdot readers)
can be found at <a href="http://slashdot.org/interviews/99/12/17/0927212.shtml">http://slashdot.org/interviews/99/12/17/0927212.shtml</a>.</p>
can be found at <a href="https://slashdot.org/interviews/99/12/17/0927212.shtml">https://slashdot.org/interviews/99/12/17/0927212.shtml</a>.</p>
<h2>December 14, 1999</h2>
<p>Mesa 3.1 is released!</p>
@@ -2006,7 +2163,7 @@ BOF meeting is now available.</p>
<p>-Brian</p>
<h2>August 14, 1999</h2>
<p><a href="http://www.mesa3d.org">www.mesa3d.org</a> is having
<p><a href="https://www.mesa3d.org">www.mesa3d.org</a> is having
technical problems due to hardware failures at VA Linux systems. The Mac pages,
ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>-Brian</p>
@@ -2015,9 +2172,9 @@ ftp, and CVS services aren't fully restored yet. Please be patient.</p>
<p>RPMS of the nVidia RIVA server can be found at <code>ftp://ftp.mesa3d.org/mesa/misc/nVidia/</code>.</p>
<h2>June 2, 1999</h2>
<p><a href="http://www.nvidia.com/">nVidia</a> has released some Linux binaries for
<p><a href="https://www.nvidia.com/">nVidia</a> has released some Linux binaries for
xfree86 3.3.3.1, along with the <b>full source</b>, which includes GLX acceleration
based on Mesa 3.0. They can be downloaded from <code>http://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
based on Mesa 3.0. They can be downloaded from <code>https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>
<h2>May 24, 1999</h2>
<p>Beta 2 of Mesa 3.1 has been made available at <code>ftp://ftp.mesa3d.org/mesa/beta/</code>.
@@ -2065,11 +2222,11 @@ grateful.
<p>The new webpages are now online. Enjoy, and let me know if you find any errors.
<h2>February 16, 1999</h2>
<p><a href="http://www.sgi.com/">SGI</a> releases its
this stand-alone example</a>. See the llvm-c/Core.h file for reference.
</li>
</ul>
@@ -264,18 +264,18 @@ for posterior analysis, e.g.:
<li>
<p>Rasterization</p>
<ul>
<li><a href="http://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><a href="https://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><a href="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<a href="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
<li><a href="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94512">Bug 94512</a> - X segfaults with glx-tls enabled in a x32 environment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94900">Bug 94900</a> - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with "Fatal error: Cannot set display mode."</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98914">Bug 98914</a> - mesa-vdpau-drivers: breaks vdpau for mpeg2video</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99144">Bug 99144</a> - Incorrect rendering using glDrawArraysInstancedBaseInstance and first != 0 on Skylake</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99154">Bug 99154</a> - Link time error when using multiple builtin functions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99158">Bug 99158</a> - vdpau segfaults and gpu locks with kodi on R9285</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98421">Bug 98421</a> - src/loader/loader.c:111:40: error: unknown type name ‘drmDevicePtr’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99532">Bug 99532</a> - Compute shader doesn't give right result under some circumstances</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: ‘const struct API_STATE’ has no member named ‘linkageCount’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99692">Bug 99692</a> - [radv] Mostly broken on Hawaii PRO/CIK ASICs</li>
</ul>
<h2>Changes</h2>
<p>Bartosz Tomczyk (2):</p>
<ul>
<li>r600: Fix stack overflow</li>
<li>r600/sb: Fix memory leak</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: [rasterizer core] Remove dead code Clipper::ClipScalar()</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>i965/mt: Disable HiZ when sharing depth buffer externally (v2)</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv: change base aligmment for allocated memory.</li>
<li>radv: fix cik macroModeIndex.</li>
<li>radv: adopt some init config workarounds from radeonsi.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/dri2: add image_loader_extension back into loader extensions for wayland</li>
</ul>
<p>Emil Velikov (26):</p>
<ul>
<li>docs: add sha256 checksums for 13.0.4</li>
<li>configure.ac: list radeon in --with-vulkan-drivers help string</li>
<li>i965: automake: correctly set MKDIR_GEN</li>
<li>freedreno: automake: correctly set MKDIR_GEN</li>
<li>i965: automake: include builddir prior to srcdir</li>
<li>i915: automake: include builddir prior to srcdir</li>
<li>egl: automake: include builddir prior to srcdir</li>
<li>clover: automake: include builddir prior to srcdir</li>
<li>st/dri: automake: include builddir prior to srcdir</li>
<li>d3dadapter9: automake: include builddir prior to srcdir</li>
<li>glx: automake: include builddir prior to srcdir</li>
<li>glx/apple: automake: include builddir prior to srcdir</li>
<li>glx/windows: automake: include builddir prior to srcdir</li>
<li>loader: automake: include builddir prior to srcdir</li>
<li>mapi: automake: include builddir prior to srcdir</li>
<li>radeon, r200: automake: include builddir prior to srcdir</li>
<li>dri/swrast: automake: include builddir prior to srcdir</li>
<li>dri/osmesa: automake: include builddir prior to srcdir</li>
<li>mesa/tests: automake: include builddir prior to srcdir</li>
<li>bin/get-extra-pick-list: use git merge-base to get the branchpoint</li>
<li>bin/get-extra-pick-list: rework to use already_picked list</li>
<li>bin/get-typod-pick-list.sh: limit `git grep ...' to only as needed</li>
<li>bin/get-pick-list.sh: limit `git grep ...' only as needed</li>
<li>bin/get-pick-list.sh: remove ancient way of nominating patches</li>
<li>bin/get-fixes-pick-list.sh: add new script</li>
<li>Update version to 13.0.5</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Avoid emitting small immediates for UBO indirect load address guards.</li>
</ul>
<p>Hans de Goede (1):</p>
<ul>
<li>glx/glvnd: Fix GLXdispatchIndex sorting</li>
</ul>
<p>Ian Romanick (11):</p>
<ul>
<li>linker: Slight code rearrange to prevent duplication in the next commit</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99677">Bug 99677</a> - heap-use-after-free in glsl</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: "Note: Buggy applications may crash, if they do please report to vendor"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99850">Bug 99850</a> - Tessellation bug on Carrizo</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - "ralloc: Make sure ralloc() allocations match malloc()'s alignment." causes seg fault in 32bit build</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (2):</p>
<ul>
<li>radv: Emit pending flushes before executing a secondary command buffer</li>
<li>radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer</li>
</ul>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>glsl: fix heap-buffer-overflow</li>
</ul>
<p>Bas Nieuwenhuizen (8):</p>
<ul>
<li>radv: Pass CMASK alignment to application.</li>
<li>radv: Pass DCC alignment to application.</li>
<li>radv: Never try to create more than max_sets descriptor sets.</li>
<li>radv: Reset emitted compute pipeline when calling secondary cmd buffer.</li>
<li>radv: Only use PKT3_OCCLUSION_QUERY when it doesn't hang.</li>
<li>radv: Use correct size for availability flag.</li>
<li>radv: Disable HTILE for textures with multiple layers/levels.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91281">Bug 91281</a> - Tonga VCE 2160p encode fails with BO to small for addr</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92234">Bug 92234</a> - [BDW] GPU hang in Shogun2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92760">Bug 92760</a> - Add FP64 support to the i965 shader backends</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92925">Bug 92925</a> - Incorrect GEN for ASTC in Surface Format Table</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93551">Bug 93551</a> - Divinity: Original Sin Enhanced Edition(Native) crash on start</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94512">Bug 94512</a> - X segfaults with glx-tls enabled in a x32 environment</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=94900">Bug 94900</a> - HD6950 GPU lockup loop with various steam games (octodad[always], saints row 4[always], dead island[always], grid autosport[sometimes])</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95460">Bug 95460</a> - Please add more drivers (freedreno, virgl) to features.txt status document</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96959">Bug 96959</a> - nop.sat generated by pow workaround?</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97804">Bug 97804</a> - Later precision statement isn't overriding earlier one</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97952">Bug 97952</a> - /usr/include/string.h:518:12: error: exception specification in declaration does not match previous declaration</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98005">Bug 98005</a> - VCE dual instance encoding inconsistent since st/va: enable dual instances encode by sync surface</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98012">Bug 98012</a> - [IVB] Segfault when running Dolphin twice with Vulkan</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98134">Bug 98134</a> - dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.draw_buffers wants a different GL error code</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98172">Bug 98172</a> - Concurrent call to glClientWaitSync results in segfault in one of the waiters.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98238">Bug 98238</a> - witcher 2: objects are black when changing lod</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98245">Bug 98245</a> - GLES3.1 link negative dEQP "expected linking to fail, but passed."</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with "Fatal error: Cannot set display mode."</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98297">Bug 98297</a> - Can't configure a desktop with 3x4k monitors in one row</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98421">Bug 98421</a> - src/loader/loader.c:111:40: error: unknown type name ‘drmDevicePtr’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99119">Bug 99119</a> - swr_fence_work.cpp(42): error: argument of type "std::nullptr_t" is incompatible with parameter of type "unsigned long"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99144">Bug 99144</a> - Incorrect rendering using glDrawArraysInstancedBaseInstance and first != 0 on Skylake</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99154">Bug 99154</a> - Link time error when using multiple builtin functions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99158">Bug 99158</a> - vdpau segfaults and gpu locks with kodi on R9285</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99214">Bug 99214</a> - Crash in library libswrAVX.so when assigning vertex buffer object pointers with elements of type GL_DOUBLE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99219">Bug 99219</a> - The Stanley Parable GPU hang when starting a new game</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99229">Bug 99229</a> - [G33] thousands of tests crash</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99231">Bug 99231</a> - [HSW][i965] Crash in upload_3dstate_streamout()</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99303">Bug 99303</a> - [REGRESSION][BISECTED] DMs are crashing on start with "radeon"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99419">Bug 99419</a> - Crash(Segmentation fault) si_shader_select in Master Of Orion</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99450">Bug 99450</a> - [amdgpu] Payday 2 visual glitches on some models</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99451">Bug 99451</a> - polygon offset use after free</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: ‘const struct API_STATE’ has no member named ‘linkageCount’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99637">Bug 99637</a> - VLC video has corrupted colors when using VDPAU output on Radeon SI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97988">Bug 97988</a> - [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasing issues not visible with VA-API</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99484">Bug 99484</a> - Crusader Kings 2 - Loading bars, siege bars, morale bars, etc. do not render correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: "Note: Buggy applications may crash, if they do please report to vendor"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - "ralloc: Make sure ralloc() allocations match malloc()'s alignment." causes seg fault in 32bit build</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (3):</p>
<ul>
<li>radv: Emit pending flushes before executing a secondary command buffer</li>
<li>radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99246">Bug 99246</a> - [d3dadapter+radeonsi & bisect] EVE-Online : hang on wormhole sight</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100061">Bug 100061</a> - LODQ instruction generated with invalid dst mask</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100182">Bug 100182</a> - Flickering in The Talos Principle on Sky Lake GT4.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100201">Bug 100201</a> - Windows scons build with MSVC toolchain and LLVM 4.0 fails</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new polaris12 pci id</li>
</ul>
<p>Andres Gomez (5):</p>
<ul>
<li>glsl: on UBO/SSBOs link error reset the number of active blocks to 0</li>
<li>cherry-ignore: add the Invalidate L2 for TRANSFER_WRITE barriers fix</li>
<li>cherry-ignore: add the Flush after unmap in gbm/dri fix</li>
<li>cherry-ignore: corrected typo in the Flush after unmap in gbm/dri fix</li>
<li>Update version to 17.0.3</li>
</ul>
<p>Axel Davy (2):</p>
<ul>
<li>st/nine: Resolve deadlock in surface/volume dtors when using csmt</li>
<li>st/nine: Use atomics for available_texture_mem</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: flush DB cache before and after HTILE decompress.</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>radv: fix primitive reset index emission</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.2</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>st/mesa: set result writemask based on ir type</li>
</ul>
<p>Jan Vesely (1):</p>
<ul>
<li>clover: use pipe_resource references</li>
</ul>
<p>Jason Ekstrand (9):</p>
<ul>
<li>anv/query: Invalidate the correct range</li>
<li>anv/GetQueryPoolResults: Actually implement the spec</li>
<li>anv/image: Return early when unbinding an image</li>
<li>anv/query: Fix the location of timestamp availability</li>
<li>anv: Make anv_get_layerCount a macro</li>
<li>anv/blorp: Use anv_get_layerCount everywhere</li>
<li>anv/cmd_buffer: Apply flush operations prior to executing secondaries</li>
<li>anv/cmd_buffer: Fix bad indentation</li>
<li>anv: Flush caches prior to PIPELINE_SELECT on all gens</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>c11/threads: Include thr/xtimec.h for xtime definition when building with MSVC.</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>tests/cache_test: allow crossing mount points</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nvc0/ir: treat FMA like MAD for operand propagation</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: don't hang on shader compile failure</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>i965/fs: Don't emit SEL instructions for type-converting MOVs.</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>intel: Correct the BDW surface state size</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>mesa/main: fix MultiDrawElements[BaseVertex] validation of primcount</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (16):</p>
<ul>
<li>cherry-ignore: Add the pci_id into the shader cache UUID</li>
<li>cherry-ignore: fix crash if ctx torn down with no rendering</li>
<li>cherry-ignore: Fix typos.</li>
<li>cherry-ignore: Revert "etnaviv: Cannot render to rb-swapped formats"</li>
<li>cherry-ignore: Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."</li>
<li>cherry-ignore: fix typo in a2b10g10r10 fast clear calculation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (1):</p>
<ul>
<li>docs: add sha256 checksums for 17.0.6</li>
</ul>
<p>Bartosz Tomczyk (1):</p>
<ul>
<li>mesa: Avoid leaking surface in st_renderbuffer_delete</li>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>OpenGL 4.2 on i965/ivb</li>
<li>GL_ARB_gpu_shader_fp64 on i965/ivybridge</li>
<li>GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe</li>
<li>GL_ARB_shader_ballot on nvc0, radeonsi</li>
<li>GL_ARB_shader_clock on nv50, nvc0, radeonsi</li>
<li>GL_ARB_shader_group_vote on radeonsi</li>
<li>GL_ARB_shader_precision on i965/ivb</li>
<li>GL_ARB_shader_viewport_layer_array on radeonsi</li>
<li>GL_ARB_sparse_buffer on radeonsi/CIK+</li>
<li>GL_ARB_transform_feedback2 on i965/gen6</li>
<li>GL_ARB_transform_feedback_overflow_query on i965/gen6+</li>
<li>GL_ARB_vertex_attrib_64bit on i965/ivb</li>
<li>GL_NV_fill_rectangle on nvc0</li>
<li>Geometry shaders enabled on swr</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=68504">Bug 68504</a> - 9.2-rc1 workaround for clover build failure on ppc/altivec: cannot convert 'bool' to '__vector(4) __bool int' in return</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84325">Bug 84325</a> - X.Org segfaults when starting DE on an Intel+Radeon laptop, caused by libpciaccess cleanup, patch attached</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93089">Bug 93089</a> - mesa fails to check for gcc atomic primitives before using them</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95460">Bug 95460</a> - Please add more drivers (freedreno, virgl) to features.txt status document</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97338">Bug 97338</a> - Black squares in the Spec Ops: The Line chapter select screen</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97988">Bug 97988</a> - [radeonsi] playing back videos with VDPAU exhibits deinterlacing/anti-aliasing issues not visible with VA-API</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98263">Bug 98263</a> - [radv] The Talos Principle fails to launch with "Fatal error: Cannot set display mode."</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98502">Bug 98502</a> - Delay when starting firefox, thunderbird or chromium and dmesg spam</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98869">Bug 98869</a> - Electronic Super Joy graphic artefacts (regression,bisected)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99010">Bug 99010</a> - --disable-gallium-llvm no longer recognized</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99246">Bug 99246</a> - [d3dadapter+radeonsi & bisect] EVE-Online : hang on wormhole sight</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99450">Bug 99450</a> - [amdgpu] Payday 2 visual glitches on some models</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99451">Bug 99451</a> - polygon offset use after free</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99456">Bug 99456</a> - Firefox crashing when opening about:support with WebGL2 enabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99465">Bug 99465</a> - vtn_vector_construct writing out of bounds when given multiple non-zero length sources</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99484">Bug 99484</a> - Crusader Kings 2 - Loading bars, siege bars, morale bars, etc. do not render correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99532">Bug 99532</a> - Compute shader doesn't give right result under some circumstances</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99542">Bug 99542</a> - vdpau logging errors since gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99631">Bug 99631</a> - segfault with OSVRTrackerView and openscenegraph git master</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99633">Bug 99633</a> - rasterizer/core/clip.h:279:49: error: ‘const struct API_STATE’ has no member named ‘linkageCount’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99660">Bug 99660</a> - Not all of the int64 conversion opcodes got implemented</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99677">Bug 99677</a> - heap-use-after-free in glsl</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99692">Bug 99692</a> - [radv] Mostly broken on Hawaii PRO/CIK ASICs</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99701">Bug 99701</a> - loader.c:353:8: error: implicit declaration of function 'geteuid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99715">Bug 99715</a> - Don't print: "Note: Buggy applications may crash, if they do please report to vendor"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99789">Bug 99789</a> - Memory leak on failure to create an ir_constant in calculate_iterations in loop_controls.cpp</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99842">Bug 99842</a> - GL_ARB_transform_feedback2 on i965 gen6</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99850">Bug 99850</a> - Tessellation bug on Carrizo</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99918">Bug 99918</a> - disk_cache.h:57:20: error: no member named 'st_mtim' in 'struct stat'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99953">Bug 99953</a> - device9.c:122:49: error: ‘PIPE_CAP_USER_INDEX_BUFFERS’ undeclared (first use in this function)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99955">Bug 99955</a> - [r600g] GPU load always displayed at 100% with GALLIUM_HUD=GPU-load</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100049">Bug 100049</a> - "ralloc: Make sure ralloc() allocations match malloc()'s alignment." causes seg fault in 32bit build</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100060">Bug 100060</a> - wsi/wsi_common_wayland.c:25:41: fatal error: wayland-drm-client-protocol.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100061">Bug 100061</a> - LODQ instruction generated with invalid dst mask</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100180">Bug 100180</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100182">Bug 100182</a> - Flickering in The Talos Principle on Sky Lake GT4.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100201">Bug 100201</a> - Windows scons build with MSVC toolchain and LLVM 4.0 fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100223">Bug 100223</a> - marshal_generated.c:38:10: fatal error: 'X11/Xlib-xcb.h' file not found</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100236">Bug 100236</a> - Undefined symbols for architecture x86_64: "typeinfo for llvm::RTDyldMemoryManager"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100259">Bug 100259</a> - [EGL] [GBM] undefined reference to `gbm_bo_create_with_modifiers'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100288">Bug 100288</a> - clover unable to run OpenCL kernels since 03127bb radeonsi: compile all TGSI compute shaders asynchronously</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100303">Bug 100303</a> - Adding a single, meaningless if-else to a shader source leads to different image</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100531">Bug 100531</a> - [regression] Broken graphics in several games</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100562">Bug 100562</a> - u_debug_stack.c:59: undefined reference to `_Ux86_64_getcontext'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100569">Bug 100569</a> - core/resource.cpp:36:33: error: non-constant-expression cannot be narrowed from type 'int' to 'int16_t' (aka 'short') in initializer list [-Wc++11-narrowing]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100574">Bug 100574</a> - anv_device.c:189: undefined reference to `anv_gem_supports_48b_addresses'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100854">Bug 100854</a> - YUV to RGB Color Space Conversion result is not precise</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: add new vega10 pci ids</li>
</ul>
<p>Andres Gomez (2):</p>
<ul>
<li>bin/get-fixes-pick-list.sh: don't warn if more than one, go over them</li>
<li>bin/get-fixes-pick-list.sh: bring back the warning</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>swr: move msaa resolve to generalized StoreTile</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102844">Bug 102844</a> - memory leak with glDeleteProgram for shader program type GL_COMPUTE_SHADER</li>
</ul>
<h2>Changes</h2>
<p>Alexandre Demers (1):</p>
<ul>
<li>osmesa: link with libunwind if enabled (v2)</li>
</ul>
<p>Andres Gomez (12):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.9</li>
<li>cherry-ignore: add "st/mesa: skip draw calls with pipe_draw_info::count == 0"</li>
<li>cherry-ignore: add "radv: use amdgpu_bo_va_op_raw."</li>
<li>cherry-ignore: add "radv: use simpler indirect packet 3 if possible."</li>
<li>cherry-ignore: add "radeonsi: don't always apply the PrimID instancing bug workaround on SI"</li>
<li>cherry-ignore: add "intel/eu/validate: Look up types on demand in execution_type()"</li>
<li>cherry-ignore: add "radv: gfx9 fixes"</li>
<li>cherry-ignore: add "radv/gfx9: set mip0-depth correctly for 2d arrays/3d images"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77240">Bug 77240</a> - khrplatform.h not installed if EGL is disabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95530">Bug 95530</a> - Stellaris - colored overlay of sectors doesn't render on i965</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96958">Bug 96958</a> - [SKL] Improper rendering in Europa Universalis IV</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99467">Bug 99467</a> - [radv] DOOM 2016 + wine. Green screen everywhere (but can be started)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101071">Bug 101071</a> - compiling glsl fails with undefined reference to `pthread_create'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101252">Bug 101252</a> - eglGetDisplay() is not thread safe</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101294">Bug 101294</a> - radeonsi minecraft forge splash freeze since 17.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100242">Bug 100242</a> - radeon buffer allocation failure during startup of Factorio</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101657">Bug 101657</a> - strtod.c:32:10: fatal error: xlocale.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101666">Bug 101666</a> - bitfieldExtract is marked as a built-in function on OpenGL ES 3.0, but was added in OpenGL ES 3.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101703">Bug 101703</a> - No stencil buffer allocated when requested by GLUT</li>
</ul>
<h2>Changes</h2>
<p>Aaron Watry (1):</p>
<ul>
<li>radeon/winsys: Limit max allocation size to 70% of VRAM</li>
</ul>
<p>Aleksander Morgado (2):</p>
<ul>
<li>etnaviv: fix refcnt initialization in etna_screen</li>
<li>etnaviv: don't dereference etna_resource pointer if allocation fails</li>
</ul>
<p>Alex Smith (2):</p>
<ul>
<li>ac/nir: Use correct LLVM intrinsics for atomic ops on imageBuffers</li>
<li>ac/nir: Fix ordering of parameters for image atomic cmpswap intrinsics</li>
</ul>
<p>Andres Gomez (3):</p>
<ul>
<li>docs: add sha256 checksums for 17.1.4</li>
<li>cherry-ignore: i965: Fix anisotropic filtering for mag filter</li>
<li>Update version to 17.1.5</li>
</ul>
<p>Anuj Phogat (2):</p>
<ul>
<li>intel/isl: Use uint64_t to store total surface size</li>
<li>intel/isl: Add the maximum surface size limit</li>
</ul>
<p>Brian Paul (3):</p>
<ul>
<li>draw: check for line_width != 1.0f in validate_pipeline()</li>
<li>svga: clamp device line width to at least 1 to fix HWv8 line stippling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97957">Bug 97957</a> - Awful screen tearing in a separate X server with DRI3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101683">Bug 101683</a> - Some games hang while loading when compositing is shut off or absent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101867">Bug 101867</a> - Launch options window renders black in Feral Games in current Mesa trunk</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101334">Bug 101334</a> - AMD SI cards: Some vulkan apps freeze the system</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101766">Bug 101766</a> - Assertion `!"invalid type"' failed when constant expression involves literal of different type</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102024">Bug 102024</a> - FORMAT_FEATURE_SAMPLED_IMAGE_BIT not supported for D16_UNORM and D32_SFLOAT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102148">Bug 102148</a> - Crash when running qopenglwidget example on mesa llvmpipe win32</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=77240">Bug 77240</a> - khrplatform.h not installed if EGL is disabled</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=95530">Bug 95530</a> - Stellaris - colored overlay of sectors doesn't render on i965</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96449">Bug 96449</a> - Dying Light reports OpenGL version 3.0 with mesa-git</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=96958">Bug 96958</a> - [SKL] Improper rendering in Europa Universalis IV</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97524">Bug 97524</a> - Samplers referring to the same texture unit with different types should raise GL_INVALID_OPERATION</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=97957">Bug 97957</a> - Awful screen tearing in a separate X server with DRI3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98238">Bug 98238</a> - Witcher 2: objects are black when changing lod on Radeon Pitcairn</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=98428">Bug 98428</a> - Undefined non-weak-symbol in dri-drivers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99467">Bug 99467</a> - [radv] DOOM 2016 + wine. Green screen everywhere (but can be started)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100925">Bug 100925</a> - [HSW/BSW/BDW/SKL] Google Earth is not resolving all the details in the map correctly</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100937">Bug 100937</a> - Mesa fails to build with GCC 4.8</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100945">Bug 100945</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100988">Bug 100988</a> - glXGetCurrentDisplay() no longer works for FakeGLX contexts?</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101071">Bug 101071</a> - compiling glsl fails with undefined reference to `pthread_create'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101088">Bug 101088</a> - `gallium: remove pipe_index_buffer and set_index_buffer` causes glitches and crash in gallium nine</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101110">Bug 101110</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101189">Bug 101189</a> - Latest git fails to compile with radeon</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101252">Bug 101252</a> - eglGetDisplay() is not thread safe</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101254">Bug 101254</a> - VDPAU videos don't start playing with r600 gallium driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101340">Bug 101340</a> - i915_surface.c:108:4: error: too few arguments to function ‘util_blitter_default_src_texture’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101360">Bug 101360</a> - Assertion failure comparing result of ballotARB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101401">Bug 101401</a> - [REGRESSION][BISECTED] GDM fails to start after 8ec4975cd83365c791a1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101418">Bug 101418</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101538">Bug 101538</a> - From "Use isl for hiz layouts" commit onwards, everything crashes with Mesa</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101539">Bug 101539</a> - [Regresion] [IVB] Segment fault in recent commit in intel_miptree_level_has_hiz under Ivy bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101558">Bug 101558</a> - [regression][bisected] MPV playing video via opengl "randomly" results in only part of the window / screen being rendered with Mesa GIT.</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101596">Bug 101596</a> - Blender renders black UI elements</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101607">Bug 101607</a> - Regression in anisotropic filtering from "i965: Convert fs sampler state to use genxml"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101657">Bug 101657</a> - strtod.c:32:10: fatal error: xlocale.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101666">Bug 101666</a> - bitfieldExtract is marked as a built-in function on OpenGL ES 3.0, but was added in OpenGL ES 3.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101683">Bug 101683</a> - Some games hang while loading when compositing is shut off or absent</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101703">Bug 101703</a> - No stencil buffer allocated when requested by GLUT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101704">Bug 101704</a> - [regression][bisected] glReadPixels() from pbuffer failing in Android CTS camera tests</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101766">Bug 101766</a> - Assertion `!"invalid type"' failed when constant expression involves literal of different type</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101774">Bug 101774</a> - gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101775">Bug 101775</a> - Xorg segfault since 147d7fb "st/mesa: add a winsys buffers list in st_context"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101829">Bug 101829</a> - read-after-free in st_framebuffer_validate</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101831">Bug 101831</a> - Build failure in GNOME Continuous</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101851">Bug 101851</a> - [regression] libEGL_common.a undefined reference to '__gxx_personality_v0'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101867">Bug 101867</a> - Launch options window renders black in Feral Games in current Mesa trunk</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=101876">Bug 101876</a> - SIGSEGV when launching Steam</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102024">Bug 102024</a> - FORMAT_FEATURE_SAMPLED_IMAGE_BIT not supported for D16_UNORM and D32_SFLOAT</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102148">Bug 102148</a> - Crash when running qopenglwidget example on mesa llvmpipe win32</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102573">Bug 102573</a> - fails to build on armel</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102844">Bug 102844</a> - memory leak with glDeleteProgram for shader program type GL_COMPUTE_SHADER</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102847">Bug 102847</a> - swr fail to build with llvm-5.0.0</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=102904">Bug 102904</a> - piglit and gl45 cts linker tests regressed</li>
</ul>
<h2>Changes</h2>
<p>Alexandru-Liviu Prodea (1):</p>
<ul>
<li>Scons: Add LLVM 5.0 support</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Check for GFX9 for 1D arrays in image_size intrinsic.</li>
</ul>
<p>Boris Brezillon (1):</p>
<ul>
<li>broadcom/vc4: Fix infinite retry in vc4_bo_alloc()</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>radv/nir: call opt_remove_phis after trivial continues.</li>
<li>ac/surface: handle S8 on gfx9</li>
<li>st/glsl->tgsi: fix u64 to bool comparisons.</li>
</ul>
<p>David Airlie (1):</p>
<ul>
<li>radv: add gfx9 scissor workaround</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 17.2.1</li>
<li>automake: enable libunwind in `make distcheck'</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>broadcom/vc4: Fix use-after-free for flushing when writing to a texture.</li>
<li>broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.</li>
<li>broadcom/vc4: Fix use-after-free when deleting a program.</li>
<li>broadcom/vc4: Keep pipe_sampler_view->texture matching the original texture.</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>travis: force llvm-3.3 for "make Gallium ST Other"</li>
<li>travis: Add libunwind-dev to gallium/make builds</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>configure: check if -latomic is needed for __atomic_*</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>nv20: Fix GL_CLAMP</li>
</ul>
<p>Jason Ekstrand (6):</p>
<ul>
<li>i965/blorp: Set r8stencil_needs_update when writing stencil</li>
<li>vulkan/wsi/wayland: Stop printing out the DRM device</li>
People who are concerned with stability and reliability should stick
with a previous release or wait for Mesa 17.3.1.
</p>
<p>
Mesa 17.3.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation
because compatibility contexts are not supported.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<p>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>libtxc_dxtn is now integrated into Mesa. GL_EXT_texture_compression_s3tc and GL_ANGLE_texture_compression_dxt are now always enabled on drivers that support them</li>
<li>GL_ARB_indirect_parameters on i965/gen7+</li>
<li>GL_ARB_polygon_offset_clamp on i965, nv50, nvc0, r600, radeonsi, llvmpipe, swr</li>
<li>GL_ARB_transform_feedback_overflow_query on radeonsi</li>
<li>GL_ARB_texture_filter_anisotropic on i965, nv50, nvc0, r600, radeonsi</li>
@@ -226,7 +226,7 @@ did not exist in the 7.10 release series at all.</p>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36086">Bug 36086</a> - [wine] Segfault r300_resource_copy_region with some wine apps and RADEON_HYPERZ</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36182">Bug 36182</a> - Game Trine from http://www.humblebundle.com/ needs ATI_draw_buffers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36182">Bug 36182</a> - Game Trine from https://www.humblebundle.com/ needs ATI_draw_buffers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=36268">Bug 36268</a> - [r300g, bisected] minor flickering in Unigine Sanctuary</li>