Never used. The GLSL compiler doesn't even look at EmitNoFunctions.
v2: add back "return" support in "main"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I noticed that glsl_to_tgsi_instruction is too huge.
sizeof(glsl_to_tgsi_instruction): 752 -> 464 (-38%)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Earlier commit replaced the default platform specific libglapi.so name
with an #error.
This may have been overzealous since the name is the correct for the BSD
platforms, at least. Reinstate the hunk - bringing back OpenBSD, et al.
to a successful build state.
Fixes: 7a9c92d071 ("egl/dri2: non-shared glapi cleanups")
[Emil Velikov: format the patch from Eric, add commit message and tag.]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Before we can read the fmask using the compute shader, we need
to decompress the fmask in place.
This fixes a bunch of remaining failure and hopefully multisampling
in Talos.
This adds some comments and adds defines for the user sgprs,
so that we can move them around easier later and not have
to change/revalidate every one of these.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops all the radv WSI code in favour of using
the new shared code that was ported from anv
This regresses Talos for now, Jason has pointed out
the bug is in Talos and we should wait for them to fix it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the shared code to a common subdirectory
and makes anv linked to that code instead of the copy
it was using.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The per-element fetch has quite some calculations which are constant,
these can be moved outside both the per-element as well as the main
shader loop (llvm can figure out it's constant mostly on its own, however
this can have a significant compile time cost).
Similarly, it looks easier swapping the fetch loops (outer loop per attrib,
inner loop filling up the per vertex elements - this way the aos->soa
conversion also can be done per attrib and not just at the end though again
this doesn't really make much of a difference in the generated code). (This
would also make it possible to vectorize the calculations leading to the
fetches.)
There's also some minimal change simplifying the overflow math slightly.
All in all, the generated code seems to look slightly simpler (depending
on the actual vs), but more importantly I've seen a significant reduction
in compile times for some vs (albeit with old (3.3) llvm version, and the
time reduction is only really for the optimizations run on the IR).
v2: adapt to other draw change.
No changes with piglit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previous attempts to zero initialize all inputs were not really optimal
(though no performance impact was measurable). In fact this is not really
necessary, since we know the max number of inputs used.
Instead, just generate fetch for up to max inputs used by the shader,
directly replacing inputs for which there was no vertex element by zero.
This also cleans up key generation, which previously would have stored
some garbage for these elements.
And also drop the assertion which indicates such bogus usage by a
debug_printf (the whole point of initializing the undefined inputs was to
make this case safe to handle).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Compilation to actual machine code can easily take as much time as the
optimization passes on the IR if not more, so print this out too.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For the texturing packs, things looked pretty terrible. For every
lerp, we were repacking the values, and while those look sort of cheap
with 128bit, with 256bit we end up with 2 of them instead of just 1 but
worse, plus 2 extracts too (the unpack, however, works fine with a
single instruction, albeit only with llvm 3.8 - the vpmovzxbw).
Ideally we'd use more clever pack for llvmpipe backend conversion too
since we actually use the "wrong" shuffle (which is more work) when doing
the fs twiddle just so we end up with the wrong order for being able to
do native pack when converting from 2x8f -> 1x16b. But this requires some
refactoring, since the untwiddle is separate from conversion.
This is only used for avx2 256bit pack/unpack for now.
Improves openarena scores by 8% or so, though overall it's still pretty
disappointing how much faster 256bit vectors are even with avx2 (or
rather, aren't...). And, of course, eliminating the needless
packs/unpacks in the first place would eliminate most of that advantage
(not quite all) from this patch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This moves to the shared vk_alloc inlines for vulkan
memory allocations.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves all the alloc/free in anv to the generic helpers.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
vulkan allocation allows for overriding the allocator used,
add some macros for anv/radv to share for this.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This removes the vector code from radv in favour of sharing
code with anv.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just removes the anv vector code and uses the new helper.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
During dual instance encoding submission, if the second encode task and first
encode task have no reference dependency, e.g. p following with idr-frame,
there is a chance the second task will use for its reconstructed picture
buffer the same buffer used by first task for its reference/reconstructed
picture. In this case, buffer corruption may occur depending on encoding
speed. Fix is to force flush these two tasks separately to avoid race condition
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98005
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Since commit 63c5d5c6c4, the surfaceless
platform has allowed creation of pbuffer surfaces. But the vtable entry
for eglSwapBuffers has remained NULL.
Discovered by running a little pbuffer test.
Cc: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Add mesa_glinterop.h to the list of headers that will get included
in the distfile as it is required to build Mesa itself.
Corrects a regression introduced in a89faa2022.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
% pattern rules are a GNU extension. Convert the use of one to a
inference rule to allow this to build on OpenBSD.
This is a related change to the one made in
e3d43dc5ea
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Rather than having 4-5 places which do the explicit check/message just
polish the gallium helper and use it everywhere.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously it was used to differentiate between the different codepaths
in the loader. Although strictly speaking the (core) of the loader is
only used when a hardware device is available. The latter of which in
itself requires libdrm (one of the codepaths available).
That said, all the configure toggles which relate to enabling/using hw
device should attribute and require libdrm, so there's no need to keep
this code around.
With this gallium_require_drm_loader becomes an empty stub, so nuke that
one as well.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
With previous patches nearly all the original code (as seen in the
various loaders) is gone.
Update the copyright/license section to reflect that.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reminiscent from the pre-loader days, were we had multiple instances of
the loader logic in separate places and one could build a "GALLIUM_ONLY"
version.
Since that is no longer the case and the loaders (glx/egl/gbm) do not
(and should not) require to know any classic/gallium specific we can
drop the argument and the related code.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Effectively everyone with actual hardware and/or requesting the
"device_name" requires a working libdrm. Thus they could/should already
be using the (now only) codepath.
Apart from the code simplification, we can slim down our configure.ac
even further. But that will be done in separate patch(es).
Cc: Gary Wong <gtw@gnu.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The gbm_device_get_backend_name() provides an (somewhat) internal name
of the implementation/backend used. Is has nothing to do with the udev,
one cannot and should not attempt to derive the name from it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Currently not everyone has libudev and with follow-up patches we'll
completely remove the divergent codepaths.
Use the libdrm drm device API to construct the required ID_PATH_TAG-like
string, to preserve the current functionality for libudev users and
allow others to benefit from it as well.
v2: Drop ranty comments, pick the correct device
v3: \n -> \0 in PCI_ID_PATH_TAG_LENGTH comment (Axel).
v4: Use snprintf (Nicolai)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Currently mesa has three code paths in the loader - libudev, manual
sysfs and drm ioctl one.
Considering the issues we had with libudev - strip those down in favour
of the libdrm drm device API. The latter can be implemented in any way
depending on the platform and can be reused by others.
v2: Use correct message on drmGetDevice failure. (Nicolai)
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
With follow on work, we'll untangle and simplify all the different
codepaths in loader. Then again, we forget to set have_pci_id when
libdrm is present (one of the codepaths available).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The indirect handle has to come right after the coordinates, so if there
was a sample/bias/depth compare/offset, everything would end up being
shifted by one argument position.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
As specified in va.h, default value should be set on attributes
not present in the input list.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Mark Thompson <sw@jkqxz.net>
dEQP-GLES31.functional.geometry_shading.instanced.geometry_1_invocations
draws using a geometry shader that specifies
layout(points, invocations = 1) in;
and then uses gl_InvocationID. According to the Haswell PRM, the
"GS Instance ID 0" (and 1) thread payload fields are undefined in
dual object mode:
"If 'dispatch mode' is DUAL_OBJECT this field is not valid."
But there's no point in using them - if there's only one invocation,
the ID will be 0. So just load a constant.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
All code that would have once called this can now call the gen-specific
version. The switching version is no longer needed.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It really should have gone here all along. We were trying a bit too hard
to make it gen-agnostic just because it didn't have any #if's.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
With one small genxml change, the two versions were basically identical.
The only differences were one #define for HSW+ and a field that is missing
on Haswell but exists everywhere else.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
vkBeginCommandBuffer and vkCmdExecuteCommands both call into the
gen-specific emit_state_base_address function and vkEndCommandBuffer
belongs with begin.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This has two primary advantages. First, it means that the batch_chain code
knows less about the actual command buffer contents which is good because
improves separation. Second, it means that it only gets re-emitted once
after all of the secondaries instead of once after each secondary which is
just wasteful. It also has the advantage of cleaning the code up a bit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
V2. add i965/hsw+ to list
V3. rebased on master.
V4. 'DONE' -> 'DONE ()'.
V5. remove i965/hsw+ from list :/
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Once upon a time, this was used to extract prototypes from the shader
containing GLSL built-in functions. This was removed by f5692f45 in
November 2010 for Mesa 7.10.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This makes the stream of commands a bit easier to read.
v2 (Ken): Use bold text on green headers for easier readability;
swap the green and blue headers so the majority stay blue.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The corresponding opcodes for integers need to be treated the same as F2D.
Fixes GL45-CTS.gpu_shader_fp64.conversions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When more than one atomic counter buffer is in use, UniformStorage[n].opaque
is set up to contain indices that are contiguous across all used buffers.
This appears to be used by i965 via NIR, but for TGSI we do not treat atomic
counter buffers as opaque, so using the data in the opaque array is incorrect.
Fixes GL45-CTS.compute_shader.resource-atomic-counter.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Properly handle the case where there is a gap in the assigned output locations,
e.g. a fragment shader writes to color buffer 2 but not to color buffers 0 & 1.
Fixes GL45-CTS.gtf33.GL3Tests.explicit_attrib_location.explicit_attrib_location_pipeline.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Generally, we only check for the presence of compute shaders during
parsing when we find any language (like layout qualifiers) that are
specific to compute shaders, however, it is possible to define an
empty compute shader does not use any language specific to compute
shaders at all and we should fail the compilation anyway. dEQP checks
this.
This patch adds a check for compute shader availability after we have
parsed the source code. At this point we know the effective GLSL version
and also extensions enabled in the shader.
Fixes a subcase of the following dEQP tests:
dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader
The tests still fail because there is one more subcase that fails that needs
another fix.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This was some kind of leftover in commit acd35c8 and format_count
array variable (declared in outer scope) should be used instead.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: acd35c8758 ("egl/android: tweak droid_add_configs_for_visuals()")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Vulkan doesn't set these fields even though it doesn't use HiS.
HiS is disabled by programming DB_SRESULTS_COMPARE_STATEn to 0.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This has apparently never existed in GLSL ES.
Fixes dEQP-GLES3.functional.shaders.texture_functions.invalid
.textureoffset_sampler2darrayshadow_vec4_ivec2_vertex and
.textureoffset_sampler2darrayshadow_vec4_ivec2_fragment
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98244
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
DEQP's clear tests like to give us x + w < 0 or y + h < 0. Since we
were comparing to an unsigned, it would get promoted to unsigned and come
out as bignum >= width or height and we would clear the whole fb instead
of none of the fb.
Fixes 10 tests under deqp-gles2/functional/color_clear.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When !use_ib_bos, we can't easily chain ibs one to another. If the
required cs size grows over 1Mi - 8 dwords just fail the cs so that we
won't assert-fail in radv_amdgpu_winsys_cs_submit later on.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Several conformance tests violate this requirement:
ES31-CTS.core.tessellation_shader.max_patch_vertices
ES31-CTS.core.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through
I submitted a merge request to fix the conformance tests, but Khronos
opted to drop this GLSL ES specific requirement in favor of making flat
qualification of VS outputs optional, matching modern desktop GL.
Note that there were 7 Piglit tests which enforce this rule:
tests/spec/glsl-es-3.00/compiler/interpolation/qualifiers/*nonflat*
but these were deleted in Piglit commit acc0a2fabbd714bc704c16f1675e7c0.
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=15465#c7
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We also remove the redundant zero defaults since everything without an
explicit default gets zeroed automatically.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Split the source immediate value into new values and move them into the
original defs set by the split. Since we can only have up to 64-bit
immediates, this is largely beneficial for F64 (and, in the future, U64)
operations.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: always use U32, set newi for foldCount tracking]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Everything is in place. There are still conformance issues to sort out,
but we may as well turn it on in master.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Without meta, we no longer need the _init helpers and the ability to back
an image view with surface states allocated out of the command buffer.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Now that meta is gone and we're using blorp, we don't need all of the usage
hacks. Instead, the usage provided by the app is exactly the usage that we
want because the app is the only thing creating image views.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we don't have meta, we have no need for a gen-agnostic pipeline
create path. We can, instead, just generate one Create*Pipelines function
per gen and be done with it.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
In order for things such as the ANV_CALL and the ifuncs to work, we used to
have a singleton gen_device_info structure that got assigned the first time
you create a device. Given that the driver will never be used
simultaneously on two different generations of hardware, this was fairly
safe to do. However, it has caused a few hickups and isn't, in general, a
good plan. Now that the two primary reasons for this singleton are gone,
we can get rid of it and make things quite a bit safer.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
This macro was needed by meta in order to make gen-specific calls from
gen-agnostic code. Now that we don't have meta, the remaining two uses are
fairly trivial to get rid of.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we no longer have meta, all pipelines get created via the normal
Vulkan pipeline creation mechanics. There is no more need for this bit of
extra magic data that we've been passing around.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If we don't, we can end up with corruption in the portion of the depth
buffer that lies outside the render area when we do a HiZ resolve at the
end. The only reason we weren't seeing this before was that all of the
meta-based clears such as VkCmdClearDepthStencilImage were internally using
HiZ so the HiZ buffer never truly got out-of-sync. If the CTS ever tested
a depth upload (which doesn't care about HiZ) and then a partial render we
would have seen problems. Soon, we will be using blorp to do depth clears
and it won't bother with HiZ so we would get CTS regressions without this.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
When I initially brought up Vulkan blorp, I completely missed that this
was already factored out. There's no good reason for us to hand-roll it.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In Vulkan, we want to be able to use blorp to perform clears inside of a
render pass. If blorp stomps the depth/stencil buffers packets then we'll
have to re-emit them. This gets tricky when secondary command buffers get
involved. Instead, we'll simply guarantee that the depth and stencil
buffers we pass to blorp (if any) match those already set in the hardware.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This never mattered before because the only time we used blorp
depth/stencil only was to do HiZ operations on gen6-7. It may have worked
in that case (and maybe it didn't) but slow depth clears actually do depth
rendering so they need a valid render target.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This gives a slightly smarter way to check whether or not a particular
surface exists than looking at the address.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This should now set the pipeline up properly for doing depth and/or stencil
clears by plumbing through depth/stencil test values. We are now also
emitting color calculator state for blorp operations without an actual
shader because that is where the stencil reference value goes pre-SKL.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The newly reworked depth/stencil config code can properly handle having
depth, stencil, both, or neither. We no longer need to predicate it on
having depth or stencil.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
While we're here, we also make depth without HiZ work.
v2:
- Use the correct surface type for 1-D on SKL+
- Set QPitch on BDW+
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We want to be able to start doing slow depth clears with blorp. This
allows us to adjust the depth we're clearing to.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Due to conflicting symbol names (between i915 and i965) in the
megadriver, we use a set of defines in i915/intel_screen.h.
With a recent commit we've introduced a symbol intelFenceExtension which
has different implementation for each driver, yet we forgot to add the
define.
Fixes: d11515ff1b ("i915/sync: Implement DRI2_Fence extension")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98264
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Document the EGL enum ranges for Mesa and those values allocated by the
following extensions:
EGL_MESA_drm_image
EGL_MESA_platform_gbm
EGL_MESA_platform_surfaceless
EGL_WL_bind_wayland_display
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Years ago Khronos replaced the registry's spec files with newfangled XML
files. Update the reference in doc/specs/enum.txt accordingly.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It was the lone file in src/egl/docs. Move it to where the other specs
live, in $MESA_TOP/docs/specs.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Mesa's set of supported platform extensions depends on the autoconf
option --with-egl-platforms=foo,bar,baz. If --with-egl-platforms lacks
foo, then eglGetPlatformDisplay(EGL_PLATFORM_FOO, ...) unconditonally
fails.
So, if --with-egl-platforms lacks foo, then remove
EGL_VENDOR_platform_foo from the EGL client extension string.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
modulefinder wasn't searching for dependencies in the script dir.
It's not capable of detecting the sys.path manipulations scripts do
internally neither.
This change fixes the first issue, and hacks around the second.
Honestly, I've come to the conclusion that automatic Python dependency it will always be
too brittle. I think we should start manually typing the dependencies
like we do in automake. At very least it will enable any person to
eyeball and spot/fix missing dependencies, without dig into SCons internals.
Porting of the corresponding patch for i965.
Here follows the original commit message by Tomasz Figa:
"As the spec allows for {server,client}_wait_sync to be called without
currently bound context, while our implementation requires context
pointer.
v2: Add a mutex and acquire it for the duration of
brw_fence_client_wait() and brw_fence_is_completed() as suggested
by Chad."
NOTE: in i915 all references to 'brw' are replaced by 'intel'
Marshmallow-x86 boots ok with the following results of Android CTS.
Android CTS 6.0_r7 build:2906653
Session Pass Fail Not Executed
0(EGL) 1410 24 0
1(GLES2) 13832 82 0
I get the same results as per i965GM.
[Emil Velikov: Include Mauro's test results]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Here is the porting of corresponding patch for i965,
i.e. commit c636284 i965/sync: Implement DRI2_Fence extension
Here follows part of original commit message by Chad Versace:
"This enables EGL_KHR_fence_sync and EGL_KHR_wait_sync."
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This is the porting of corresponding patch for i965,
i.e. commit 2516d83 i965/sync: Replace prefix 'intel_sync' -> 'intel_gl_sync'
The only difference compared to i965 one is that intel_check_sync() was renamed
to intel_gl_check_sync() here, as it is more appropriate.
Here follows original commit message by Chad Versace:
"I'm about to implement DRI2_Fenc in intel_syncobj.c. To prevent
madness, we need to prefix functions for GL_ARB_sync with 'gl' and
functions for DRI2_Fence with 'dri'. Otherwise, the file will become
a jumble of similiarly named functions.
For example:
old-name: intel_client_wait_sync()
new-name: intel_gl_client_wait_sync()
soon-to-come: intel_dri_client_wait_sync()
I wrote this renaming commit separately from the commit that implements
DRI2_Fence because I wanted the latter diff to be reviewable."
[Emil Velikov: rename the outstanding intel_sync instances]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
... in dri2_x11_add_configs_for_visuals().
Currently the latter does not consider that, thus in such cases it adds
"empty" configs in the list.
Properly account for things and as we do that we can reuse count,
instead of calling _eglGetArraySize to determine if we've added any
configs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Analogous to previous commits - with an extra bonus.
Current code, apart from not attributing the lack of 'per visual'
and overall configs also overwrites the newly added config.
Namely if the dpy supports two or more of the supported formats
(XRGB8888, ARGB8888 and RGB565) earlier configs will be overwritten
and the the final one will be stored, since the we use the same index
for all three in our dri2_add_config call.
v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Analogous to previous commit.
v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Iterate over the driver_configs first in order to cut down the number of
getConfigAttrib() calls by a factor of 5.
While we're here, also drop the sentinel of the visuals array. We
already know its size so we can use that and save a few bytes.
v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Factor out and rework the existing code so that it prints a debug
message if we have zero configs for any visual.
As a nice side effect we now provide a correct (sequential ID) when
creating a config (via dri2_add_config).
v2: Use correct comparison in loop conditional (Eric)
Use valid C initializer (Gurchetan)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Currently we print a debug message if the total configs is non-zero only
to do the same (at an error level) as we return from the function.
Rework the message to print if we're missing a config for the given
format.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Introduce a helper and use it throughout the platform code. This allows
us to reduce the amount of ifdef(s) and (potentially) use
kms_swrast_dri.so for !drm platforms (namely wayland and x11).
Note: in the future as other platforms (android, surfaceless) support
the extension they can reuse the helper.
v2: Rebase, check for device_name.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
The core egl/dri2 already sets the extension bit _only_ when possible -
which in Android's case is always.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Will allow us to reuse the function for optional extensions and fold a
bit of code.
v2: Make dri2_bind_extensions::optional flag an argument to
dri2_bind_extensions (Kristian).
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Consistently indent with space rather than a mix of tab and
spaces.
v2: Keep the structs properly aligned (Eric).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Remove the error prone fixed size array.
While we're here also rename to loader_extensions like in the GLX code.
v2: Rebase. Keep image_loader_extension within the wayland_drm
dri2_loader_extensions list.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Analogous to earlier commits.
Note: the actual version of the extension is 1, since it does not
implement .putImage2.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Similar to the dri2 one - the extension stored in struct
dri2_egl_display is unused.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Analogous to the earlier android and wayland patches. As we're here we
can drop exposing the old version of the extension.
Any dri loader/driver interface use lower bound checking thus exposing
dri2 loader v3 to a v2 capable driver is perfectly normal.
v2: Preserve compat with dri2_minor < 1. The driver does not know if
there is a protocol to manage getBuffersWithFormat(). It's up-to the
loader to expose the vfunc if there is one. (Kristian)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Earlier commit introduced support for image_loader and left the
dri2_loader code dangling/unused. Let's remove it.
Fixes: 63c5d5c6c4 ("Added pbuffer hooks for surfaceless platform")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
The extension stored in struct dri2_egl_display isn't used, thus we can
create a static const instance of the extension and point extensions[]
to it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Fold duplicate conditional blocks and add a few extra comments ;-)
v2: Bring back the explicit "unbind" logic (Eric), remove NULL derefs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The indirection is meant to be used by the core EGL implementation in
main. Not in the drivers themselves.
Move the dri2_destroy_surface definition to avoid forward declaration of
the static function.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently all callers are careful enough not to do that, yet that will
not be the case in the future.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
All the platforms are duplicating what should be a driver/dri2 thing -
refcounting. Just fold it accordingly.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
For a while now we require shared glapi for EGL, thus we can drop a
few bits from the olden days. Namely - dlopen(NULL...) is not possible,
error out at build stage if so and drop the guard around dlclose().
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The documentation is clear - one must glFlush the old context on
eglMakeCurrent. Thus keeping it optional is not something we should be
doing. Furthermore if we cannot get the entry point we're likely having
a broken setup/stack.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Update the comment to reflect the correct filename and add a guard to
catch incorrect inclusion of the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The function gen7_format_needs_valign2 has two callers - the gen7 only
gen7_choose_valign_el() and isl_gen6_filter_tiling(). The latter of
which already guarding the invocation appropriately.
To be extra cautious add a couple of asserts alongside the removal of the
runtime check.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The rest of ISL already follows this approach. Be consistent and resolve
the final references.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Will allow us to catch issues (as fixed with previous patches) rather
than release a broken tarball.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The letter C was missing, thus in turn all the internal symbols were
exported.
As a result we hide ~150 symbols and cut ~36K from libvulkan_radeon.so.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Private/internal symbols should not be exported. Using the CXXFLAGS cuts
~300 exported symbols and ~23K from libvulkan_radeon.so.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Historically we use "device name" for the name of the kernel module and
"driver name" for the dri/other driver.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Likely unused since day 1, although I've only checked back until the
st/dri unification with commit 29ca7d2c94 ("st/dri: merge dri/drm and
dri/sw backends")
Based on the comment, referencing drmOpenByName it's not something we
want to bring back.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This commit effectively reverts c10dcb2ce8
and fixes the typedef redefinition which inspired it.
In order to prevent requiring X packages at build time earlier commit
forward declared the required X/GLX typedefs. Since that approach
introduced typedef redefinition (a C11 feature) it was reverted.
To avoid the redefinition while _not_ mandating X and related headers
forward declare the structs and use those through the header.
As anyone uses the mesa interop header they ensure that the X (or others
in terms of EGL) headers are included, which ensures that everything is
resolved within the compilation unit.
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Fixes: c10dcb2ce8 ("Revert "mesa_glinterop: remove inclusion of GLX
header"")
Fixes: 8472045b16 ("mesa_glinterop: remove inclusion of GLX header")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96770
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Constrained baseline profile is supported, so use that instead. This
matches what the encoder already does (constraint_set1_flag is always
set in the output bitstream).
Reviewed-by: Christian König <christian.koenig@amd.com>
This makes the supported format actually match the configuration, and
allows the user to observe that NV12 is supported for video processing
where previously they couldn't (though it did always work if they
blindly tried to use it anyway).
Reviewed-by: Christian König <christian.koenig@amd.com>
Both YUV420 and RGB32 configurations are supported, so we need to be
able to distinguish which is being used.
Reviewed-by: Christian König <christian.koenig@amd.com>
The encoder attributes are needed for a user of the encoder to be
able to configure it sensibly without internal knowledge.
Reviewed-by: Christian König <christian.koenig@amd.com>
Commit cf804b4455
('glx: fix crash with bad fbconfig') introduced a check
in glXCreateNewContext() if the given config is a valid
fbconfig.
Unfortunately the check always checks the given config against
the fbconfigs of the DefaultScreen(dpy), instead of the
actual X-Screen specified in the config config->screen.
This leads to failure whenever a GL context is created
on a non-DefaultScreen(dpy), e.g., on X-Screen 1 of
a multi-x-screen setup, where the default screen is
typically 0.
Fix this by using config->screen instead of DefaultScreen(dpy).
Tested to fix context creation failure on a dual-x-screen setup.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Work in progress (disabled).
USE_8x2_TILE_BACKEND define in knobs.h enables AVX512 code paths
(emulated on non-AVX512 HW).
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
src2 was being given the wrong modifier, and we were not properly
managing the modifier on the SHL source either.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Rename to gl_vertex_attrib_array::BufferBindingIndex because this field
is an index into the array of buffer binding points. This makes some
code a little easier to follow since there's also a "VertexBinding" field
in gl_vertex_array_object.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Use a bitmask to indicate which color buffers are integer-valued, rather
than a bool. Also, the old field was mis-computed. If an integer buffer
was followed by a non-integer buffer, the _IntegerColor field was wrongly
set to false.
This fixes the new piglit gl-3.1-mixed-int-float-fbo test.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
None of the drivers which implement this hook do anything with the
texture parameter value. Drivers just look at the pname and set a
dirty flag if needed.
We were doing some ugly casting and type conversion to setup the
argument so that all goes away.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, the plan was "if the width/height we have to load/store isn't
the size the user is planning on writing, then we need to load the old
contents out beforehand to prevent writing back undefined".
However, when we're doing glTexImage() we often end up aligning the
width/height into the padding of the texture, and we don't actually
need to read out that padding.
Improves x11perf -aatrapezoid100 performance from ~460/sec to
~700/sec.
Regression introduced by
ba0274c7d6
Check the resource exists before assigning it
a flag (and use This->base.resource instead
of pResource, since the former may have a newly
allocate resource, while the latter would be
NULL).
This should reintroduce the behaviour of previous
code.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Since 1604efa6fd,
lconsti and lconstb don't need to be initialized.
Remove some leftovers from the previous code (which
has now invalid use of ARRAY_SIZE on a pointer instead
of an array).
Reported by Coverity.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Use uint64_t instead of int64_t in the calculation,
as the result is uint64_t.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
All ARB_enhanced_layouts piglit tests pass without any changes
in our compiler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
These were changed in LLVM r284024.
v2:
- Only use float types for vdata of llvm.amdgcn.image.store. LLVM doesn't
support integer types for this intrinsic.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The identity swizzle should operate exactly
like an .r = R, .g = G, .b = B, .a = A swizzle.
This fixes a bunch of the 16-bit BGRA blit tests
dEQP-VK.api.copy_and_blit.blit_image.all_formats.b4g4r4a4*
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fix was found in the radv codebase when running dota2,
no idea if anyone has reported it on anv, but the same problem
occurs.
Once an image is acquired we need to mark it busy.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
dota2 does multiple acquires followed by multiple queues,
this bug manifested itself as a hang in the xshmfence code
randomly when dota2 was doing it's menus. It also occured
when running dota2 under phoronix-test-suite.
The fix is once the image is acquired to mark it busy then
so nobody else can acquire. We have to trust vulkan apps
that they will eventually submit it.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The table was copied from the Vulkan driver. The comment lines are as long
as the table for cosmetic reasons.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
so that decompress blits aren't needed and depth texturing needs less
memory bandwidth.
Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible
HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16.
The format promotion is not visible to state trackers.
This is part of TC-compatible renderbuffer compression, which has 3 parts:
DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
I don't see a measurable increase in performance though.
(I tested Talos Principle and DiRT: Showdown, the latter is improved by
0.5%, which is almost noise, and it originally used layered Z16,
so at least we know that Z16 promoted to Z32F isn't slower now)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For performance tuning in drivers. It filters out window system
framebuffers and OpenGL renderbuffers.
radeonsi will use this to guess whether a depth buffer will be read
by a shader. There is no guarantee about what will actually happen.
This is a departure from PIPE_BIND flags which are defined to be strict
but they are useless in practice.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Whether one or two slots are taken up by one API array depends on the
vertex shader, not on how the array is configured. When an array is
set up with fewer components than the shader expects, the high components
are undefined.
Fixes GL45-CTS.vertex_attrib_binding.basic-inputL-case1.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes a bug with offsets from uniforms which seems to have only been
noticed as a crash in piglit's
arb_gpu_shader5/compiler/builtin-functions/fs-gatherOffset-uniform-offset.frag
on radeonsi.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If pPhysicalDevices is too small for all physical devices,
the driver must return VK_INCOMPLETE. Since only a single
physical device is supported, this is only the case when
pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gallium is completely oblivious to whether the fbo is flipped or not.
Only flip the stipple pattern when the fbo is flipped as well. Otherwise
the driver has no idea when to unflip the pattern.
Fixes bin/gl-2.1-polygon-stipple-fs -fbo
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Recent fix for non-const offsets broke the case of a single offset (vs 4
offsets). The later code relies on the offs array to contain null values
to tell whether they should be added onto the srcs list.
Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
In order to be able to emit overlapping input and output array
declarations, we flip the logic of emitting those declarations on its
head: rather than iterating over slots and emitting the corresponding
declarations, we iterate over the declarations from GLSL and emit those.
v2: fix some regressions related to structs
v3: fix a regression in geometry and tessellation shader array handling
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v2)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v2)
In some cases, a shader may have an input/output array but not use some
entries in the middle. This happens with eON games, for example.
We emit declarations that cover the entire array range even if there are
some unused gaps. This patch now reflects that in the InputsRead etc.
fields to ensure the various input/outputMapping arrays are actually
correct, which will be important when we re-jiggle the way declarations
are emitted.
v2: fix a typo (Edward O'Callaghan)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This optimization is incorrect with 64-bit operations, because the
channel-splitting logic in emit_asm ends up being applied twice to
the source operands.
A lucky coincidence of how the writemask test works resulted in this
optimization basically never being applied anyway. As far as I can tell,
the only case where it would (incorrectly) have been applied is something
like
dvec2 d;
float x = (float)d.y;
which nobody seems to have ever done. But the moral equivalent does occur
in one of the component layout piglit test.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Empty writemasks mean "copy everything", so we can always just use the number
of vector elements (which uses the GLSL meaning here, i.e. each double is a
single element/writemask bit).
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For specifying an exact location/component.
v2: change the order of parameters (Dave)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net> (v1)
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
This is a screen cap because drivers are expected to support it either
for all shader types or for none of them.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
radeonsi no longer supports pixel shaders without interpolation optimizations,
which led to assertion failures in si_shader_ps when running shader-db.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs :335256 -> 335273 (0.01%)
total local used in shared programs :31968 -> 31968 (0.00%)
local gpr inst bytes
helped 0 41 852 852
hurt 0 44 23 23
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We shouldn't be using wildcard here in the first place, but changing that
is some effort. As it stands, make -p confirms that glapi_gen_mapi_deps only
contains mapi_abi.py when building outside the Mesa tree.
As a result, only some of the tables were updated when XML files change, but
not the tables for shared glapi. This change ensures that we pick up the
XML files and scripts from the source tree as dependencies also for shared
glapi.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This should make the code more robust if a shader tries to use inputs which
aren't defined by the vertex element layout (which usually shouldn't happen).
No piglit change.
Reviewed-by: Brian Paul <brianp@vmware.com>
If pPhysicalDevices is too small for all physical devices,
the driver must return VK_INCOMPLETE.
Since only a single physical device is supported, this is only the case
when pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Many of these can be "NULL if the pipeline has rasterization disabled."
Also, we should assert that pMultisampleState exists.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's supposed to be how much at least we want to grow the cs, not the
minimum size of the cs after growth.
v2: Unbreak use_ib_bos.
Don't mask the ib_size when !use_ib_bos, since it's not needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This checks the kernel driver name is amdgpu before calling
libdrm_amdgpu.
This avoids the following error:
amdgpu_device_initialize: DRM version is 1.6.0 but this driver is only compatible with 3.x.x
when run on a machine with i915 graphics as well as amdgpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Probably unlikely however ensure we don't leak a heap allocation
on the fail path.
V.2:
also fix missing 'amdgpu_device_deinitialize()' calls (Emil Velikov).
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drop/add a few newlines where appropriate and drop a couple of
unnessary braces.
Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
SwrStoreTiles now takes a mask of surfaces to store. Reduces
overhead when storing multiple render targets.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Align and use streaming store instructions for BE fifo queues.
Provides slightly faster enqueue and doesn't pollute the caches.
Add appropriate memory fences to ensure streaming writes are
globally visible.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
EGL_KHR_swap_buffers_with_damage is actually already supported, as it is
technically nothing but a rename of EGL_EXT_swap_buffers_with_damage.
To that effect, both extension are advertised depending on the same
condition, and the new entrypoint simply redirects to the previous one.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
New generated headers were introduced by commit 63a366a
"intel: aubinator: generate a standalone binary"
Android does not need aubinator yet, so in order to avoid building error,
aubinator required new genxml headers are defined in a separate list.
If required, building rules for Android will be added later.
[Emil Velikov: don't use a _HEADERS variable name (causes warnings)]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The static library is analogous to the intel ISL, which is required for
both hardware and (to be added) testing library.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Patch changes function to use _mesa_lookup_shader_program_err both
in TransformFeedbackVaryings and GetTransformFeedbackVarying that
handles errors correctly for invalid values of shader program.
Fixes following dEQP test:
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.transform_feedback_varyings
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98135
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Add the miptree level/slice x/y_offset when count the surface offset
in brw_emit_surface_state. The surface offset has two parts, one is
from mt->offset, which should be 32 aligned in width/height for tiled
buffer; another is from mt->level[current_level].slice[current_slice].
x/y_offset.
This fix will solve 12 deqp failure
dEQP-EGL.functional.image.create.gles2_cubemap_negative_*_texture
Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On Intel Pineview M hardware, the i915 gallium driver doesn't output
the correct gl_FragCoord. It seems to always have an X coord of 0.0
and a Y coord of the window's height in pixels, e.g. 600.0f or such.
I believe this is a regression caused in part by this commit:
afa035031f
The old behavior used the output at index zero, while the new behavior
uses actual zeroes. In the case of gl_FragCoord the output at index
zero happened to be the correct one, so the behavior appeared correct
although the code already had a bug.
Fixed by checking for I915_SEMANTIC_POS when setting up texCoords. If
the generic_mapping is I915_SEMANTIC_POS, look for the
TGSI_SEMANTIC_POSITION instead of a TGSI_SEMANTIC_GENERIC output.
https://bugs.freedesktop.org/show_bug.cgi?id=97477
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
This reverts commit 8472045b16.
Conflicts:
include/GL/mesa_glinterop.h
This patch fixes this build error with GCC 4.4.
Compiling src/glx/dri_common_interop.c ...
In file included from src/glx/dri_common_interop.c:33:
include/GL/mesa_glinterop.h:62: error: redefinition of typedef ‘GLXContext’
include/GL/glx.h:165: note: previous declaration of ‘GLXContext’ was here
Fixes: 8472045b16 ("mesa_glinterop: remove inclusion of GLX header")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96770
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Return error instead of crashing on source surfaces
with format D3DFMT_NULL.
Fix for issue #236.
Tested on Windows 7.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Wine tests show that cubetextures always use
PIPE_TEX_WRAP_CLAMP_TO_EDGE regardless of set
sampler states.
Fixes failing d3d9 wine test test_cube_wrap.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Add an assert to make sure buffer creation doesn't fail.
Add error handling in calling functions.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
The removed code was there for two reasons:
1) Allow DF16, DF24, INTZ to be used as depth buffer
for swapchain, if the driver doesn't support
PIPE_BIND_SAMPLER_VIEW for the underlying format
2) Set PIPE_BIND_SAMPLER_VIEW if possible, such that
if StretchRect is called on the depth texture, it is happy.
1) The reason these formats needed a workaround is because
the check flags for them in CheckDeviceFormat were incorrect,
which led applications to think the formats were valid for
swapchains, even if they weren't supported.
2) StretchRect limitations for depth buffers force
the resource_copy_region path, which should be fine without
PIPE_BIND_SAMPLER_VIEW.
Thus fix the check for 1), and remove the code.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Advertise quality levels:
Each supported multisample count matches to one quality level.
The application doesn't know how much samples each quality level has.
For that reason it's not possible to set the multisample mask.
Return errors on quality level missmatch.
Fixes several old games not having multisample support until now.
Fix for issue #73.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Compare resource's nr_samples instead of D3D multisample level.
Required for multisample quality levels to work correct.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Use strict aliasing in SetPrivateData and struct pheader.
Casting char[1] to IUnknown** isn't allowed in strict aliasing.
Compute pointer to body by adding size of header to header pointer.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Remove {Set/Get/Free}PrivateData in resource9.
Functionality has been implement in IUnknown interface.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Remove {Set/Get/Free}PrivateData in volume9.
Functionality has been implement in IUnknown interface.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Switch {Set/Get/Free}PrivateData function to introduced IUnknown functions.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Implement {Set/Get/Free}PrivateData in iunknown to get rid
of duplicated code in resource9 and volume9.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
According to MSDN the device is returned for surfaces that do
not have a regular container.
Such surfaces are:
OffscreenPlainSurface, DepthStencilSurface and RenderTarget
Tested and verified on Windows.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Allocate resources in surface ctor.
Allows to use statetracker internal memory accounting.
Fix for issue #231.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
D3DFMT_NULL is mapped to PIPE_FORMAT_NONE.
Instead of relying on PIPE_FORMAT_NONE to
return a size, pick one.
The one picked is the same than Wine.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Add DBG calls to NineTexture9_GetLevelDesc and
NineTexture9_GetSurfaceLevel to ease debugging.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Tests showed that is allowed to call this method on
object that have a zero refcount.
Required for issue #230.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Same change than for vs ff.
This makes it easier to not introduce mistakes
reusing temporaries whose result shouldn't be
erased.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Error found with wine tests.
nine_shader was expecting another order
than the one device9 was using.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Thanks to wine tests.
Apparently 4x4 inverse is to be used, and
if the inverse can't be calculated, the
input matrix is to be used.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Wine tests for the passthrough feature are for positiont.
Nothing seems to indicate passthrough happens when positiont
it not used. However having passthrough with positiont makes
sense (to be used with ProcessVertices outputs).
Signed-off-by: Axel Davy <axel.davy@ens.fr>
We were (wrongly) adding specular to diffuse
in vertex shaders when SPECULARENABLE was set.
However the spec says specular has to be added
after texture processing (which is in ps).
Besides SPECULARENABLE is flagged as a pixel state.
There was unused support for SPECULARENABLE
in the ps ff code.
Remove the vs code, and use the ps code.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
There are several holes. This patch reduces
the holes a bit, which reduces the size of
the constant buffer uploaded.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This has been a real mess up to now: the temporaries
were allocated once, and shared after that between
the different parts of the code.
To help maintaining the code, the temporaries are now
allocated and released on need.
As surprising as it could be, this patch, which was
supposed to introduce no behaviour change, actually
solved a visual bug observed on a sample program.
This was due to ureg_normalize3 polluting a temporary
variable.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When this state is set, the normals computed
in the vs ff shader should be normalized.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Software Vertex Processing allows:
. Less limitations for shaders (more loops, etc)
. Less limitations for ff (more enabled lights, 255
matrices for VertexBlend)
In particular shaders can get more constants.
This patch implements support for this (not using software
rendering, but hardware rendering, as llvmpipe and dx10+ hw
have the same limits...)
This is considered a second class path. Even apps asking for
"Mixed Vertex processing" (ie the ability to switch to swvp
on demand) do not use the feature much. Some just initialize
more constants than the normal limit at the start of the
application, but never use more than the normal limit.
When the apps do not need the software vertex processing
features, they do not seem to turn it on. This means it is
ok if that path is slow.
Thus no care has been made to make the path optimized.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This path has been disabled for some time because
of some bugs with it. It hasn't been updated to the
new features, and is not faster.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
In mixed vertex processing, the user can enable or disable
software vertex processing. It is on hardware by default.
This feature is not a state, and thus the setting doesn't
need to be recorded by stateblocks.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Buffers with this flag must be usable with both software
and hardware vertex processing. Use Staging for fast cpu access.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
ATIx are "unknown" formats that do not follow block format conventions.
Tests showed that pitch*height bytes are allocated.
apitrace used to depend on this behaviour.
It used to copy more bytes than it has to for the ATI1 block format,
but it didn't crash on Windows.
Increase buffersize for ATI1 to fix this crash.
The same issue was present in WINE but a patch has been sent by me.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
To implement the feature we copy the ps inputs to a temp array.
This is not optimal for performance, but it is the simplest solution.
This is a feature that is very very rarely used.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
On 32 bits system, application memory is quite limited.
softpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.
Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On 32 bits system, application memory is quite limited.
llvmpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.
Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fixes regression introduced by
ecd6fce261
and is more future proof than just clearing the next
field.
Other nine usages did already zero out the templates.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
When offset != 0, the valid range was wrong because the second
argument of util_range_add() is end, not size.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Normally the value is an immediate, which is moved to some temporary, so
there's no problem. In the case of a non-constant offset (as allowed by
ARB_gpu_shader5), we have to take care to copy it first before using it
to build up the bits.
This fixes a compilation error observed in F1 2015.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
CC glsl/tests/cache_test.o
glsl/tests/cache_test.c: In function ‘test_cache_create’:
glsl/tests/cache_test.c:160:4: error: implicit declaration of function ‘cache_destroy’ [-Werror=implicit-function-declaration]
cache_destroy(cache);
^
Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an on-disk cache")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Pre-patch, there were two code paths for parsing EGLSync attribute
lists: one path for old-style EGLint lists, used by eglCreateSyncKHR,
and another for new-style EGLAttrib lists, used by eglCreateSync (1.5)
and eglCreateSync64 (EGL_KHR_cl_event2).
There were two attrib_list parsing functions,
_eglParseSyncAttribList(_EGLSync *sync, const EGLint *attrib_list)
_eglParseSyncAttribList64(_EGLSync *sync, const EGLattrib *attrib_list)
This patch unifies the two attrib_list parsing functions into one,
_eglParseSyncAttribList(_EGLSync *sync, const EGLattrib *attrib_list)
Many internal EGLSync function signatures had *two* attrib_list
parameters to accomodate both code paths: one parameter was an EGLint
list and other an EGLAttrib list. At most one of the parameters was
allowed to be non-null. This patch removes the `EGLint *attrib_list`
parameter, leaving only the `EGLAttrib *attrib_list` parameter, for all
internal EGLSync functions.
v2:
- Consistently use condition (sizeof(int_list[0]) ==
sizeof(attrib_list[0])). [for emil]
v3:
- Don't double-unlock the display in eglCreateSyncKHR.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
A function with multiple returns would have had multiple preret settings
at the top of the function. While this is unlikely to have caused issues
since we don't use functions in earnest, it could have in some cases
overflowed the call stack, in case a function had a lot of early
returns.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
According to the Vulkan spec 5.63.4 :
samplerAnisotropy indicates whether anisotropic filtering is supported. If
this feature is not enabled, the maxAnisotropy member of the
VkSamplerCreateInfo structure must be 1.0.
Since we already set maxAnisotropy to 16 and program the hardware according
to the VkSamplerCreateInfo.maxAnisotropy, it seems we can turn this on.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Changes make copy_propagation_elements pass faster, reducing link
time spent in test case of bug 94477. Does not fix the actual issue
but brings down the total time. No regressions seen in CI.
v2 (idr): Formatting / whitespace fixes. Embed the acp_ref in the
acp_entry.
v3 (idr): Delete unused copy constructor. Use while(pop_head) instead
of foreach() { remove }.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In conjuction with an intel_aubdump change, you can now look at your
application's output like this :
$ intel_aubdump -c '/path/to/aubinator --gen=hsw' my_gl_app
v2: Add print_help() comment about standard input handling (Eero)
Remove shrinked gtt space debug workaround (Eero)
v3: Use realloc rather than memcpy/free (Ben)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Embed the xml files into the binary, so aubinator can be used from any
location.
v2: Split generation packing into another patch (Jason)
Check for xxd (Jason)
v3: Fix out of tree builds (Jason)
Generate custom variable name rather than names generated by xxd
(Lionel)
v4: Move generated _xml.h files to genxml/ (Sirisha)
v5: Remove newline from makefile (Jason)
v6: Add comment on gen*_xml.h creation (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery:
(rebase)
- Resolve conflicts with new anv_batch_emit macro
(amend)
- Handle a QPitch TODO
- Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
- Only use HiZ for single-subpass renderpasses
- Emit the HiZ instruction before the stencil instruction to follow the
optimized clear sequence specified in the PRMs
- Don't modify clear params
- Enable resolves when a HiZ buffer is used to ensure depth buffer validity
Provides an FPS increase of ~15% on the Sascha triangle and multisampling
demos.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Create a function that performs one of three HiZ operations -
depth/stencil clears, HiZ resolve, and depth resolves.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Nanley Chery:
(rebase)
- Use isl_surf_get_hiz_surf()
(amend)
- Only add a HiZ surface onto a depth/stencil attachment
- Add comment above HiZ surface addition
- Hide HiZ behind INTEL_VK_HIZ prior to BDW
- Disable HiZ for untested cases
- Remove DISABLE_AUX_BIT instead of preventing it from being added
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place. But if we did, it
wouldn't do the right thing so just bail.
Signed-off-by: Rob Clark <robdclark@gmail.com>
According to the spec - 9.6. Pipeline Cache :
If pDataSize is less than the maximum size that can be retrieved by the
pipeline cache, at most pDataSize bytes will be written to pData, and
vkGetPipelineCacheData will return VK_INCOMPLETE.
Fixes the following test from Vulkan CTS :
dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_fragment_stage
dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_geometry_stage_fragment_stage
dEQP-VK.pipeline.cache.misc_tests.invalid_size_test
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since the gen_device_info structs are no longer just constant memory, a
pointer to one is not a pointer to something in the .data section so we
shouldn't be storing it in a static variable. Instead, we should just
store the entire device_info structure.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
We have to be careful to not smash the value they're clearing to, but
other than that we're fine. Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).
Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)
We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch. It also failed to count the statechanges from the GFXH-515
workaround.
This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more. Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.
Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)
This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should have always
trivially colored these nodes, before. This commit is just to make QIR
code failing more intelligible when register allocation fails.
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexcuted channels, we don't care what's in there).
Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
We would assertion fail in setting up the simulator the second time
around. This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.
Initially, we had intended set_subpass to be an interesting function that
did whatever (presumably a lot) setup we needed for a subpass. In reality,
it just sets a pointer and a dirty bit and then emits depth and stencil
state. When we call BeginCommandBuffer on a secondary, there's no point in
setting depth and stencil state since it will already be set by the
primary. Instead, the only thing we need to do at the start of a secondary
is set the subpass pointer and the dirty bit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We have a DIRTY_RENDER_TARGETS flag and that makes a lot more sense than
just dirtying fragment descriptors. We're checking for it in some of the
gen7 code but unfortunately, nothing was setting it and it didn't do what
it was supposed to do in cmd_buffer_flush_state.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Because WSI images are created with VkImageCreateInfo::flags explicitly set
to 0, they don't ever have the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set.
This means that you can't create an image view of it with a different
format so applications can't render directly in sRGB (without automatic
encoding) unless we actually advertise UNORM formats. There are a lot of
applications that want to do their own sRGB conversion, so we should allow
for that. We do, however, make UNORM come after sRGB in the list so that
the default for dumb apps that just grab the first thing is to render in
linear and let the sRGB conversion happen automatically.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
If the user created a fence with VK_FENCE_CREATE_SIGNALED_BIT set, we
shouldn't fail to wait for a fence if it was not submitted since that is
not necessary.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This squashes all the radv development up until now into
one for merging.
History can be found:
https://github.com/airlied/mesa/tree/semi-interesting
This requires llvm 3.9 and is in no way considered
a conformant vulkan implementation. It can run a number
of vulkan applications, and supports all GPUs using
the amdgpu kernel driver.
Thanks to Intel for providing anv and spirv->nir,
and Emil Velikov for reviewing build integration.
Parts of this are:
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Authors: Bas Nieuwenhuizen and Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com>
Checking if MAD is supported is definitely wrong, and it's
more likely a typo I introduced few days ago which breaks
NV50 because SHLADD is not supported there.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
When the chipset is forced with NV50_PROG_CHIPSET, we actually
only want to output the binary if NV50_PROG_DEBUG is also
enabled. Otherwise, this pollutes the shader-db output.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Only expose 512 threads/block on Fermi to not be limited by
32 GPRs/thread.
v4: - use 512 threads on Fermi, 1024 on Kepler+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.
This allows to use 64 GPRs/thread.
v4: - use 512 threads on Fermi, 1024 on Kepler+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This extension is only exposed if the underlying driver supports
ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK
is set.
v3: - initialize max_variable_threads_per_block to 0
v2: - expose the ext based on that new cap
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
gl_LocalGroupSizeARB can be translated into TGSI_SEMANTIC_BLOCK_SIZE
which represents the block size in threads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Compute shaders can now include a fixed local size as defined by
ARB_compute_shader or a variable size as defined by
ARB_compute_variable_group_size.
v2: - update formatting spec quotations (Ian)
- various cosmetic changes (Ian)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_compute_variable_group_size specification explains that
when a compute shader includes both a fixed and a variable local
size, a compile-time error occurs.
v2: - update formatting spec quotations (Ian)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is the new layout qualifier introduced by
ARB_compute_variable_group_size which allows to use a variable work
group size.
v4: - add missing '%s' in the monster format string
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v5: - replace fixed_local_size by !LocalSizeVariable (Nicolai)
v4: - slightly indent spec quotes (Nicolai)
- drop useless _mesa_has_compute_shaders() check (Nicolai)
- move the fixed local size outside of the loop (Nicolai)
- add missing check for invalid use of work group count
v2: - update formatting spec quotations (Ian)
- move the total_invocations check outside of the loop (Ian)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
helped some ue4 demos and divinity OS shaders
total instructions in shared programs : 2818674 -> 2818606 (-0.00%)
total gprs used in shared programs : 379273 -> 379273 (0.00%)
total local used in shared programs : 9505 -> 9505 (0.00%)
total bytes used in shared programs : 25837792 -> 25837192 (-0.00%)
local gpr inst bytes
helped 0 0 33 33
hurt 0 0 0 0
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Previously, the sampler view code was scattered across several different
files.
Note, the previous REALLOC(), FREE() for st_texture_object::sampler_views
are replaced by realloc(), free() to avoid conflicting macros in Mesa vs.
Gallium.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Before, st_get_texture_sampler_view_from_stobj() did a lot of work to
check if the texture parameters matched the sampler view (format,
swizzle, min/max lod, first/last layer, etc). We did this every time
we validated the texture state.
Now, we use a ctx->Driver.TexParameter() callback and a couple other
checks to proactively release texture views when we know that
view-related parameters have changed. Then, the validation step is
simplified:
- Search the texture's list of sampler views (just match the context).
- If found, we're done.
- Else, create a new sampler view.
There will never be old, out-of-date sampler views attached to texture
objects that we have to test.
Most apps create textures and set the texture parameters once. This
make sampler view validation much cheaper for that case.
Note that the old texture/sampler comparison code has been converted
into a set of assertions to verify that the sampler view is in fact
consistent with the texture parameters. This should help to spot any
potential regressions.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To inform drivers of texture buffer offset/size changes, as we do for
other texture object parameters.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Before, we had code to compute the sampler view's format spread across two
different functions: in update_single_texture() and
st_get_texture_sampler_view_from_stobj(). Now it's all in one new function.
Also, use _mesa_texture_base_format() to simplify the code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that the NIR casting functions have type assertions, we have a bunch of
assertions that aren't needed anymore.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
One of NIR's invariants is that control flow lists always start and end
with blocks. There's no good reason why we should return a cf_node from
these functions since we know that it's always a block. Making it a block
lets us remove a bunch of code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This makes calling nir_foo_as_bar a bit safer because we're no longer 100%
trusting in the caller to ensure that it's safe. The caller still needs to
do the right thing but this ensures that we catch invalid casts with an
assert rather than by reading garbage data. The one downside is that we do
use the casts a bit in nir_validate and it's not a validate_assert.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Otherwise it won't be picked in the tarball and the build will fail.
Fixes: 91ec6e5664 ("radeonsi/compute: Use the HSA abi for non-TGSI
compute shaders v3")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise it won't be picked in the tarball and the build will fail.
Fixes: 0035f7f136 ("svga: add guest statistic gathering interface")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The function deals with fb (style) configs, thus using "visual"
in the name is misleading. Which in itself had led to the use of
fbconfig_style_tags argument.
Rename the function to reflect what it does and drop the unneeded
argument.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Return GL_FALSE if we fail to find any fb/visual configs, otherwise we
end up with all sorts of chaos further down the GLX stack.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment/spec says - only for pbuffer drawables, while the code
clears the window/pixmap bit. Practise what you preach and apply the
trivial tweak. In practise this should not cause functional change.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes a problem where GL headers would only get installed if
glx was enabled. So if osmesa was enabled but not glx, then the
GL headers required by osmesa would be missing from the install.
v2: Dropped unneeded mesa_glinterop.h redundant osmesa.h install
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Similar to earlier commit - symbol was never part of the public API so
we're safe to remove it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The symbol is a no-op since, the EXTRA_DEBUG macro is not set in the
build. Unused by !Haiku people/platforms since 2010 (commit
a73c6540d9) while the Haiku C++ wrapper
has no obvious users.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Similar to the other 'tests', enable assertions in xvmc_bench.
This silences the GCC warnings about unused-variable(s), makes the
program actually useful, as the XvMC API called. Atm the function
calls are omitted, since they're called within the assert.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The header was renamed with earlier commit, so update the
Makefile.sources respectively.
{vulkan/genX_multisample.h => common/gen_sample_positions.h}
Fixes: c779ad3e661("intel: Move Vulkan sample positions to common code")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
glsl_print_type() prints arrays of arrays incorrectly. For example,
a type with name float[3][7] would be printed as float[7][3]. (This
is an array of length 3 containing arrays of 7 floats.) cdecl says
that the type name is correct.
glsl_print_type() doesn't really do anything above and beyond printing
type->name, and glsl_print_struct() wasn't used at all. So, drop them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
According to chapters 16.5. (Timestamp Queries) and 30.2 (Limits) of the
Vulkan Specification 1.0.29, the .limits.timestampPeriod field returned
by vkGetPhysicalDeviceProperties is measured in nanoseconds, not in
seconds.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step towards dropping the GLSL IR version of
do_set_program_inouts() in i965 and moving towards native nir support.
This is important because we want to eventually convert to nir and
use its optimisations passes before we can call this GLSL IR pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just say no to:
- brw->cs.base.prog_data = &brw->cs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_cs_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
Just say no to:
- brw->wm.base.prog_data = &brw->wm.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_wm_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
Just say no to:
- brw->gs.base.prog_data = &brw->gs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_gs_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
Just say no to:
- brw->tes.base.prog_data = &brw->tes.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tes_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
Just say no to:
- brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_tcs_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
Just say no to:
- brw->vs.base.prog_data = &brw->vs.prog_data->base.base;
We'll just use the brw_stage_prog_data pointer in brw_stage_state
and downcast it to brw_vs_prog_data as needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arcero@collabora.com>
What is the difference between a 'driver_fence' and a 'fence'? Do the
characters 'driver_' add anything helpful? Nope. They do, though, add an
extra 7 chars and pull your eyeballs away to ask "huh? what's that?" one
microsecond too many.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is yet another patch for the great renaming begun long ago.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We locked an unitialized mutex in the callstack
glClientWaitSync
intel_gl_client_wait_sync
brw_fence_client_wait_sync
because we forgot to initialize it in intel_gl_fence_sync.
(The EGLSync codepath didn't have this bug. It initialized the mutex in
intel_dri_create_sync).
We also forgot to tear down (mtx_destroy) the mutex when destroying
the sync object.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, program binaries are only dumped at upload time, but
when the chipset has been forced via NV50_PROG_CHIPSET we might
want to show the generated code, especially with shaderdb.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
which is returned for GL_MAX_TEXTURE_BUFFER_SIZE.
It doesn't have any other use at the moment.
Bigger allocations are not rejected.
This fixes GL45-CTS.texture_buffer.texture_buffer_max_size on Bonaire.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
(v1 pushed, then reverted)
This fixes 9 randomly failing tests on radeonsi:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*
v2: use input_interpolate[input] (correct) instead of
input_interpolate[index] (incorrect)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Use the vertex positions described in the PRMs. This has no effect on
rendering but quiets the simulator warnings seen when the vertices
appear out of order.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
It is now the only caller so there's no sense in keeping things split out.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Wire up the debug entrypoints to EGL dispatch, and add the extension
string to the client extension list.
v2:
- Lots of style fixes
- Fix missing EGLAPIENTRYs
- Factor out valid attribute check
- Lock display in eglLabelObjectKHR as needed, and use RETURN_EGL_*
- Move "EGL_KHR_debug" into asciibetical order in client extension
string
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.veliko@collabora.com>
This decorates every EGL entrypoint with _EGL_FUNC_START, which records
the function name and primary dispatch object label in the current
thread state. It also adds debug report functions and calls them when
appropriate.
This would be useful enough for debugging on its own, if the user set a
breakpoint when the report function was called. We will also need this
state tracked in order to expose EGL_KHR_debug.
v2:
- Clear the object label in more cases in _eglSetFuncName
- Pass draw surface (if any) to _EGL_FUNC_START in eglSwapInterval
v3:
- Set dummy thread's CurrentAPI to EGL_OPENGL_ES_API not zero
- Less ?: in _eglSetFuncName
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.veliko@collabora.com>
There should be no functional change here because Broxton and CHV are
both gt1. Without this code however, it might seem like broxton support
is missing.
While here, put the gt1 check in front to hopefully short-circuit the
condition for the mobile cases.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The difference to the virtually identical ARB_robustness (which is already
enabled unconditionally) is miniscule and handled elsewhere, but this cap
seems like the right thing to require for this extension.
v2: drop the device reset cap requirement (Ilia)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Check for device reset on flush. It would be nicer if the kernel just
reported this as an error on the submit ioctl (and similarly for fences),
but this will do for now.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is basically a re-write of the slab allocator into a design where
multiple child pools are linked to a parent pool. The intention is that
every (GL, pipe) context has its own child pool, while the corresponding
parent pool is held by the winsys or screen, or possibly the GL share group.
The fast path is still used when objects are freed by the same child pool
that allocated them. However, it is now also possible to free an object in a
different pool, as long as they belong to the same parent. Objects also
survive the destruction of the (child) pool from which they were allocated.
The slow path will return freed objects to the child pool from which they
were originally allocated. If that child pool was destroyed, the corresponding
page is considered an orphan and will be freed once all objects in it have
been freed.
This allocation pattern is required for pipe_transfers that correspond to
(GL) buffer object mappings when the mapping is created in one context
which is later destroyed while other contexts of the same share group live
on -- see the bug report referenced below.
Note that individual drivers do need to migrate to the new interface in
order to benefit and fix the bug.
v2: use singly-linked lists everywhere
v3: use p_atomic_set for page->u.num_remaining
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97894
This is motivated by the fact that p_atomic_read and p_atomic_set may
somewhat surprisingly not do the right thing in the old version: while
stores and loads are de facto atomic at least on x86, the compiler may
apply re-ordering and speculation quite liberally. Basically, the old
version uses the "relaxed" memory ordering.
The new ordering always uses acquire/release ordering. This is the
strongest possible memory ordering that doesn't require additional
fence instructions on x86. (And the only stronger ordering is
"sequentially consistent", which is usually more than you need anyway.)
I would feel more comfortable if p_atomic_set/read in the old
implementation were at least using volatile loads and stores, but I
don't see a way to get there without typeof (which we cannot use here
since the code is compiled with -std=c99).
Eventually, we should really just move to something that is based on
the atomics in C11 / C++11.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Anv programs the hardware to use L3 data cache if we use either SSBOs or
images in the shaders, we can program i965 the same way.
gl_shader_program has a bit of a confusing named field with
'NumAtomicBuffers'. It doesn't tell how many buffers are accessed by the
shader in an atomic way but instead the number of atomic counters
manipulated by the shader.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
ES3-CTS.functional.negative_api.buffer.framebuffer_texture2d expects
glFramebufferTexture[123]D to raise GL_INVALID_ENUM when
supplied a completely bogus textarget parameter (i.e. 0xffffffff).
This is at odds with the spec. GLES 3.1 says:
"An INVALID_OPERATION error is generated if texture is not zero and
textarget is not one of TEXTURE_2D, TEXTURE_2D_MULTISAMPLE, or one
of the cube map face targets from table 8.21."
(and GLES 3.0 and GL 4.5 both have similar text). However, GL has a
general guideline that says:
"If a command that requires an enumerated value is passed a symbolic
constant that is not one of those specified as allowable for that
command, an INVALID_ENUM error is generated."
Apparently other vendors reconcile these two rules as follows: GL should
raise INVALID_OPERATION for actual texture target enumeration values
which are not allowed for this particular glFramebufferTexture*D call.
Any value that is not a texture target should result in GL_INVALID_ENUM.
For example, glFramebufferTexture2D with GL_TEXTURE_1D would result in
INVALID_OPERATION because it is a real texture target, but not allowed
for the 2D version of the function. But calling it with GL_FRONT would
result in INVALID_ENUM, as that isn't even a texture target.
Fixes:
- {ES3-CTS,dEQP-GLES3}.functional.negative_api.buffer.framebuffer_texture2d
- {ES31-CTS,ES32-CTS,dEQP-GLES31}.functional.debug.negative_coverage.get_error.buffer.framebuffer_texture2d
References: https://gitlab.khronos.org/opengl/cts/merge_requests/387
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Having one top-level switch statement covering all known texture targets
will make the next change easier to implement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
From the less man page:
"Warning: when the -r option is used, less cannot keep track of the
actual appearance of the screen (since this depends on how the
screen responds to each type of control character). Thus, various
display problems may result, such as long lines being split in the
wrong place."
Lines which are too long to fit in the terminal would be word wrapped,
but unfortunately less would get confused about which line it was on,
and text would be drawn on top of other text. The most noticable case
was shader assembly, which is frequently too wide for an 80 character
terminal, and thus would be drawn on top of the following state packets,
making them completely unreadable.
Using -R instead of -r fixes this problem by only allowing color escape
sequences. (Notably, Git's implicit pager invocation uses -R.)
Unfortunately, it means our "clear to the end of the line" hack for
extending the blue bar headers won't work anymore.
Word wrapping usually isn't terribly readable, anyway, so we also add
the -S option (chop long lines) to restrict it to the terminal width.
(You can hit the left and right arrow keys to scroll sideways.)
Then, for a new blue bar hack, we can use a printf specifier to pad
the command packet names to be 80 characters long (arbitrarily), which
extends them "far enough" to look good, and doesn't require us to use
ioctls to determine the terminal width.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sirisha Gandikota <sirisha.gandikota@intel.com>
The atom that uploads push constants listens to _NEW_TRANSFORM for
legacy clip plane handling. On Sandybridge, the gen6_vs_state atom
emits 3DSTATE_CONSTANT_VS as well as 3DSTATE_VS, so it needs to listen
to the same set of conditions.
However, it looks like Gen7 doesn't need this. The push constant atom
emits 3DSTATE_CONSTANT_VS directly, and the gen7_vs_state atom that
emits 3DSTATE_VS doesn't have a dependency on ctx->Transform.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Delete some stray debug code notice by Iago.
v3: Massive rebase on new ir_function_signature::intrinsic_id mechanism.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Otherwise grepping for where atomic_counter_inc and friends are defined
is a very frustrating experience.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just generate an __intrinsic_atomic_add with a negated parameter.
Some background on the non-obvious reasons for the the big change to
builtin_builder::call()... this is cribbed from some discussion with
Ilia on mesa-dev.
Why change builtin_builder::call() to allow taking dereferences and
create them here rather than just feeding in the ir_variables directly?
The problem is the neg_data ir_variable node would have to be in two
lists at the same time: the instruction stream and parameters. The
ir_variable node is automatically added to the instruction stream by the
call to make_temp. Restructuring the code so that the ir_variables
could be in parameters then move them to the instruction stream would
have been pretty terrible.
ir_call in the instruction stream has an exec_list that contains
ir_dereference_variable nodes.
The builtin_builder::call method previously took an exec_list of
ir_variables and created a list of ir_dereference_variable. All of the
original users of that method wanted to make a function call using
exactly the set of parameters passed to the built-in function (i.e.,
call __intrinsic_atomic_add using the parameters to atomicAdd). For
these users, the list of ir_variables already existed: the list of
parameters in the built-in function signature.
This new caller doesn't do that. It wants to call a function with a
parameter from the function and a value calculated in the function. So,
I changed builtin_builder::call to take a list that could either be a
list of ir_variable or a list of ir_dereference_variable. In the former
case it behaves just as it previously did. In the latter case, it uses
(and removes from the input list) the ir_dereference_variable nodes
instead of creating new ones.
text data bss dec hex filename
6036395 283160 28608 6348163 60dd83 lib64/i965_dri.so before
6036923 283160 28608 6348691 60df93 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
text data bss dec hex filename
6036491 283160 28608 6348259 60dde3 lib64/i965_dri.so before
6036395 283160 28608 6348163 60dd83 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This necessetated renaming the is_intrinsic field to _is_intrinsic. The
next commit will remove the field.
text data bss dec hex filename
6036507 283160 28608 6348275 60ddf3 lib64/i965_dri.so before
6036491 283160 28608 6348259 60dde3 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
text data bss dec hex filename
6038043 283160 28608 6349811 60e3f3 lib64/i965_dri.so before
6036507 283160 28608 6348275 60ddf3 lib64/i965_dri.so after
v2: s/ir_intrinsic_atomic_sub/ir_intrinsic_atomic_counter_sub/. Noticed
by Ilia.
v3: Silence unhandled enum in switch warnings in st_glsl_to_tgsi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
text data bss dec hex filename
6037483 283160 28608 6349251 60e1c3 lib64/i965_dri.so before
6038043 283160 28608 6349811 60e3f3 lib64/i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
st_glsl_to_tgsi only calls lower_instructions once (instead of in a
loop), so the ir_binop_carry generated would not get lowered. Fixes
assertion failure
state_tracker/st_glsl_to_tgsi.cpp:2265: void glsl_to_tgsi_visitor::visit_expression(ir_expression*, st_src_reg*): Assertion `!"Invalid ir opcode in glsl_to_tgsi_visitor::visit()"' failed.
on softpipe in 16 piglit tests:
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-imulExtended.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/fs-umulExtended.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-imulExtended.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-only-msb-nonuniform.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended-only-msb.shader_test
mesa_shader_integer_functions/execution/built-in-functions/vs-umulExtended.shader_test
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This function is only ever used by an assert() this fixes an
unused function warning in release builds.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Make sure that with num_lods > 1 and min_filter != mag_filter we
still enter the splitting path. So this case would still use 4-wide aos
path (as a side note, the 4-wide aos sampling path could actually be
improved quite a bit if we have avx2, by just doing the filtering with
256bit vectors).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
_eglInitSync checked that the display supported the sync type (such as
EGL_SYNC_FENCE), and did it wrong. When the check failed it emitted
EGL_BAD_ATTRIBUTE, but sometimes EGL_BAD_PARAMETER is needed.
_eglCreateSync already does the error checking, and it does it right.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
When the function encountered an error, it effectively returned
immediately. However, it did so indirectly by breaking out of a loop.
Replace the loop breakout with a explicit 'return'.
Do the same for _eglParseSyncAttribList64 too.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This function converts an attribute list from EGLint[] to EGLAttrib[].
Will be used in following patches to cleanup EGLSync attribute parsing.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
When the user called eglCreateSync64KHR on a display without
EGL_KHR_cl_event2 (the only extension that exposes it), we returned
EGL_NO_SYNC but did not update the error code.
We also did the same for eglCreateSync on a display without EGL 1.5.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch causes 2 regressions in khronos' gles cts tests
on various intel platforms.
Failing tests:
ES3-CTS.functional.state_query.integers.viewport_getinteger
ES3-CTS.functional.state_query.integers.viewport_getfloat
Here is an explanation of what's causing the failures:
CTS tests are not clamping the x, y location of the viewport's
bottom-left corner as recommended by ARB_viewport_array and
OES_viewport_array:
"The location of the viewport's bottom-left corner, given by (x,y), are
clamped to be within the implementation-dependent viewport bounds range.
The viewport bounds range [min, max] tuple may be determined by
calling GetFloatv with the symbolic constant VIEWPORT_BOUNDS_RANGE_OES"
Khronos CTS merge request to fix the test case:
https://gitlab.khronos.org/opengl/cts/merge_requests/399
V2: Initialize the relevant variables for GL_OES_viewport_array on gen8+
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For reference picture sets, there are cases that rps will not always
be used. Once detect the unused flag from encoded bitstream, we should
not add this rps to any list, otherwise pass the incorrect reference
and skip the correct rps.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It will fix the corruption for frame, that only has one stort term ref
picture set, we set NULL rps for this case previously, causing taking
incorrect reference. Instead we should take that only short term set
as reference
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The video size from format container is not always compatible with
the size from codec bitstream, the HW decoder should take the size
information from bitstream, otherwise the corruption appears with clip
that has different size info between bitstream and format container
So we are passing width(height)_in_samples from sequence parameter
set to video decoder.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit fc03ecfeaf.
Chad had already pushed the same change between me posting the patch and Jason
pushing it: 44bcf1ffcc (".gitignore: Ignore src/compiler/spirv2nir")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is enabled automatically if shader printing is enabled, or separately
by R600_DEBUG=checkir. Catch mal-formed IR before it crashes in a later
pass.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This changes the order of basic blocks to be equal to the order of code in the
original TGSI, which is nice for making sense of shader dumps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some of the existing code is needlessly complicated. The basic principle
should be: control-flow opcodes emit branches to properly terminate the
current block, _unless_ the current block already has a terminator (which
happens if and only if there was a BRK or CONT).
This also fixes a bug where multiple terminators were created in a block.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97887
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
si_llvm_emit_ddxy is called once per element, so we don't have to generate
code for 4 elements at once.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
do it at bind time, so that pipe_sampler_view is immutable with regard to
buffer reallocations and we don't have to remember all existing buffer
views.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
In core profile, we support up to 16 viewports. However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.
Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor. This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.
This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new
BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case. The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.
According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, we were saving off the last nir_block in a vtn_block before
moving on so that we could find the nir_block again when it came time to
handle phi sources. Unfortunately, NIR's control flow modification code is
inconsistent when it comes to how it splits blocks so the block pointer we
saved off may point to a block somewhere else in the shader by the time we
get around to handling phi sources. In order to get around this, we insert
a nop instruction and use that as the logical end of our block. Since the
control flow manipulation code respects instructions, the nop will keeps
its place like any other instruction and we can easily find the end of our
block when we need it.
This fixes a bug triggered by a couple of vkQuake shaders.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97233
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This intrinsic has no destination, no sources, no variables, and can be
eliminated. In other words, it does nothing and will always get deleted by
dead code elimination. However, it does provide a quick-and-easy way to
temporarily tag a particular location in a NIR shader.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We already do those checks in filter_tiling. There's no good reason to
repeat them in choose_msaa_layout. If anything they should have been
asserts and not "return false" checks. Also, this check was causing us to
outright reject multisampled HiZ surfaces which wasn't intended.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The HiZ and CCS tiling formats are always used for HiZ and CCS surfaces
respectively. There's no reason why we should go through filter_tiling and
it's much easier to always get HiZ and CCS right if we just handle them
directly.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
HiZ buffers can be multisampled and, on Broadwell and earlier, simply using
interleaved multisampling with a compression block size of 8x4 samples
yields the correct HiZ surface size calculations. Unfortunately,
choose_msaa_layout was rejecting multisampled HiZ buffers because of format
checks. Now that we have a simple helper for determining if a format
supports multisampling, that's an easy enough issue to fix.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Compressed 1-D textures are not well-defined thing in either GL or Vulkan.
However, auxiliary surfaces are treated as compressed textures in ISL and
we can do HiZ and CCS with 1-D so we need to be able to create them. In
order to prevent actually using them (the docs say no), we assert in the
state setup code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The assertion that a format is uncompressed in the multisample layouts
isn't quite right. What we really want to assert is that the format
supports multisampling which is a bit more complicated query. We also want
to assert that it has a block size of 1x1 since we do nothing with the
block size in the phys_level0_sa assignment.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Previous fundamental change in stats gathering added a temporary
SwrWaitForIdle to begin_query and end_query. Code has been reworked to
remove stall.
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Create worker pool now computes number of worker threads based on
things like topologies, etc. and creates the pool but doesn't actually
launch the threads. Instead there is a separate start thread pool
function. This allows thread resources to be constructed first before
threads start.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Immediately sleep threads until thread data is initialized
- Fix some compile bugs with AR enabled
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Move most jitter functionality into SwrJit namespace
- Avoid global "using namespace llvm" in headers
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
gcc-4 and earlier don't allow compound literals where a constant
is required in -std=c99/gnu99 mode, so we can't use ISL_SWIZZLE()
when populating the anv_formats[] array. There are a few ways around
it: First one would be -std=c89/gnu89, but the rest of the code
depends on c99 so it's not really an option. The second option
would be to upgrade to gcc-5+ where the compiler behaviour was relaxed
a bit [1]. And the third option is just to avoid using compound
literals. I chose the last option since it keeps gcc-4 and earlier
working.
[1] https://gcc.gnu.org/gcc-5/porting_to.html
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes: 7ddb21708c ("intel/isl: Add an isl_swizzle structure and use it for isl_view swizzles")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a crash in egl-create-msaa-pbuffer-surface Piglit test
and same crash in many dEQP EGL tests.
I also found that some Qt example did a workaround because of this
crash: https://bugreports.qt.io/browse/QTBUG-47509
v2: Ian pointed out that v1 removed support for all multisample
configs, including window ones. This one removes pbuffer bit
when adding configs, now only pbuffer+msaa gets rejected and
window+msaa continues to work. Fixed also comment (Emil)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While the current CFG code is valid in the case where a switch break also
happens to be a loop continue, it's a bit suboptimal. Since hardware is
capable of handling the continue as a direct jump, it's better to use a
continue instruction when we can than to bother with all of the nasty
switch break lowering.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
It is possible that the break block of a switch is actually the continue of
the loop containing the switch. In this case, we need to identify the
break block as a continue and break out of current level of CFG handling.
If we don't, the continue portion of the loop will get handled twice, once
by following after the break and a second time by the loop handling code
handling it explicitly.
This fixes 6 of the new Vulkan CTS tests:
- dEQP-VK.spirv_assembly.instruction.graphics.opphi.out_of_order*
- dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order*
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
These structs will be written to disk as part of the shader cache
so use uint32_t just to be safe.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all of the other drawing validation code is in api_validate.c
so put this function there as well.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Detect all of the CPUs in the system. Expose metrics
for min, max and current frequency in Hz.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Regression introduced by "gallium/radeon: zero all query buffers".
Cc: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is only needed for r600 which doesn't have ARB_query_buffer_object and
therefore wouldn't really need the fences, but let's be optimistic about
filling in this feature gap eventually.
Cc: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We don't plan to use sub-allocated buffers with UVD, but just in case one
slips through, this increases the chances of things working out anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
We don't plan to use sub-allocated buffers with VCE, but just in case one
slips through, this increases the chances of things working out anyway.
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement support for power based sensors, reporting units in
milli-watts and watts.
Also, minor cleanup - change the related if block to a switch.
Tested with two different power sensors, including the nouveau
'power1' sensors on a GTX950 card.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commutativity is not allowed with SHLADD, but src2 can accept
loads. To allow the load propagation pass to do its job, add a
special case like for SUCLAMP because src1 is always an immediate.
This IMAD to SHLADD optimization helps a bunch of shaders from Tomb
Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow
Warrior.
GF100/GK104:
total instructions in shared programs :2838045 -> 2834712 (-0.12%)
total gprs used in shared programs :396684 -> 396386 (-0.08%)
total local used in shared programs :34416 -> 34416 (0.00%)
local gpr inst bytes
helped 0 326 1105 1105
hurt 0 55 3 3
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately, we can't use the emit helpers for GF100/GK110
because src1 and src2 are swapped.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This instruction is available since SM20 (Fermi) and allow to do
(a << b) + c in one shot. In some situations, IMAD should be
replaced by SHLADD when b is a power of 2, and ADD+SHL should be
replaced by SHLADD as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
envyas now uses a much better representation for those control
codes and it displays the different flags instead of an
unreadable hex number.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Most of the time, even the 512 bytes that we now get is more than sufficient
(pipeline stats queries are the largest at 184 bytes per shot).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We will support the waiting option in ARB_query_buffer_object using
WAIT_REG_MEM on an appropriate fence-like dword. Some queries conveniently
write their results with the highest bit set, and we can just use that;
for others, we have to write a fence explicitly.
ZPASS_DONE for occlusion queries writes its results with the high bit
set, but it writes up to 8 pairs of results (one for each DB). We have
to wait for all of these results, so let's just add an explicit fence.
The new function provides summary information to be used by subsequent
patches.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Save compute shader state that will be used for the ARB_query_buffer_object
implementation.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These functions extract the pipe state structure from the current
descriptors, for state saving.
v2: correctly dereference *buf (Bas)
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are driver-specific context flags for barriers that are not covered
by the Gallium barrier interfaces.
The R600 settings of these flags may not be optimal, but we're not going
to use them yet anyway.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
V8: Feedback based on peer review
convert if block into a switch
Constify some func args
V7: Increase precision when measuring lmsensors volts
Flatten patch series.
V6: Feedback based on peer review
Simplify sensor initialization (arg passing).
Constify some func args
V5: Feedback based on peer review
Convert sprintf to snprintf
Convert char * to const char *
int arg converted to bool
Func changes to take a filename vs a larger struct.
Omit the space between '*' and the param name.
V4: Merged with master as of 2016/9/27 6pm
V3: Flatten the entire patchset ready for the ML
V2: Additional seperate patches based on feedback
a) configure.ac: Add a comment related to libsensors
b) HUD: Disable Block/NIC I/O stats by default.
Implement configuration option --enable-gallium-extra-hud=yes
and enable both statistics when this option is enabled.
c) Configure.ac: Minor cleanup to user visible configuration settings
d) Configure.ac: HUD stats - build system improvements
Move the -lsensors out of a deeper Makefile, bring it into the configure.ac.
Also, rename a compiler directive to more closely follow the standard.
V1: Initial release to the ML
Three new features:
1. Disk/block I/O device read/write stats MB/ps.
2. Network Interface RX/TX transfer statistics as a percentage
of the overall NIC speed.
3. lmsensor power, voltage and temperature sensors.
The lmsensor changes makes a dependency on libsensors so support
for the change is opt out by default.
Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The code already skips doing the depth stall on gen >= 8, and as we
enable new platforms this assertion will fail needlessly. Instead of
changing the caller, make this simple change.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I found this in a shader that was doing an alpha test when alpha is fixed
at 1.0.
v2: Rebase on master (now the const value is "u32" not "u").
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Fixes lots of piglit tests crashing due to using uninitialized memory.
Fixes: ecd6fce261 ("mesa/st: support lowering multi-planar YUV")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This code provides for an on-disk cache of objects. Objects are stored
and retrieved via names that are arbitrary 20-byte sequences,
(intended to be SHA-1 hashes of something identifying for the
content). The directory used for the cache can be specified by means
of environment variables in the following priority order:
$MESA_GLSL_CACHE_DIR
$XDG_CACHE_HOME/mesa
<user-home-directory>/.cache/mesa
By default the cache will be limited to a maximum size of 1GB. The
environment variable:
$MESA_GLSL_CACHE_MAX_SIZE
can be set (at the time of GL context creation) to choose some other
size. This variable is a number that can optionally be followed by
'K', 'M', or 'G' to select a size in kilobytes, megabytes, or
gigabytes. By default, an unadorned value will be interpreted as
gigabytes.
The cache will be entirely disabled at runtime if the variable
MESA_GLSL_CACHE_DISABLE is set at the time of GL context creation.
Many thanks to Kristian Høgsberg <krh@bitplanet.net> for the initial
implementation of code that led to this patch. In particular, the idea
of using an mmapped file, (indexed by a portion of the SHA-1), for the
efficent implementation of cache_has_key was entirely his
idea. Kristian also provided some very helpful advice in discussions
regarding various race conditions to be avoided in this code.
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Needed to successfully link llvmpipe or swr when using shared llvm libs
built with inteljitevents enabled.
v2: Make adding inteljitevents component global rather than just
llvmpipe/swr, since libgallium will have a symbol dependency.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Small buffers are now handled via the slabs code, so separate buckets in
pb_cache have become redundant.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Only enable for chips with GPUVM, because older driver paths do not take the
required offset into account.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note the logic for adding fences is somewhat different than for amdgpu,
because radeon has no scheduler and we therefore have no guarantee about
the order in which submissions from multiple threads are processed.
(Ironically, this is only an issue when "multi-threaded submission" is
disabled, because "multi-threaded submission" actually means that all
submissions happen from a single thread that happens to be separate from
the application's threads. If we only supported "multi-threaded
submission", the fence handling could be simplified by adding the fences
in that thread where everything is serialized.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When a buffer is added to a CS without the SYNCHRONIZED usage flag, we now
no longer add a dependency on the buffer's fence(s).
However, we still need to add a fence to the buffer during flush, so that
cache reclaim works correctly (and in the hypothetical case that the buffer
is later added to a CS _with_ the SYNCHRONIZED flag).
It is now possible that the submissions refererring to a buffer are no longer
linearly ordered, and so we may have to keep multiple fences around. We keep
the fences in a FIFO. It should usually stay quite short (# of contexts * 2,
for gfx + dma rings).
While we're at it, extract amdgpu_add_fence_dependency for a single buffer,
which will make adding the distinction between real buffer and slab cases
easier.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When passed to winsys->buffer_create, this flag will indicate that we require
a buffer that maps 1:1 with a kernel buffer handle.
This is currently set for all textures, since textures can potentially be
exported to other processes. This is not a huge loss, since the main purpose
of this patch series is to deal with applications that allocate many small
buffers.
A hypothetical application with tons of tiny textures might still benefit
from not setting this flag, but that's not a use case I'm worried about
just now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is really the behavior we want most of the time, but having a
SYNCHRONIZED flag instead of an UNSYNCHRONIZED one has the advantage that
OR'ing different flags together always results in stronger guarantees.
The parent BOs of sub-allocated buffers will be added unsynchronized.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a simple framework for slab allocation from buffers that fits into
the buffer management scheme of the radeon and amdgpu winsyses where bufmgrs
aren't used.
The utility knows about different sized allocations and explicitly manages
reclaim of allocations that have pending fences. It manages all the free lists
but does not actually touch buffer objects directly, relying on callbacks for
that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The implementation of i915_drm_buffer_get_handle now handles
DRM_API_HANDLE_TYPE_FD in the same way that intel_winsys_import_handle
does, by calling drm_intel_bo_gem_create_from_prime.
Tested by successfully running Chrome's ozone_demo [1] with the
ozone-gbm backend on an Intel Pineview M machine. Without this change
it fails while trying to create a DMA-BUF.
[1] https://chromium.googlesource.com/chromium/src.git/+/master/ui/ozone/demo/ozone_demo.cc
Signed-off-by: Nicholas Bishop <nbishop@neverware.com>
[Emil Velikov: Fix coding style]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Change dri2_query_image to check the return value of resource_get_handle
and return GL_FALSE if an error occurs.
For reference this is an example callstack that should propagate the
error back to the user:
i915_drm_buffer_get_handle
i915_texture_get_handle
u_resource_get_handle_vtbl
dri2_query_image
gbm_dri_bo_get_fd
gbm_bo_get_fd
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nicholas Bishop <nbishop@neverware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
[Emil Velikov: Split from larger patch, polish coding style, cc stable]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Change gbm_dri_bo_get_fd to check the return value of queryImage and
return -1 (an invalid file descriptor) if an error occurs.
Update the comment for gbm_bo_get_fd to return -1, since (apart from the
above) we've already return -1 on error.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Nicholas Bishop <nbishop@neverware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
[Emil Velikov: Split from larger patch, polish coding style, cc stable]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
For now that's never since advanced blend hasn't been piped through.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
As part of the shader-cache work an upcoming change will add new
references to _mesa_add_parameter and _mesa_new_parameter_list from
the glsl code. To prepare for that, and to allow the standalone
glsl_compiler to still link, here we add mesa/program/prog_parameter.c
to the libglsl_util sources.
Then, in order to get *that* to work, we also add to stubs to
standalone_scaffolding:
_mesa_program_state_flags
_mesa_program_state_string
These functions aren't actually used by the two functions in
prog_parameter.c that we are actually calling. They are used in other
functions in the same file. So we don't care what the implementation
of these stubs is, (they won't be called by glsl_compiler). We just
need the stubs present so that it can link.
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Support multi-planar YUV for external EGLImage's (currently just in the
dma-buf import path) by lowering to multiple texture fetch's for each
plane and CSC in shader.
There was some discussion of alternative approaches for tracking the
additional UV or U/V planes:
https://lists.freedesktop.org/archives/mesa-dev/2016-September/127832.html
They all seemed worse than pipe_resource::next
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixed the way the values that span two Dwords are decoded.
Based on the start and end indices of the field, the Dwords
are fetched and decoded accordingly.
v2: rename dw to qw in gen_field_iterator_next
and remove extra white space (Anuj)
v3: change all instances of dw to qw (Anuj)
Earlier, 64-bit fields (such as most pointers on Gen8+)
weren't decoded correctly. gen_field_iterator_next seemed
to walk one DWord at a time, sets v.dw, and then passes it
to field(). So, even though field() takes a uint64_t, we're
passing it a uint32_t (which gets promoted, so the top 32
bits will always be zero). This seems pretty bogus... (Ken)
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a new envvar called NV50_PROG_CHIPSET which allows to
compile shaders with a different target, especially useful for
shader-db.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Based off of Ilia's original patch, but with output values replicated so
that it matches the TGSI semantics.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is needed to be consistent with other drivers.
Signed-off-by: Emilio Cobos Álvarez <me@emiliocobos.me>
Reviewed-by: Brian Paul <brianp@vmware.com>
Usually, there's no user-specified texture swizzle so we can optimize
the swizzle_swizzle() function and skip the loop/switch.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Some demos, like Heaven, were creating and destroying a large number
of sampler views because of a swizzle issue.
Basically, we compute the sampler view's swizzle by examining the
texture format, user swizzle, depth mode, etc. Later, during validation
we recompute that swizzle (in case something like depth mode changes)
and see if it matches the view's swizzle.
In the case of PIPE_FORMAT_RGTC2_UNORM, get_texture_format_swizzle
returned SWIZZLE_XYZW but the u_sampler_view_default_template() function
was setting the sampler view's swizzle to SWIZZLE_XY01. This mismatch
caused the validation step to always "fail" so we'd destroy the old
sampler view and create a new one.
By removing the conditional, the sampler view's swizzle and the computed
texture swizzle match and validation "passes". When creating a new sampler
view, we always want to use the texture swizzle which we just computed.
Fixes VMware issue 1733389.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When we create a depth/stencil texture, also check if we can render to
it and set the PIPE_BIND_DEPTH_STENCIL flag. We were previously doing
this for color textures (PIPE_BIND_RENDER_TARGET).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This may have been needed years ago during development, but not now.
Prevents some regressions after introducing the next patch.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
To handle z/layer fix-ups for blitting and copying. Note that we weren't
doing this properly in svga_blit() before.
Also, remove redundant stex, dtex assignments.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The util_is_format_compatible() function didn't quite do what we wanted
for vgpu10. This check is more flexible and allows copies between
formats such as R32G32B32A32_FLOAT and R32G32B32A32_INT.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
With latest mesa and latest piglit tests srgb<->linear conversion
is not required as per GL4.4 rules
See commit b662c70aea.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit f5a6aab403.
This broke some tests. It seems gl_transform_feedback_info gets memset
to 0 so we were losing the values in BufferStride before we used them.
Now that we generate built-in functions inline, there's no need to link
against the built-in shader, and no built-in prototypes to consider.
This lets us delete a bunch of code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
In the past, we imported the prototypes of built-in functions, generated
calls to those, and waited until link time to resolve the calls and
import the actual code for the built-in functions.
This severely limited our compile-time optimization opportunities: even
trivial functions like dot() were represented as function calls. We
also had no way of reasoning about those calls; they could have been
1,000 line functions with side-effects for all we knew.
Practically all built-in functions are trivial translations to
ir_expression opcodes, so it makes sense to just generate those inline.
Since we eventually inline all functions anyway, we may as well just do
it for all built-in functions.
There's only one snag: built-in functions that refer to built-in global
variables need those remapped to the variables in the shader being
compiled, rather than the ones in the built-in shader. Currently,
ftransform() is the only function matching those criteria, so it seemed
easier to just make it a special case.
On Skylake:
total instructions in shared programs: 12023491 -> 12024010 (0.00%)
instructions in affected programs: 77595 -> 78114 (0.67%)
helped: 97
HURT: 309
total cycles in shared programs: 137239044 -> 137295498 (0.04%)
cycles in affected programs: 16714026 -> 16770480 (0.34%)
helped: 4663
HURT: 4923
while these statistics are in the wrong direction, the number of
hurt programs is small (309 / 41282 = 0.75%), and I don't think
anything can be done about it. A change like this significantly
alters the order in which optimizations are performed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
We want to check prior to optimization - otherwise we might fail to
detect cases where barrier() is in control flow which is always taken
(and therefore gets optimized away).
We don't currently loop unroll if there are function calls inside;
otherwise we might have a problem detecting barrier() in loops that
get unrolled as well.
Tapani's switch handling code adds a loop around switch statements, so
even with the mess of if ladders, we'll properly reject it.
Enforcing these rules at compile time makes more sense more sense than
link time. Doing it at ast-to-hir time (rather than as an IR pass)
allows us to emit an error message with proper line numbers.
(Otherwise, I would have preferred the IR pass...)
Fixes spec/arb_tessellation_shader/compiler/barrier-switch-always.tesc.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by; Ian Romanick <ian.d.romanick@intel.com>
It makes more sense to have this here where we store the other values
from xfb qualifiers. The struct it was previously part of is now only
used to store values that come from the api.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This reverts commit e66a2b879b.
Which breaks the scons build in an interesting way, particularly when
BlendBarrier and PrimitiveBoundingBox are added to static_data.py's
functions list. This seems to be related to the fact that the unsuffixed
names are only in GLES3.2, but Desktop GL only has suffixed versions.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
From the Vulkan spec:
Size is the number of bytes to fill, and must be either a multiple of 4,
or VK_WHOLE_SIZE to fill the range from offset to the end of the buffer.
If VK_WHOLE_SIZE is used and the remaining size of the buffer is not a
multiple of 4, then the nearest smaller multiple is used.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Make gen_device_info a mutable structure so we can update the fields that
can be refined by querying the kernel (like subslices and EU numbers).
This patch does not make any functional change, it just makes
gen_get_device_info() fill a structure rather than returning a const
pointer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just up-converts them to doubles. Not great, but this is what all
the other variants also do.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The expectation is that drivers will set this based on
OES_geometry_shader and ARB_viewport_array support. This is a separate
enable on the same reasoning as for OES_texture_cube_map_array.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This file currently uses a mixture of 3 and 4 space indent. I have
changed it all to 4 space indent, matching the settings in
$ROOT/.editorconfig.
This was generated with sed:
sed -i -e 's@^ "@ "@g'
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
OpAtomicLoad/Store should have pointer to images just like the rest of the
atomic operators. These couple of lines were poorly copied from the
ssbo/shared_vars cases (the only ones currently tests by the CTS).
Fixes 2afb950161 ("spirv/nir: Add support for OpAtomicLoad/Store")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.
This pass seems like it was was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise. However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.
For now, just turn on aggressive flattening on vc4. i965 will need some
tuning to avoid regressions. It does looks like this may be useful to
replace freedreno code.
Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.
vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs: 17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs: 3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
v2: Update shader-db output.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
From reading the code, it's not obvious what is src/dest compatible.
The list of a->b copy-compatible formats comes from Jose's original
check-in message, with some format name updates.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's already advertised because the version.c extension checks are
fulfilled, but we didn't actually claim support, so trying to create
a ES 3.2 context would fail.
It's all done, and the CTS results look good, so let's turn it on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We already support all of the decorations that require this capability.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This was something that I wrote in the early days of the spirv_to_nir code
but deleted once we had a real driver. However, in the absence of a
shader_runner equivalent, it's extremely useful for debugging the
spirv_to_nir code so let's bring it back.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The offset should not always be 0. For example, if EGLImage is
created from a 2D texture with EGL_GL_TEXTURE_LEVEL=1, then the
offset should be the actual start of miplevel 1 in bo.
v2: Add version check of __DRIimageExtension implementation
(Suggested by Axel Davy).
v3: Don't add version check of __DRIimageExtension implementation.
Set the offset only when queryImage() succeeds. (Suggested by Emil
Velikov)
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
[Emil Velikov: coding style fixes]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Not [originally] intended for upstream. Should cause a GPU hang if
some thread is executed with a non-contiguous dispatch mask breaking
assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any
CTS, DEQP or Piglit regressions, while replacing
brw_stage_has_packed_dispatch() with a dummy implementation that
unconditionally returns true on top of this patch causes multiple GPU
hangs.
v2: Refactor into a separate function instead of emitting the test
code directly from emit_nir_code(), drop VEC4 test and clean up
slightly for upstream. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This avoids emitting a few extra instructions required to take the
dispatch mask into account when it's known to be tightly packed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The eliminate_find_live_channel optimization eliminates
FIND_LIVE_CHANNEL instructions in cases where control flow is known to
be uniform, and replaces them with 'MOV 0', which in turn unblocks
subsequent elimination of the BROADCAST instruction frequently used on
the result of FIND_LIVE_CHANNEL. This is however not correct in
per-sample fragment shader dispatch because the PSD can dispatch a
fully unlit sample under certain conditions. Disable the optimization
in that case.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Add devinfo argument to brw_stage_has_packed_dispatch() to
implement hardware generation check.
On at least Sky Lake, ce0 does not contain the full story as far as enabled
channels goes. It is possible to have completely disabled channels where
the corresponding bits in ce0 are 1. In order to get the correct execution
mask, you have to mask off those channels which were disabled from the
beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on
the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL
returning a completely dead channel.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Francisco Jerez <currojerez@riseup.net>
[ Francisco Jerez: Fix a couple of typos, add mask register type
assertion, clarify reason why ce0 can have bits set for disabled
channels, clarify that this may only be a problem when thread
dispatch doesn't pack channels tightly in the SIMD thread. Apply
same treatment to Align16 path. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The state register sr0 is really a collection of dwords not a SIMD8
anything. It's much more convenient for brw_sr0_reg to return the
particular dword you're looking for rather than a giant blob you have to
massage into what you want.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
[ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This enables 64-bit integer support in gallivm and
llvmpipe.
v2: add conversion opcodes.
v3:
- PIPE_CAP_INT64 is not there yet
- restrict DIV/MOD defaults to the CPU, as for 32 bits
- TGSI_OPCODE_I2U64 becomes TGSI_OPCODE_U2I64
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This adds all the opcodes to tgsi_exec for softpipe to use.
v2: add conversion opcodes.
v3:
- no PIPE_CAP_INT64 yet
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This just adds the basic support for 64-bit opcodes,
and the new types.
v2: add conversion opcodes.
add documentation.
v3:
- make docs more consistent
- change TGSI_OPCODE_I2U64 to TGSI_OPCODE_U2I64
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
"intelScreen" is wordy and also doesn't fit our style guidelines.
"screen" is shorter, which is nice, because we use it fairly often.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I want to use "screen" as the variable name for a struct intel_screen
pointer. This means that we can't use it for __DRIscreen pointers.
Sometimes we called it "screen", sometimes "sPriv", sometimes
"driScrnPriv", and sometimes "psp" (Pointer to Screen Private?).
The last one is particularly confusing because we use "psp" to refer to
the Gen4 PIPELINED_STATE_POINTERS packet as well.
Let's be consistent. "dri_screen" is clear, and it's not used often
enough that I'm worried about the verbosity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This extension is a combination of AMD_vertex_shader_viewport_index and
AMD_vertex_shader_layer, making it rather trivial to implement.
For gallium I *think* this needs a new cap because of the addition of
support in tessellation evaluation shaders, and since I don't have any
hardware to test it on, I've left that for someone else to wire up.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in
the function
v2: change comment style
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
so that the texture is rendered to back buffer before calling
flush_frontbuffer and can be copied to a different buffer in
the function
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
In case of prime when rendering is done on GPU other then the
server GPU, use a seprate linear buffer for each back buffer
which will be displayed using present extension.
v2: Use a seprate linear buffer for each back buffer (Michel)
v3: Change variable names and fix coding style (Leo and Emil)
v4: Use PIPE_BIND_SAMPLER_VIEW for back buffer in case when
a seprate linear buffer is used (Michel)
v4.1: remove empty line
v4.2: destroy the context and handle the case when
create_context fails (Emil)
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This function was quite similar to nvc0_stage_set_sampler_views()
and I don't see any reasons to not remove it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This helps shaders in UE4 demos, especially with Elemental
(+1% perf). This optimization reduces spilling usage in one
shader which explains the little gain.
GF100/GK104:
total instructions in shared programs :2838551 -> 2838045 (-0.02%)
total gprs used in shared programs :396706 -> 396684 (-0.01%)
total local used in shared programs :34432 -> 34416 (-0.05%)
local gpr inst bytes
helped 1 19 112 112
hurt 0 0 0 0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A couple of forward-declarations were causing warnings in clang:
'value' defined as a class here but previously declared as a struct
[-Wmismatched-tags]
Signed-off-by: Martina Kollarova <martina.kollarova@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Given robust access, we should just be returning zeroes if the user gives
us a base pointer that's too big, which is what was happens on a release
build. This was caught by a webgl conformance test for out-of-bounds
draws on servo.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the following piglit test (for softpipe):
/spec/glsl-1.10/execution/fs-loop-return
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch relaxes the restriction of compressed formats for texture
upload buffer. For now, 3D texture with compressed format
is still not supported in the texture upload buffer path.
As Brian noted, ETQW does many texture updates with glCompressedTexSubImage.
This patch greatly improves the performance of the ETQW trace.
Tested with ETQW, MTT piglit, glretrace, conform, viewperf
v2: Per Brian's suggestion, removed the subregion boundary check.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reduces the number of times we flush in some situations (the
arbocclude demo is one trivial example).
Tested with Piglit, ETQW, Sauerbraten, arbocclude.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Since commit 99d8fe20ab we don't have to flush the command buffer when
we end a query.
Tested with Piglit, Sauerbraten, arbocclude, ETQW (noticably faster now).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
With this patch, when running with vgpu10, instead of mapping directly to the
guest backed memory for texture update, we'll use the texture upload buffer
and use the transfer from buffer command to update the host side texture memory.
This optimization yields about 20% performance improvement with
Lightsmark2008 and about 40% with Tropics.
Tested with Lightsmark2008, Tropics, Heaven, MTT piglit, glretrace, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
Split the functions into separate functions for dma and direct map to make
the code more readable.
Tested with MTT piglit, glretrace, viewperf, conform, various OpenGL apps
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, single sample surface will be created as non-multisamples
surface.
Tested with piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch fixes a memory leak with sampler state when piglit
is run with HW version 11. Sampler state clean up was incorrectly skipped
in svga_cleanup_sampler_state() for vgpu9.
Tested with piglit.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Left over test code spotted by Sinclair.
Tested with piglit, Google Earth, Lightsmark, Heaven4, glretraces, etc.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Use SVGA3D_QUERYTYPE_MAX instead of SVGA_QUERY_MAX for
svga query type check.
Tested with various OpenGL apps with GALLIUM_HUD set.
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace the num-resources-mapped hud with
num-textures-mapped and num-buffers-mapped, so we can
differentiate the map counts for these two different resources.
Reviewed-by: Brian Paul <brianp@vmware.com>
Some OpenGL apps, like Cinebench R15, have many glDrawElements(GL_QUADS)
calls. Since we don't directly support quads we have to convert these
calls into GL_TRIANGLES which involves generating a new index buffer.
This patch saves the new/translated index buffer in the hope that it
can be reused for a later draw call.
Cinebench R15 increases by about 20% with this change.
The NobelClinician Viewer app also hits this code.
Tested with full piglit run.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If a consecutive sequence of drawing commands references the same
vertex/index buffers, there should be no need to rebind the surfaces
for the second and subsequent drawing commands.
Apps that use multiple display lists benefit from this since the vertex
data for several display lists is often stored in one buffer.
In the case of the legacy E&S Glaze demo, this reduces the size of our
command buffers from 91KB to 44KB. One WSI Fusion trace shows a 33%
reduction in command buffer sizes.
Tested with full piglit run.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Previously, every time we put shader constants into the default constant
buffer we called u_upload_alloc(), which mapped the buffer, and
u_upload_unmap(). We had to unmap the buffer before calling
svga_buffer_handle() to get the winsys handle for the buffer. But we
really only need to do that the first time we reference the const buffer.
Now we try to keep the upload manager's buffer mapped until we fill it or
flush the command buffer.
v2: add additional comment on the buffer unmapping code in
svga_context_flush(), per Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When we migrate a buffer from sw/malloc storage to a hardware buffer,
don't memcpy the whole buffer, just copy the part we've written to.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
PredCopyRegion support copy between same type of textures.
Instead of comparing src and dst pipe texture type, compare svga texture
type which can avoid some software fallback.
for example, it avoids a software blit with the Redway3D Aston demo.
Tested piglit tests on VGPU9 and VGPU10 on GL/DX11Renderer, Redway3D Aston demo
v2: some nit pick suggested by Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function returns svga texture type for corresponding pipe texture.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The comment for the commutative flags was wrong because OP_MUL is
before OP_MAD. While we are at it add missing opcodes, and fix
the comment about the short forms.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This needs to be above the switch on API, as that can return true
(valid to render) before this error check even had a chance to run.
Fixes ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos,
which worked before commit 72f1566f90.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This patch switches non-TGSI compute shaders over to using the HSA
ABI described here:
https://github.com/RadeonOpenCompute/ROCm-Docs/blob/master/AMDGPU-ABI.md
The HSA ABI provides a much cleaner interface for compute shaders and allows
us to share more code in the compiler with the HSA stack.
The main changes in this patch are:
- We now pass the scratch buffer resource into the shader via user sgprs
rather than using relocations.
- Grid/Block sizes are now passed to the shader via the dispatch packet
rather than at the beginning of the kernel arguments.
Typically for HSA, the CP firmware will create the dispatch packet and set
up the user sgprs automatically. However, in Mesa we let the driver do
this work. The main reason for this is that I haven't researched how to
get the CP to do all these things, and I'm not sure if it is supported
for all GPUs.
v2:
- Add comments explaining why we are setting certain bits of the scratch
resource descriptor.
v3:
- Use amdgcn-mesa-mesa3d triple instead of amdgcn--mesa3d.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes 8 fs-interpolateat* piglit crashes on radeonsi, because it can't
handle non-input operands in interpolateAt*.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fix getting the size of a struct arg. vec3 types still work ok.
Only buit-in args need to have power of two alignment, getTypeAllocSize
reports the correct size in all cases.
Acked-by: Francisco Jerez <currojerez@riseup.net>
The gallium interface defines these like DX10. Note that OpenGL ignores
these options if MSAA is disabled or the dest buffer doesn't support
MSAA.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default)
we should not set the alpha_to_coverage or alpha_to_one flags if the
current drawing buffer does not do MSAA.
This fixes the new piglit gl-1.3-alpha_to_coverage_nop test.
ETQW is a game that enables GL_SAMPLE_ALPHA_TO_COVERAGE without MSAA.
Shrubs along the side of roads were invisible because fragments with
alpha < 0.5 were being discarded (zero coverage).
v2: remove ctx->DrawBuffer != NULL check.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds support for the input attachments subpass type
to the SPIRV->NIR pass.
v1.1: drop handling from vtn_handle_texture
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
SPIR-V/Vulkan have a special image type for input attachments
called the subpass type. It has different characteristics than
other images types.
The main one being it can only be an input image to fragment
shaders and loads from it are relative to the frag coord.
This adds support for it to the GLSL types. Unfortunately
we've run out of space in the sampler dim in types, so we
need to use another bit.
v2: Fixup subpass input name (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gen6 only has one additional restriction over Gen7+, so we just add it
to the existing gen7 function (which actually covers later gens too).
This should stop FINISHME spew when running GL on Sandybridge.
v2: Fix bytes per block vs. bits per block confusion (Jason) and
rename function to gen6_filter_tiling (Jason and Chad).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note that ASTC support is not actually mandated for this extension to be
exposed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
From the ARB_gpu_shader5 spec:
The built-in functions interpolateAtCentroid() and interpolateAtSample()
will sample variables as though they were declared with the "centroid"
or "sample" qualifiers, respectively.
When running with persample dispatch forced by the API, we interpolate
anything that isn't flat as if it's qualified by "sample". In order to
keep interpolateAtCentroid() consistent with the "centroid" qualifier, we
need to make interpolateAtCentroid() do sample interpolation instead.
Nothing in the GLSL spec guarantees that the result of
interpolateAtCentroid is uniform across samples in any way, so this is a
perfectly fine thing to do.
Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS
tests that specifically validate consistency between the "sample" qualifier
and interpolateAtSample()
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some apps issue redundant glLoadMatrixf() calls with the same matrix.
Try to avoid setting dirty state in that situation.
This reduces the number of constant buffer updates by about half in
ET Quake Wars.
Tested with Piglit, ETQW, Sauerbraten, Google Earth, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Structurally, this is very similar to the existing Apple-DRI code, except I
have chosen to implement this using the __GLXDRIdisplay, etc. vtables (as
suggested originally in [1]), rather than a maze of ifdefs. This also means
that LIBGL_ALWAYS_SOFTWARE and LIBGL_ALWAYS_INDIRECT work as expected.
[1] https://lists.freedesktop.org/archives/mesa-dev/2010-May/000756.html
This adds:
* the Windows-DRI extension protocol headers and the windowsdriproto.pc
file, for use in building the Windows-DRI extension for the X server
* a Windows-DRI extension helper client library
* a Windows-specific DRI implementation for GLX clients
The server is queried for Windows-DRI extension support on the screen before
using it (to detect the case where WGL is disabled or can't be activated).
The server is queried for fbconfigID to pixelformatindex mapping, which is
used to augment glx_config.
The server is queried for a native handle for the drawable (which is of a
different type for windows, pixmaps and pbuffers), which is used to augment
__GLXDRIdrawable.
Various GLX extensions are enabled depending on if the equivalent WGL
extension is available.
This is supposed to be exposed with the GL_KHR_robustness extension,
which we support on ES 2.0 and later. On desktop GL, it's also exposed
by GL_ARB_robustness, which is supported by all drivers ("dummy_true").
so we also allow desktop GL.
Fixes:
- ES32-CTS.robust.robustness.noResetNotification
- ES32-CTS.robust.robustness.loseContextOnReset
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Without this bit set, the value in "L3 Atomic Disable" won't get applied by
the hardware so we won't properly get L3 atomic caching.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opatomic.compex and 198 of
the dEQP-VK.image.atomic_operations.* tests on HSW
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT
at the GPU initialization time and never sets it again. The GL driver sets
it every time the framebuffer changes. Originally, blorp set it to the
size of the drawing area but meant we had to set it back in the Vulkan
driver. Instead, we can easily just do that in the GL driver's blorp_exec
implementation and not set it in blorp core.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, we relied on a driver hook for 3DSTATE_MULTISAMPLE. However,
now that Vulkan and GL use the same sample positions, we can set up
3DSTATE_MULTISAMPLE directly in blorp and delete the driver hook.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Non-16B-aligned pull constant loads are unlikely to be particularly
useful given that you can get roughly the same effect by using
swizzles on the result.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It might be useful to actually handle this once copy propagation
becomes smarter about register-misaligned offsets.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
A better fix would be to do something along the lines of the FS
back-end spilling code and emit a scratch read before any instruction
that overwrites the register to spill partially due to a non-zero
sub-register offset. In the meantime mark registers used with a
non-zero sub-register offset as no-spill to prevent the spilling code
from miscompiling the program.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This prevents it from trying to propagate a copy through a
register-misaligned region. MOV instructions with a misaligned
destination shouldn't be treated as a direct GRF copy, because they
only define the destination GRFs partially. Also fix the interference
check implemented with is_channel_updated() to consider overlapping
regions with different register offset to interfere, since the
writemask check implemented in the function is only valid under the
assumption that the source and destination regions are aligned
component by component.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cmod propagation would misoptimize the program if the destination
offset of the generating instruction wasn't exactly the same as the
source region offset of the copy instruction. In preparation for
adding support for sub-GRF offsets to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Because the pass already checks that the destination offset of each
'scan_inst' that needs to be rewritten matches 'inst->src[0].offset'
exactly, the final offset of the rewritten instruction is just the
original destination offset of the copy. This is in preparation for
adding support for sub-GRF offsets to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
For simplicity just assume that two writes to the same GRF with
different sub-GRF offsets will potentially interfere and break the
dependency control chain. This is in preparation for adding sub-GRF
offset support to the VEC4 IR.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This should also have the side effect of fixing convert_to_hw_regs()
to handle sub-GRF register offsets.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The offset printing code in fs_visitor::dump_instruction() was doing
things differently for sources and destinations and for each register
file -- In some cases it would be added to the base register number
fs_reg::nr, in other cases it would follow the base register separated
with a plus sign, in other cases (uniforms) it would do both (!). The
sub-register offset was also being printed or not rather
inconsistently. Fix it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Get rid of some leftover redundant arithmetic introduced during the
conversion to byte offsets and sizes that can be simplified easily.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
component() was generally a better alternative because of several
issues set_smear() had:
- It wouldn't take the original stride and offset of the register
into account, which means that set_smear() on the result of
e.g. another set_smear() call or an offset() call would give a
bogus region as result.
- It was an inherently destructive operation. See the
'nir_intrinsic_shader_clock' hunk below for how this could lead to
subtle bugs in cases where set_smear() was called multiple times on
the same register like 'r.set_smear(0), r.set_smear(1)' with the
expectation that each call would return a separate value instead of
a reference to the same subsequently mutated object.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Also changed the argument names since 'src' and 'dst' don't make that
much sense outside of the context of copy propagation.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This makes the function less annoying to use and more accurate -- We
shouldn't propagate a copy into a register region that wasn't fully
contained in the destination of the copy (IOW, a source region that
wasn't fully defined by the copy) just because the number of registers
written and read by each instruction happened to get rounded up to the
same GRF multiple.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Using component_size() is easier and generally more correct because it
takes into account the register type and stride for you.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These were bashing the 'offset' and 'stride' values of several
registers without taking the previous value into account, which
probably didn't matter in practice for optimize_frontfacing_ternary()
because the 'tmp' register already had a known region, but it would
have given the wrong region as result in the other cases in
lower_integer_multiplication(). subscript(..., i) is a more
straightforward way to take the i-th field of a given type from each
channel of a register which should give the right answer as result
regardless of the original 'offset' and 'stride' parameters of the
register region.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
In the most common case this can now be implemented as a simple
addition because the offset is already encoded as a single scalar
value in bytes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The 'scan_inst->dst.offset % REG_SIZE' term in the final
'scan_inst->dst.offset' calculation is obviously bogus. The offset
from the start of the copy destination register 'inst->dst' where the
destination of the generating instruction 'scan_inst' would be written
to (before compute-to-mrf runs) is just the offset of 'scan_inst->dst'
relative to the source of the copy instruction (AKA rel_offset in the
code below).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This was dropping 'inst->dst.offset' on the floor. Nothing in the
code above seems to guarantee that it's zero and in that case the
offset of the register being coalesced into wouldn't be taken into
account while rewriting the generating instruction.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This makes sure that overlap checks are done correctly throughout the
back-end when the '*this' register starts before the register/size
pair provided as argument, and is actually less annoying to use than
in_range() at this point since regions_overlap() takes its size
arguments in bytes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is copy-pasted almost line by line from the FS back-end. The
only reason it cannot be implemented in terms of backend_reg is that
the backend_reg::nr field doesn't have the same meaning for uniforms
on both back-ends. It could be easily deduplicated by using a
template function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Its only use left in the FS back-end should be using regions_overlap()
instead to avoid getting a false negative result in cases where source
and destination overlap but the former starts before the latter in the
VGRF file.
v2: Put back lost components factor (Iago).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
fs_inst::overwrites_reg is rather easy to misuse because it cannot
tell how large the register region starting at 'reg' is, so in cases
where the destination region starts after 'reg' it may give a
misleading result. regions_overlap() is somewhat more verbose to use
but handles arbitrary overlap correctly so it should generally be used
instead.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
is_nop_mov() was broken for LOAD_PAYLOAD instructions in two ways: On
the one hand the original destination register offset wasn't being
taken into account which would give incorrect results if it was
already non-zero, and on the other hand all source registers were
being treated as if they had a size of 32B, which is almost never the
case in SIMD16 programs for non-header sources.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The previous overlap condition only made sure that the VGRF numbers or
GRF-aligned offsets were different without taking the amount of data
written and read by the instruction into consideration. Use the
regions_overlap() helper instead.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This could potentially have misoptimized a program in cases where
inst->src[0] had a non-zero sub-GRF offset.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Unlike the FS counterpart of this commit this was likely not (yet) a
bug, but let's fix it already in preparation for implementing support
for sub-GRF offsets in the VEC4 back-end.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
There was a workaround for this in fs_inst::size_read() for the
SHADER_OPCODE_MOV_INDIRECT instruction and FIXED_GRF register file
*only*. We should take this possibility into account for the sources
and destinations of all instructions on all optimization passes that
need to quantize dataflow in 32B increments by adding the amount of
misalignment to the size read or written from the regs_read() and
regs_written() helpers respectively.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes regs_written() and regs_read() to return a more accurate
value when the padding left between components due to a stride value
greater than one causes the region bounds given by size_written or
size_read to overflow into the next register. This could become a
problem in optimization passes that keep track of dataflow using
fixed-size arrays with register granularity, because the overflow
register (not actually accessed by the region) may not have been
allocated at all which could lead to undefined memory access.
An alternative to this would be to subtract the trailing padding
already during the calculation of fs_inst::size_read and
::size_written, but that would break code that currently assumes that
::size_read and _written are whole multiples of the component size,
and would be hard to maintain looking forward because size_written is
assigned from a bunch of different places.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The LINTERP virtual instruction only reads three scalar components
from the first 16B of the second source, we can now teach size_read()
about it since its return value is represented with byte granularity.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The previous regs_read value can be recovered by rewriting each
reference of regs_read() like 'x = i.regs_read(j)' to 'x =
DIV_ROUND_UP(i.size_read(j), reg_unit)'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The previous regs_written field can be recovered by rewriting each
rvalue reference of regs_written like 'x = i.regs_written' to 'x =
DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference
like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is in preparation for dropping vec4_instruction::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is in preparation for dropping fs_inst::regs_read and
::regs_written in favor of more accurate alternatives expressed in
byte units. The main reason these wrappers are useful is that a
number of optimization passes implement dataflow analysis with
register granularity, so these helpers will come in handy once we've
switched register offsets and sizes to the byte representation. The
wrapper functions will also make sure that GRF misalignment (currently
neglected by most of the back-end) is taken into account correctly in
the calculation of regs_read and regs_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The fs_reg::subreg_offset and ::offset fields are now redundant, the
sub-GRF offset can just be added to the single ::offset field
expressed in byte units. The current subreg_offset value can be
recovered by applying the following rule: Replace each rvalue
reference of subreg_offset like 'x = r.subreg_offset' with 'x =
r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset
= x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.
For the same reason as in the previous patches, this doesn't attempt
to be particularly clever about simplifying the result in the interest
of keeping the rather lengthy patch as obvious as possible. I'll come
back later to clean up any ugliness introduced here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The dst/src_reg::offset field in byte units introduced in the previous
patch is a more straightforward alternative to an offset
representation split between ::reg_offset and ::subreg_offset fields.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple FS
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
v2: Fix division by the wrong reg_unit in the UNIFORM case of
convert_to_hw_regs(). (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The fs_reg::offset field in byte units introduced in this patch is a
more straightforward alternative to the current register offset
representation split between fs_reg::reg_offset and ::subreg_offset.
The split representation makes it too easy to forget about one of the
offsets while dealing with the other, which has led to multiple
back-end bugs in the past. To make the matter worse the unit
reg_offset was expressed in was rather inconsistent, for uniforms it
would be expressed in either 4B or 16B units depending on the
back-end, and for most other things it would be expressed in 32B
units.
This encodes reg_offset as a new offset field expressed consistently
in byte units. Each rvalue reference of reg_offset in existing code
like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and
each lvalue reference like 'r.reg_offset = x' is rewritten to
'r.offset = r.offset % reg_unit + x * reg_unit'.
Because the change affects a lot of places and is rather non-trivial
to verify due to the inconsistent value of reg_unit, I've tried to
avoid making any additional changes other than applying the rewrite
rule above in order to keep the patch as simple as possible, sometimes
at the cost of introducing obvious stupidity (e.g. algebraic
expressions that could be simplified given some knowledge of the
context) -- I'll clean those up later on in a second pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
AEP requires ASTC, which is currently only enabled on Skylake and later.
(It may be possible to extend this to Cherryview/Braswell in the future,
but earlier hardware doesn't have ASTC support.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Numeric 2 is actually GLSL_SAMPLER_DIM_3D, which I don't think is what
was intended.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I want to re-use this in a different pass, so move to nir.h
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the native pixmap fixup to a helper function so we don't
repeat ourselves.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This moves the native window fixup to a helper function so we don't
repeat ourselves.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Updated eglext.h to revision 33111 from the Khronos repository.
v2:
- Don't (re)move extension includes from eglext.h (Emil Velikov)
- Bump to revision 33111 (Adam Jackson)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Track rendering to each FBO independently and flush rendering only when
necessary. This lets us avoid the overhead of storing and loading the
frame when an application momentarily switches to rendering to some other
texture in order to continue rendering the main scene.
Improves glmark -b desktop:effect=shadow:windows=4 by 27%
Improves glmark -b
desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4
by 17%
While I haven't tested other apps, this should help X rendering a lot, and
I've heard GLBenchmark needed it too.
This is done in vc4_flush currently, but I'm going to make the job always
track the surfaces it might be rendering to instead of putting in the
destinations at flush time.
It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the
logic out caught two bugs in it: We would try to do so on cubemaps (even
though we're only mapping 1 of the 6 slices), and we would break
persistent coherent mappings by trying to reallocate when we shouldn't.
The clear of Z or stencil will end up clearing the other as well, instead
of masking. There's no way around this that I know of, so if we are
clearing just one then we need to draw a quad.
Fixes a regression in the job-shuffling code, where the clear values move
to the job and don't just have the last clear's value laying around when
you do glClear(DEPTH) and then glClear(STENCIL) separately
(ext_framebuffer_multisample-clear 4 depth)).
This causes regressions in ext_framebuffer_multisample/multisample-blit
depth and ext_framebuffer_multisample/no-color depth, but these were
formerly false positives due to the reference image also being black. Now
the reference and test images are both being drawn, and it looks like
there's an incorrect resolve of depth during blitting to an MSAA FBO.
This also exposes them for ARB_ES3_2_compatibility.
While both specs refer to the new MS line width parameters being
separate from the existing AA line widths, reality begs to differ. It's
the same on all hardware currently supported by mesa. Should hardware
come along that wants these to be different, they're easy enough to
separate out.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Earlier, the loop pretends to loop over instructions from "start" to "end",
but the callers always pass 8192 for end, which is some huge bogus
value. The real loop termination condition is send-with-EOT or 0. (Ken)
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Skylake adds new SENDS and SENDSC opcodes, which should be
handled in the send-with-EOT check. Make an is_send() helper
that checks if the opcode is SEND/SENDC/SENDS/SENDSC (Ken)
v2: Make is_send() much more crispier, Mix declaration and
code to make the code compact (Ken)
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Remove the float/dword union and use the iter->p[f->start / 32]
directly as printf formatter %08x expects uint32_t (Ken)
v2: Make the cleanup much more crispier (Ken)
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since Vulkan doesn't allow single-slice 3D storage images, we need to just
set the base_array_layer and array_len to the full size of the 3-D LOD.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 3943888c94. It turns out
that commit was pretty-much bogus since it breaks binding a 3-D texture as a
2-D storage image. The correct fix for the Vulkan CTS tests needs to be in
the Vulkan driver itself rather than ISL.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Everything that we were once using the blit2d framework for is now done
with blorp.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This is a lot cleaner and easier to read than the old piles of if
statements.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The original blorp_alloc_binding_table helper was supposed to return the
binding table offset and map along with the surface state maps. This isn't
quite what we want, however. What we really want is the binding table
offsets, surface state offsets, and surface state maps. In the GL driver,
the binding table map *is* an array of surface state offsets. However, in
Vulkan, this isn't quite true as the entries in the binding table are
surface state offsets combined with another binding table block offset.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
radeonsi depends on the interp flags a little bit too much.
This fixes 9 randomly failing tests:
GL45-CTS.shader_multisample_interpolation.render.interpolate_at_centroid.*
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Unify the clear and copy paths, clean up the definitions.
It looks more like a rework. It's a preparation for GDS support,
which might or might not come.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If si_sampler_view_add_buffer ends up flushing, then the code
in begin_new_cs would previously have added the buffer(s) for
whatever was previously bound to that slot. Now it would add only
the new buffer.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Heaven and Valley write gl_SampleMask and not Z.
Use 16_ABGR instead of 32_ABGR if Z isn't written.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fixes segfaults in EG compute since:
commit 21de3be8e6
radeonsi: fix texture format reinterpretation with DCC
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fix VAAPI YV12/I420 convert to NV12 U/V reversal.
Input order is YVU when this is called.
Signed-off-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Previously, we were relying on the fact that VALGRIND_MEMPOOL_FREE came
later on in the function to prevent "link->bo = bo" from causing an invalid
write. However, in the case where the size requested by the user is very
small (less than sizeof(struct anv_bo)), this isn't sufficient. Instead,
we should call VALGRIND_MEMPOOL_FREE early and then use VG_NOACCESS_WRITE.
We do, however, have to call VALGRIND_MEMPOOL_FREE after reading bo_in
because it may be stored in the bo itself.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
The time we want to restrict the Z range of a 3-D surface is when rendering
to it. For storage surfaces, we always want he full range. However, we
still need to set MinimumArrayElement and RenderTargetViewExtent to
sensible values so we'll just set them to the reasonable defaults we used
before we started respecting the base_array_layer and array_len.
This fixes a bunch of Vulkan CTS regressions caused by 48f195d7c6.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97790
Reviewed-by: Chad Versace <chadversary@chromium.org>
Normally, using a non-linear tiling format helps improve cache locality by
ensuring that neighboring pixels are usually close-by in memory. For RGB
formats, this still sort-of holds, but it can also lead to rather terrible
memory access patterns where a single RGB pixel value crosses a tile
boundary and gets split into two pieces in different 4K pages. It also
makes for some rather awkward calculations because your tile size is no
longer an even multiple of surface element size. For these reasons, we
chose to simply never create tiled RGB images in the Vulkan driver.
The GL driver, however, is not so kind so we need to support it somehow. I
briefly toyed with a couple of different schemes but this is the best one I
could come up with. The fundamental problem is that a tile no longer
contains an integer number of surface elements. I briefly considered a
couple other options but found them wanting:
1) Using floats for the logical tile size. This leads to potential
rounding error problems.
2) When presented with a RGB format, just make the tile 3-times as wide.
This isn't so nice because now our tiles are no longer power-of-two
size. Also, it can force the row_pitch to be larger than needed which,
while not strictly a problem for ISL, causes incompatibility problems
with the way the GL driver chooses surface pitches.
The chosen method requires that you pay attention and not just assume that
your tile_info is in the units you think it is. However, it's nice because
it provides a nice "these are the units" declaration in isl_tile_info
itself. Previously, the tile_info wasn't usable as a stand-alone structure
because you had to also know the format. It also forces figuring out how
to deal with inconsistencies between tiling and format back to the caller
which is good because the two different consumers of isl_tile_info really
want to deal with it differently: Computation of the surface size wants
the fewest number of horizontal tiles possible while get_intratile_offset
is far more concerned with things aligning nicely.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Chad Versace <chadversary@chromium.org>
The restriction that Y-tiled surfaces must have valign == 4 only aplies to
render targets but we were applying it universally. This causes problems
if ISL_FORMAT_R32G32B32_FLOAT is used because it requires valign == 2; this
should be okay because you can't render to that format.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
When Ivy Bridge introduced array multisampling, someone made the decision
to do lots of stuff throughout the driver in terms of physical array layers
rather than logical array layers. In ISL, we use logical array layers most
of the time and it really makes no sense to use physical array layers in
the blorp API. Every time someone passes physical array layers into blorp
for an array multisampled surface, they're always divisible by the number
of samples and we divide right away.
Eventually, I'd like to rework most of the GL driver internals to use
logical array layers but that's going to be a big project and will probably
happen as part of the ISL conversion. For now, we'll do the conversion in
brw_blorp and let blorp just use the logical layers.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The result of this calculation goes into an fma() in the shader and we
would like it to be as precise as possible. The division in particular
was a source of imprecision whenever dst1 - dst0 was not a power of two.
This prevents regressions in some of the new Vulkan CTS tests for blitting
using a filtering of NEAREST.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we're here, we also re-arrange the parameters to better match the
parameter order of blorp_blit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
While it can be useful, the field has substantial limtations. In
particular, the bittom 2 or 3 bits is missing so your offset always has to
be a multiple of 4 or 8. While surface alignments usually work out to make
this ok, when you start trying to fake compressed surfaces as uncompressed
(which we will want to do) this falls apart. The easiest solution is to
simply align all offsets to a tile boundary and munge the regions we're
copying to account for the intratile offset.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The convert_to_single_slice operation is *mostly* idempotent. The only
non-repeatable thing it does is that, when it sets the intratile offset
fields, it just overwrites them instead of doing a += operation. This is
supposed to be ok because we have an early return at the top that should
make it bail of the surface is already a single slice. Unfortunately, the
if condition has been broken ever since it was first added in 96fa98c18.
This commit fixes the condition and adds an assert to ensure we don't stomp
any non-zero intratile offsets.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If we use the view format, it may be an uncompressed view of a compressed
image which throws things off. Since we're computing offsets of images, we
want the actual surface offset anyway.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This should be more compact than the enum isl_channel_select[4] that we
were using before. It's also very convenient because we already had such a
structure in the Vulkan driver we just needed to pull it over.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Similar to commit 49c24d8a24 ("i965: fix noop_scissor range issue on
width/height") - take the X/Y into account to determine whether the
scissor covers the whole area or not.
Fixes the recently-added gl-1.0-scissor-depth-clear-negative-xy piglit
test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
Android porting of the following commits:
f1f1ba3 "radeonsi: move sid.h/r600d_common.h to a common place."
69fca64 "amd/addrlib: move addrlib from amdgpu winsys to common code"
This patch fixes android building errors
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 6ba88bce64. The commit
was erroneous because GL has a separate limit, GL_MAX_FRAMEBUFFER_LAYERS
which guards the number of layers you are allowed to render into.
The GL 4.5 spec says:
"The framebuffer attachment point attachment is said to be framebuffer
attachment complete if [...] all of the following conditions are true:
[...]
If image is a three-dimensional, one- or two-dimensional array, or
cube map array texture and the attachment is layered, the depth or
layer count of the texture is less than or equal to the value of the
implementation-dependent limit MAX_FRAMEBUFFER_LAYERS."
and goes on to say that "framebuffer complete" requires all attachments to
be "framebuffer attachment complete".
On Sandy Bridge, we set GL_MAX_FRAMEBUFFER_LAYERS to 512 so creating a 3D
texture bigger than 512 is fine; you just can't render into all of the
slices at once.
Fixes ES3-CTS.gtf.GL3Tests.npot_textures.npot_tex_image on Sandy Bridge
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
In particular, this means that isl_view::base_array_layer and
isl_view::array_len get applied to 3-D textures but only when rendering.
We were already applying isl_view::base_array_layer for rendering into 3-D
textures so this isn't a huge deviation.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Copy the whole devinfo structure instead of just few fields (Ken)
Earlier, copied only couple of fields which added more code. So,
simplify code by copying the whole structure.
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes a crash when using the prefered video format with vaapisink
on Nvidia hardwares.
Also caught by the following assert:
nouveau_vp3_video.c:91: Assertion `templat->interlaced' failed.
TEST= gst-launch-1.0 videotestsrc ! video/x-raw, format=NV12 ! vaapisink
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Víctor Manuel Jáquez Leal <vjaquez@igalia.com>
Tested-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Drivers which support ARB_tessellation_shader and ES 3.1 now will
expose OES_tessellation_shader and EXT_tessellation_shader as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This provides a nice little place to share notes on what still needs to be
done and/or would be nice to have in BLORP.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Forgotten on commit "i965: Fix calculation of the image height at start level".
Thanks to Ilia Mirkin for point it.
Fixes the following regressions on Haswell and Broadwell:
ES2-CTS.gtf.GL2ExtensionTests.egl_image_external.TestSimpleUnassociated (crash back to pass)
ES2-CTS.gtf.GL2ExtensionTests.egl_image_external.TestSimple (crash back to fail)
ES2-CTS.gtf.GL2ExtensionTests.egl_image_external.TestVertexShader (crash back to fail)
https://bugs.freedesktop.org/show_bug.cgi?id=97761
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than using platform specific methods to retrieve the program
name pass it explicitly. The function is called directly from main().
Similarly - basename comes in two versions POSIX (can modify string,
always pass a copy) and GNU (never modifies the string).
Just printout the complete program name, esp. since the program is not
meant to be installed. Thus using $basename is unlikely to work, not to
mention it is misleading.
Reported-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
Keep the old name in the extension string, but refer to the KHR
extension internally.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
MESA_configless_context does not specify the interaction with
QueryContext at all, and the code to generate an error in this case
predates the Mesa extension. Since EGL_NO_CONFIG_{KHR,MESA} are
numerically identical there's no way to distinguish which one the
application asked for, so use the KHR behaviour.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This patch enables variable bit-rate for vaapi encoding. According to va.h,
target bit-rate equals to maximum bit-rate multiplies by target_percentage.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The radeonsi driver doesn't and shouldn't care about the buffer index.
Only the virtual addresses matter.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The fence that is added to the BO during flush is guaranteed to be
signaled after all the fences that were in the fences array of the BO
before the flush, because those fences are added as dependencies for the
submission (and all this happens atomically under the bo_fence_lock).
Therefore, keeping only the last fence around is sufficient.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The idea is to have matching init/deinit functions so that deinit can be
re-used for cleanup in the error path of amdgpu_winsys_create.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Original commit added documentation explaining lossless compression
case:
commit 56f29911ec
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date: Tue Feb 2 10:00:41 2016 +0200
i965: Add a flag telling color resolve pass to ignore CCS_E
It, however, easily gives the impression that the sole purpose
of the intel_miptree_resolve_color() is to address lossless
compression. Original intention is to document the lack of
INTEL_MIPTREE_IGNORE_CCS_E flag given for the resolve call.
This patch fixes this along with a typo found spotted further
down.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will indicate target layer (Render Target Array Index) needed
for layered clears.
v2: Use 3DSTATE_VF_SGVS for gen8+
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
such as we do for compressed msaa. In case of non-compressed simgle
sampled buffers the allocation of mcs is deferred until there is
actually a clear operation that needs the mcs.
In case of render buffer compression the mcs buffer always needed
and there is no real reason to defer the allocation. By doing it
directly allows to drop quite a bit unnecessary complexity.
Patch leaves brw_predraw_set_aux_buffers() a no-op. Subsequent
patches will re-use it and it seemed cleaner to leave it instead
of removing and re-introducing.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise once mcs buffer gets allocated without delay for lossless
compression (same as we do for msaa), assert starts to fire in
piglit case: tex3d. The test uses depth of one which is in fact
supported even now.
v2 (Jason): Allow also 1D case as there is nothing in the specs
constraining it either.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Once mcs buffer gets allocated without delay for lossless
compression (same as we do for msaa), one gets regression in:
GL45-CTS.texture_barrier_ARB.same-texel-rw
Setting the auxiliary surface for both sampling engine and data
port seems to fix this. I haven't found any hardware documentation
backing this though.
v2 (Jason): Prepare also for the case where surface is sampled with
non-compressible format forcing also rendering without
compression.
v3: Split asserts and decision making.
v4: Detailed comment provided by Jason explaining the need for using
auxiliary buffer for texturing when the same surface is also
used as render target.
Added check for existence of renderbuffer before considering if
underlying miptree matches.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v3:
- Actually set the flags when needed instead of falsely
overwriting them (Jason).
- Use more generic name for flag (dropped RENDERBUFFER)
- Consult also shader images
v4:
- Consult only lossless compressd shader images
v5:
- Check the existence of renderbuffer before considering
if it matches the given miptree
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
And add plumbing to provide it all the way to surface state emitter.
This is not used yet but will be in subsequent patches to carry
additional constraints.
v2 (Jason): Use uint32_t instead of int as the type
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Blorp consults brw_is_color_fast_clear_compatible() to see if any
restrictions apply for fast clear in addition to the capablities
advertised in isl_format.c::format_info[]. On Gen8+ integer formats
are backlisted for plain old fast clear but there is no reason why
lossless compression shouldn't be supported. In fact, lossless
compression of integer formats is already supported for normal
render paths.
This patch prepares for dropping the delayed allocating of the mcs
buffer for lossless compression. Until now the skip of fast clear
also prevented the mcs being allocated and hence the lossless
compression being effectively turned off for integer formats.
Once the mcs buffer is allocated beforehand, the assertion addressed
here would start triggering.
v2: Drop the assert instead of relaxing it (Jason)
Fix typo while at it.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
After commit "i965: Fix calculation of the image height at start level", it is
not needed. This commit removes the "warning: unused variable ‘i’" warning.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This clears the last bits of the usecases of the hash table
located in mesa/program, allowing us to remove it.
V2: Rebase on top of changes to Makefile.sources
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We are getting the util hash table through the include in
program/hash_table.h for the moment until we migrate the
string_to_uint_map to a separate file.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The "locals" hash table is used as a set, so use a set to
avoid confusion and also spare some minor memory.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
V2: Fix incorrect ordering on hash table insert
V3: null check value returned by _mesa_hash_table_search()
(Timothy Arceri)
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
V2: Rebase to the adaption of new hashing functions
V3: move previous_label declaration to where it is used
(Timothy Arceri)
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
And remove the now unused hash_table_replace.
V2: Actually do the equivalent thing, and don't leak memory
V3: fix minor typo in comment (Timothy Arceri)
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
It is included through the util/hash_table include in
the program hash_table, so this should be safe.
This will be needed when we start converting each use of
the program_hash_table, as some places need this function.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Here we make the prog_hash_table functionally equivalent to
the one in util by wrapping the remaing functions that differ.
We also move the functions to the header so we can remove the c
file.
This enables us to do a step-by-step replacement of the table.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Looks like the GM107 IPA op does not allow a separate offset when
using an indirect register. Instead we must use AL2P like we do for
indirect vertex operations on Kepler+.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We have to force it to write to predicate 7 (aka PT) in order for it not
to mess up another predicate. Unclear what would be returned in the
predicate, perhaps an error code for out-of-bounds requests. Blob
doesn't seem to check it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
- Fixes CTS tests:
* GL44-CTS.shader_image_size.advanced-nonMS-cs-float
* GL44-CTS.shader_image_size.advanced-nonMS-cs-int
* GL44-CTS.shader_image_size.advanced-nonMS-cs-uint
* GL44-CTS.shader_image_size.advanced-nonMS-gs-float
* GL44-CTS.shader_image_size.advanced-nonMS-gs-int
* GL44-CTS.shader_image_size.advanced-nonMS-gs-uint
* GL44-CTS.shader_image_size.advanced-nonMS-tes-float
* GL44-CTS.shader_image_size.advanced-nonMS-tes-int
* GL44-CTS.shader_image_size.advanced-nonMS-tes-uint
* GL44-CTS.shader_image_size.advanced-nonMS-vs-float
* GL44-CTS.shader_image_size.advanced-nonMS-vs-int
* GL44-CTS.shader_image_size.advanced-nonMS-vs-uint
v1: (written by Dave Airlie) Always shift height images for levels.
Fixed the CTS test.
v2: Only shift height if the texture is not an 1D_ARRAY,
it fixes assertion in GL44-CTS.texture_view.gettexparameter
due to the original patch (Antia).
v3: Remove the loop. Do not shift height either for 1D textures.
Use an explicit switch and add an assertion (levels == 0) for
multisampled textures (Jason).
v4: Rectangle textures can not have levels either (Ilia Mirkin).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Set config attributes EGL_MAX_PBUFFER_WIDTH and EGL_MAX_PBUFFER_HEIGHT to
hard-coded non-zero values. These two attributes are required on Android.
v2: use _EGL_MAX_PBUFFER_WIDTH/HEIGHT from egldefines.h
(based on discussion on the first version)
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Static library dependency is required to pull the generated
XML headers into the generated C file.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This change makes copy propagation pass faster. Complete link time
spent in test case attached to bug 94477 goes down to ~400 secs from
over 500 secs on my HSW machine. Does not fix the actual issue but
brings down the total. No regressions seen in CI.
v2: do not leak hash_table structure
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Blorp doesn't handle spilling so we set allow_spilling to false in that
case. The blorp 16x MSAA resolve shader spills in 16-wide but not 8-wide.
This commit makes it so that we fail the 16-wide compile and successfully
fall back to 8-wide instead of just assert-failing when trying to compile
the 16-wide shader.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Unlike the current CSE pass, global value numbering is capable of detecting
common values even if one does not dominate the other. For instance, in
you have
if (...) {
ssa_1 = ssa_0 + 7;
/* use ssa_1 */
} else {
ssa_2 = ssa_0 + 7;
/* use ssa_2 */
}
Global value numbering doesn't care about dominance relationships so it
figures out that ssa_1 and ssa_2 are the same and converts this to
if (...) {
ssa_1 = ssa_0 + 7;
/* use ssa_1 */
} else {
/* use ssa_1 */
}
Obviously, we just broke SSA form which is bad. Global code motion,
however, will repair this for us by turning this into
ssa_1 = ssa_0 + 7;
if (...) {
/* use ssa_1 */
} else {
/* use ssa_1 */
}
This intended to eventually mostly replace CSE. However, conventional CSE
may still be useful because it's less of a scorched-earth approach and
doesn't require GCM. This makes it a bit more appropriate for use as a
clean-up in a late optimization run.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On the RSxxx chip series, HW TCL is missing and r300_emit_vs_state()
is never called.
However, if R300_VAP_CNTL is never set, the hardware (at least the
RS690 I tested this on) comes up with rendering artifacts, and
parts that are uploaded before this "fix" remain broken in VRAM.
This causes artifacts as in fdo#69076 ("triangle flickering").
It seems like this setup needs to happen at least once after power on
for 3D rendering to work properly. In the DDX with EXA, this happens in
RADEON_SWITCH_TO_3D() when processing an XRENDER Composite or an
Xv request. So playing back a video or starting a GTK+2 application
fixes 3D rendering for the rest of the session. However, this auto-fix
doesn't happen when EXA is not used, such as with GLAMOR or Wayland.
This patch ensures the register is configured even in absence of
the DDX's EXA module.
The register setting is taken from:
xf86-video-ati -- RADEONInit3DEngineInternal()
mesa/src/mesa/drivers/dri/r300 -- r300EmitClearState()
Tested on RS690.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Max Staudt <mstaudt@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The problem is that TC-compatible DCC clear codes translate
into different clear values when you change the format.
I have a new piglit reproducing the issue.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
In the original commit message in 56a1c10 it was wrongly used too:
- env GALLIUM_HUD_SIGNAL_TOGGLE: toggle visibility via signal
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously, we dind't apply variable decorations to the members of a split
structure variable. This doesn't quite work, unfortunately, because things
such as the "flat" qualifier may get applied to an entire structure instead
of propagated to the members. This fixes 9 of the new CTS tests in the
dEQP-VK.glsl.linkage.varying.struct.* group.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
program_invocation_short_name is a gnu extension. Limit use of it
to glibc and cygwin and otherwise use getprogname() which is available
on BSD and OS X.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Include libgen.h for basename as required by posix.
The definition is not found on at least OpenBSD otherwise.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
error() is a gnu extension and is not present on OpenBSD
and likely other systems.
Convert use of error to fprintf/strerror/exit.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The Makefile unconditionally linked libX11-xcb into libvulkan_intel.so.
But it's needed only if HAVE_PLATFORM_X11.
Fixes build of libvulkan_intel.so on Chromium OS, which has no X11
libraries.
Fixes: 71258e9462 ("anv/x11: Add support for Xlib platform")
Cc: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Only keep track of a single current context, instead of separate
contexts for GL and GLES.
In EGL 1.4 (and 1.5), EGL_OPENGL_API and EGL_OPENGL_ES_API are supposed
to be interchangeable for all purposes except for eglCreateContext.
The _EGLThreadInfo::CurrentContexts array is now a single pointer to the
current context, which may be a GL or GLES context. In addition, it now
keeps track of the current API as an enum instead of an index.
eglMakeCurrent will now replace the current context, regardless of which
client API is used for for the current and new contexts. It no longer
checks for a conflicting context. In addition, calling eglMakeCurrent
with EGL_NO_CONTEXT will now release the current context regardless of
the current API.
v2: Rebased against master (Adam Jackson)
Reviewed-by: Adam Jackson <ajax@redhat.com>
v2: make fence extension optional to not break non-i965 classic
drivers, and move __DRI2_FENCE into core extensions, based
on comments from Emil
Signed-off-by: Rob Clark <robdclark@gmail.com>
First off, as late as ES 3.2, GetInternalformat only supports
RENDERBUFFER and 2DMS(_ARRAY) targets.
Secondly, the _mesa_has_ext helpers are very accurate... a little too
accurate, some might say. If we only show an extension in compat
profiles because core profiles have the functionality guaranteed, they
will return false. Fix these to either check for a core profile
explicitly, or to a different-but-identical extension available in core
profile.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matteo Bruni <matteo.mystral@gmail.com>
Tested-by: Matteo Bruni <matteo.mystral@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Add a separate extension check for that format. Prevents glTexImage from
trying to find a matching format, which fails on drivers without support
for this format.
Fixes: sized-texture-format-channels (on a3xx)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
We had two almost identical copies of this code and they were both broken
but in different ways. The previous two commits fixed both of them. This
one just unifies them so that it's easier to handle in the future.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
SPIR-V has the two arguments in the opposite order from GLSL. NIR uses the
GLSL order so we had them backwards.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opatomic.compex
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Increases the performance of legacy geometry-heavy apps
still using display lists.
Performance increase for a targeted testcase is on the
order of 8x, and applications like ParaView 4.x (5.x uses
no longer used display lists) improve by about 10%-20%.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fix build with Python < 2.7.
File "./glsl/ir_expression_operation.py", line 360, in get_enum_name
return "ir_{}op_{}".format(("un", "bin", "tri", "quad")[self.num_operands-1], self.name)
ValueError: zero length field name in format
Fixes: e31c72a331 ("glsl: Convert tuple into a class")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Recent changes to generate ir_expression*.h header files broke Android
builds. This adds the generation rules. This change is complicated due to
creating a circular dependency between libmesa_glsl, libmesa_nir, and
libmesa_compiler. Normally, we add static libraries so that include paths
are added even if there's no linking dependency. That is the case here.
Instead, we explicitly add the include path using $(MESA_GEN_GLSL_H) to
libmesa_compiler. This in turn requires shuffling the order of make
includes. It also uncovered missing dependency tracking of glsl_parser.h.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
When creating interlaced video buffer, hegith set to "template.height =
align(tmpl->height/ array_size, VL_MACROBLOCK_HEIGHT);", and we use
"template.height *= array_size;" for the buffer height, so it actually
aligned with 32. With progressive video buffer it still aligned with 16,
thus causing different height between interlaced buffer and progressive
buffer for 4K (height=2160), and 720p (height=720).
When transcode the video, this will cause the 16 lines corruption
at the bottom of the encode video.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
As of commit d82f8d9772, we actually
parse and attempt to handle the 'patch' qualifier on interface blocks.
This patch fixes explicit locations for variables in such blocks.
Without it, many program interface query dEQP/CTS tests hit this
assertion in ir_set_program_inouts.cpp
if (is_patch_generic) {
assert(idx >= VARYING_SLOT_PATCH0 && idx < VARYING_SLOT_TESS_MAX);
bitfield = BITFIELD64_BIT(idx - VARYING_SLOT_PATCH0);
}
because the location was incorrectly based on VARYING_SLOT_VAR0.
Note that most of the tests affected currently fail before they hit
this, due to confusion about what the program interface query name
of those resources should be.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This just ports the simpler endian detection bits, addrlib
sharing wants this outside gallium.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Step one to merging radv would be to move some files around.
This only adds the include path to r600/radeonsi, because later
we want to avoid having to add it to the generic target paths.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Calculate depth ranges from viewport states and
pipe_rasterizer_state::clip_halfz.
The evergreend.h change is required to silence a warning.
This fixes this recently updated piglit: arb_depth_clamp/depth-clamp-range
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
DCC is limited in how texture formats can be reinterpreted using texture
views. If we get a view format that is incompatible with the initial
texture format with respect to DCC, disable DCC.
There is a new piglit which tests all format combinations.
What works and what doesn't was deduced by looking at the piglit failures.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Sometimes it was f32, other times it was i32. Now it's always i32.
This fixes:
GL45-CTS.texture_cube_map_array.image_texture_size.texture_size_compute_sh
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Invalidated buffers don't have to go through it.
Split r600_init_resource into r600_init_resource_fields and
r600_alloc_resource.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Currently, due to the inverse order, strcmp will produce negative result
when the needle is towards the start of the haystack. Thus on the next
iteration(s) we'll end up further towards the end and eventually fail to
locate the entry.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
We recently starting to always emit the NDV (== dall) bit for quadops.
However it was folded into the wrong code word.
Fixes: e0a067ed48 (nv50/ir: always emit the NDV bit for OP_QUADOP)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Unfortunately a3xx does not have a separate disable for depth clipping,
so when depth clamp is enabled, we disable the whole 3d clipper logic.
This in turn also gets rid of the xy clip that it would normally do.
When we detect this would happen, instead we integrate the viewport into
the window scissor. This may have slightly different behavior around
wide points, but it's unlikely that anything depends on this.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97231
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The hw clipper only handles up to 6 UCPs. If there are more than 6 UCPs,
or a clip vertex, or clip distances are in use, then we must use the
fallback discard-based clipping from the frag shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
This is the only remaining part of genX_l3.c and there's really no good
reason for it to be in its own file.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Now that we're using gen_l3_config.c, we no longer have one set of l3
config functions per gen and we can simplify a bit. Also, we know that
only compute uses SLM so we don't need to look for it in all of the stages.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When Jordan first implement L3$ configuration for Vulkan, he copied+pasted
from the GL driver because we had no good place to share it. Now that we
have src/intel/common, we should be sharing these tables.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In 144cbf8 ("nir: Make nir_opt_remove_phis see through moves."), Ken
made nir_opt_remove_phis able to coalesce phi nodes whose sources are
all moves with the same swizzle. However, he didn't add the logic
necessary for handling the fact that the phi may now have multiple
different sources, even though the sources point to the same thing. For
example, if we had something like:
if (...)
a1 = b.yx;
else
a2 = b.yx;
a = phi(a1, a2)
... = a
then we would rewrite it to
if (...)
a1 = b.yx;
else
a2 = b.yx;
... = a1
by picking a random phi source, which in this case is invalid because
the source doesn't dominate the phi. Instead, we need to change it to:
if (...)
a1 = b.yx;
else
a2 = b.yx;
a3 = b.yx;
... = a3;
Fixes 12 CTS tests:
ES31-CTS.functional.tessellation.invariance.outer_edge_symmetry.quads*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Curiously OES/EXT_tessellation_shader leave these out, while ES 3.2 adds
them in.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I left this out of my previous commit that went around enabling all of
the other ES 3.2 entrypoints.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a newly added flag. We always pass false into it from
nv50_clear_texture, but other callers may want to respect the render
condition. (And the functions were originally spec'd to respect it.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
When NIR was first introduced, Connor added this fake-edge hack to work
around issues related to unreachable blocks. Thanks to GLSL IR's jump
lowering code, the only unreachable code you can have is a block after an
infinite loop. With SPIR-V, we didn't have the jump lowering code so we
could also end up with the "if (...) { break; } else { continue; }" case
which generates an unreachable block after the if. Because of this, most
of NIR had to be fixed up for handling unreachable blocks. The only
remaining case of not handling unreachable blocks was specifically the
block-after-infinite-loop case in dead_cf which was fixed by the previous
commit. We can now delete the fake edge hack.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
When an application uses a ton of shaders, we need to evict them
when the code segment is full but this is not really a good solution
if monster shaders are used because code eviction will happen a lot.
To avoid this, it seems better to dynamically resize the code
segment area after each eviction. The maximum size is arbitrary
fixed to 8MB which should be enough.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To avoid the bins list to grow up indefinitely when the code segment
size will be bumped, we need to separate that bin from the SCREEN
one because it contains other resources like the uniform bo.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This function will be helpful for resizing the code segment
area when we need to evict all shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes a very old issue which happens when the code segment
size is full. A bunch of real applications like Tomb Raider,
F1 2015, Elemental, hit that issue because they use a ton of shaders.
In this case, all shaders are evicted (for freeing space) but all
currently bound shaders also need to be re-uploaded and SP_START_ID
have to be updated accordingly.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This refactoring will help for fixing the "out of code space"
eviction issue because we will need to reupload the code for
all currently bound shaders but it's slightly different than
uploading a new fresh code.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If scissor X or Y was set to a negative value then the previous
code might have indicated noop scissors when the scissor range
actually was masking a portion of the framebuffer.
Since fb->_Xmin, _Xmax, _Ymin and _Ymax take scissors into
account, we can use these to test for a noop scissor.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Varying packing would like to mark certain variables as flat.
This works as long as both sides of the interfaces are changed
accordingly. However, with SSO, we disable varying packing on
the outermost stages. We also disable varying packing for
certain tessellation stages.
With SSO, we operate on the producer and consumer separately.
Checks based on the consumer stage and variable are risky, and
can easily lead to altering one half of the interface between
stages, breaking SSO pipeline IO validation.
Just stop monkeying around with interpolation modes unless
required for varying packing. There's no point. This also
disables it in unsafe SSO cases.
Fixes CTS tests:
*.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_MaxPatchVertices_Position_PointSize
Also fixes Piglit's spec/oes_geometry_shader/sso_validation:
- user-defined-gs-input-not-in-block.shader_test
- user-defined-gs-input-in-block.shader_test
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We handled the unsized case, implicitly sizing arrays to the value
of gl_MaxPatchVertices. But if a size was present, we failed to
raise a compile error if it wasn't the value of gl_MaxPatchVertices.
Fixes CTS tests:
*.tessellation_shader.compilation_and_linking_errors.
{tc,te}_invalid_array_size_used_for_input_blocks
Piglit's tcs-input-read-nonconst-* tests have recently been fixed.
This patch will break older copies of those tests, but the latest
should continue working. Update to Piglit 75819c13af2ed5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When trying to get a device name for an fd using sysfs, it would always fail
as it was expecting key/value pairs to be delimited by '\0', which is not the
case.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The device name is only needed for WL_bind_wayland_display so make this clear
by only storing the device name when Wayland support is built.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
A few inline asserts in anv assume alignments are power of 2, but with
formats like R8G8B8 we have odd alignments.
v2: round up to power of 2 (Ilia)
v3: reuse util_next_power_of_two() from gallium/aux/util/u_math.h (Ilia)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The code was triggering asserts in DEBUG builds of the SVGA driver since
the reference count of the resource was never decremented before destroy.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Without this, we would pass over the instructions in the SIMD8 program
(which is located earlier in the buffer) when brw_set_uip_jip() is
called to handle the SIMD16 program.
The assertion about compacted control flow was bogus: halt, cont, break
cannot be compacted because they have both JIP and UIP. Instead, we
should never see a compacted instruction in this code at all.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The original motivation was that gen6_clip_state ignored _NEW_POLYGON
as it didn't care about early culling. The only other change was that
Gen6 ignored BRW_NEW_TES_PROG_DATA as it doesn't have tessellation
shaders, but listening to this is harmless as it'll never be signalled.
Now that we've added _NEW_POLYGON for is_drawing_lines/points, we can
merge the two as the distinction is meaningless.
This actually fixes a bug, though: Gen8+ was using the gen6_clip_state
atom because it doesn't care about early culling, but it also needs
BRW_NEW_TES_PROG_DATA, which was missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
State upload code should use prog_data rather than poking at core
Mesa shader data structures wherever possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
calculate_attr_overrides() uses is_drawing_points(), which depends
on tessellation and geometry program state, as well as polygon state.
v2: Add missing _NEW_POLYGON as well. Caught by Iago Toral.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This has never been used because info->immd.bufSize is always 0
and anyways this is an experimental code which has never been
completed.
This gets rid of some unused code in the program validation process.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The code a few lines below expects to migrate the bo in question to
VRAM. Since we're filling the initial data via CPU, it's more efficient
to create the temporary buffer in GART. There is no "push" method
implemented, otherwise we'd use that instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
To support WL_bind_wayland_display an authentication function needs to be
provided but this was not being done for this platform as it's not strictly
necessary. However, as this isn't an optional function there's the potential
for a segfault to occur if authentication is mistakenly performed. Protect
against this by providing a function that prints an error.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Up until now, DRI3 was only used for devices that have render nodes, unless
overridden via an environment variable, with it falling back to DRI2 otherwise.
This limitation was there in order to support WL_bind_wayland_display as it
requires client opened device node fds to be authenticated, which isn't possible
when using DRI3. This is an unfortunate compromise as DRI3 provides security
benefits over DRI2.
Instead, allow DRI3 to be used for devices without render nodes but don't
advertise WL_bind_wayland_display in this case. Applications that need this
extension can still be run by disabling DRI3 support via the LIBGL_DRI3_DISABLE
environment variable.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Until hardware appears (in a gallium driver) that can make use of the
TCS-outputted gl_BoundingBox, we just request that the variable gets
assigned as a regular patch variable.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Requested by Anuj during review of
4a87e4ade7, adding as follow-up since it
led to assertion failures due to various GLSL bugs that should be
fixed now.
In the fragment shader OutputsWritten is a bitset of FRAG_RESULT_*
enumerants, which represent the location of each color output written
by the shader. The secondary and primary color outputs of a given
render target using dual-source blending have the same location, so
the 'idx' computation below will give the wrong bit as result if the
'var->data.index' term is non-zero -- E.g. if the shader writes the
primary and secondary colors of the FRAG_RESULT_COLOR output,
ir_set_program_inouts will think that the shader writes both
FRAG_RESULT_COLOR and FRAG_RESULT_SAMPLE_MASK, which is just bogus.
That would cause the brw_wm_prog_key::nr_color_regions computation
done in the i965 driver during fragment shader precompilation to be
wrong, which currently leads to unnecessary recompilation of shaders
that use dual-source blending, and triggers an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
gl_SecondaryFragColorEXT should have the same location as gl_FragColor
for the secondary fragment color to be replicated to all fragment
outputs. The incorrect location of gl_SecondaryFragColorEXT would
cause the linker to mark both FRAG_RESULT_COLOR and FRAG_RESULT_DATA0
as being written to, which isn't allowed by the spec and would
ultimately lead to an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.
This should also fix the code below for multiple dual-source-blended
render targets, which no driver currently supports but we have plans
to enable eventually in the i965 driver (the comment saying that no
hardware will ever support it seems rather hilarious).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Currently the mesa state tracker relies on there being two bits set
per dual-source output in the gl_program::OutputsWritten bitset, but
that only worked due to a GLSL front-end bug that caused it to set the
OutputsWritten bit for both location and location+1 even though at the
GLSL level the primary and secondary color outputs used for
dual-source blending have the same location. Fix it by extending
outputMapping[] to 2*FRAG_RESULT_MAX elements in order to represent a
mapping from a (location, index) pair to its TGSI output, which should
also make it slightly easier to add support for dual-source blending
in combination with multiple render targets in the long run.
No Piglit regressions on llvmpipe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
text data bss dec hex filename
7669233 277176 28624 7975033 79b079 i965_dri.so before generated code
7647081 277176 28624 7952881 7959f1 i965_dri.so before this commit
7669289 277176 28624 7975089 79b0b1 i965_dri.so with this commit
Looking at the generated assembly, it appears that some of changes made
in the generated code prevent some loops from being unrolled. Removing
the default cases (via unreachable()) allows these loops to unroll again.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
constant_template_common can now handle the case where the result type
is different from the input type by using type_signature_iter. This
changes the "shape" of all the cast-style operators, but they should
function the same.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
constant_template_common can now handle the case where the result type
is different from the input type by using type_signature_iter.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This template is mostly an artefact of the development of the original
patch series and to minimize the differences between the original code
and the generated code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The difference between these two templates were mostly an artefact of
the development of the original patch series and to minimize the
differences between the original code and the generated code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Immediately previous to this patch,
diff -wud src/glsl/ir_constant_expression.cpp \
src/glsl/ir_expression_operation_constant.h
should be "minimal."
v3: With much help from José Fonseca, fix the SCons build.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_bitfield_extract is a little weird because the second and third
operand and aways int, so they may differ in type from the first
operand.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
The code generated is quite different from what was previously used. I
believe that it is still correct by the GLSL spec, and I believe, due to
C rules about shifts, the behavior will be the same.
Section 5.9 (Expressions) of the GLSL 4.50 spec says:
The result is undefined if the right operand is negative, or greater
than or equal to the number of bits in the left expression's base
type.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
ldexp is weird because its two operands have different types. Add
support for directly specifying the exact signatures of all the possible
variations of an operation.
v2: Use tuple() instead of () for clarity. Suggested by Dylan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
These are operations like the pack functions that have separate
functions that assign multiple outputs from a single input.
v2: Correct the source and destination types. They were previously
transposed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Only operations where the implementation is identical code regardless of
type. The only such operations are ir_binop_all_equal and
ir_binop_any_nequal.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
v2: Remove extra int() cast in find_lsb. Suggested by Matt. 'for (a,
b) in d' => 'for a, b in d'. Suggested by Dylan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Unary operations where all of the supported types use the same C
expression to evaluate them.
v2: 'for (a, b) in d' => 'for a, b in d'. Suggested by Dylan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
(X & -X) calculates a value with only the least significant bit of X
set. Since there is only one bit set, the LSB is the MSB.
v2: Remove extra int() cast. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The operator_string functions gave us some protection against a
malformed table. Now that the table is generated from the same data
that generates the enum, this is not a concern. Just cut out the middle
man.
text data bss dec hex filename
7531892 273992 28584 7834468 778b64 i965_dri-64bit-before.so
7531828 273992 28584 7834404 778b24 i965_dri-64bit-after.so
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
'diff -ud' is clean.
v2: Massive rebase.
v3: With much help from José Fonseca, fix the SCons build.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
No change except to the copyright symbol. The next patch will generate
this file with Python, and Unicode + Python = pure rage.
v2: Massive rebase.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This ensures that they remain correct if the list is rearranged or new
opcodes are added. I checked a diff of before and after to ensure that
each ir_last_ had the same value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
There are differences in where end-of-line comments are placed, but
'diff -wud' is clean.
v2: Massive rebase.
v3: With much help from José Fonseca, fix SCons build.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
The original pipeline cache the Kristian wrote was based on a now-false
premise that the shaders can be stored in the pipeline cache. The Vulkan
1.0 spec explicitly states that the pipeline cache object is transiant and
you are allowed to delete it after using it to create a pipeline with no
ill effects. As nice as Kristian's design was, it doesn't jive with the
expectation provided by the Vulkan spec.
The new pipeline cache uses reference-counted anv_shader_bin objects that
are backed by a large state pool. The cache itself is just a hash table
mapping keys hashes to anv_shader_bin objects. This has the added
advantage of removing one more hand-rolled hash table from mesa.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97476
Acked-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This new anv_shader_bin struct stores the compiled kernel (as an anv_state)
as well as all of the metadata that is generated at shader compile time.
The struct is very similar to the old cache_entry struct except that it
is reference counted and stores the actual pipeline_bind_map. Similarly to
cache_entry, much of the actual data is floating-size and stored after the
main struct. Unlike cache_entry, which was storred in GPU-accessable
memory, the storage for anv_shader_bin kernels comes from a state pool.
The struct itself is reference-counted so that it can be used by multiple
pipelines at a time without fear of allocation issues.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Acked-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
All of these worked before because they were depending on prog_data to be
null. Soon, we won't be able to depend on a nice prog_data pointer and
it's nice to be more explicit anyway.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
The range from ANV_MIN_STATE_SIZE_LOG2 to ANV_MAX_STATE_SIZE_LOG2 should
be inclusive and we have asserts that ensure that you never try to allocate
a state larger than (1 << ANV_MAX_STATE_SIZE_LOG2). However, without
adding 1 to the difference, we allocate 1 too few bucckts and so, even
though we have an assert, anything landing in the last bucket will fail to
allocate properly..
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This script was broken for the last few days and I couldn't figure out why.
Turns out it was checking for the existence of a file that got renamed,
so rename it in here too.
Fixes: f926cf5bd0 ("docs: Rename GL3.txt to features.txt")
CC: Ian Romanick <ian.d.romanick@intel.com>
CC: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Caught by Coverity. Likely fixes real issues if an output component
is not present.
CID: 1372278
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
While we are at it, make it static and change the return values
policy to be consistent.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This silences a divergent error found with F1 2015.
Basically, the NDV bit has to be set when a FSWZ instruction is
inside divergent code, but it's not needed otherwise. The correct
fix should be to set it only in divergent code situations.
GM107 emitter already sets that bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Topi asked to have the prefix removed because there's nothing gen7 about
it. However, now that everything is in a single file, there is no good
reason to have it split out into a helper function anyway. Let's just put
the contents in emit_urb_config and call it a day.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
No longer needed as of last commit, since we no longer add OPENGL to the
ClientAPIs thus, RenderType and Conformant don't have the desktop GL
bit set.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
In the rather unlikely case that the API is considered invalid, don't
add it to the (supported) ClientAPIs bitmask.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
---
Strictly speaking we only need this in the Android case for OpenGL.
Adding it everywhere doesn't hurt us since the compiler will const
propagate and optimise/remove these.
At the moment one can use OpenGL in eglBindAPI() only to clear the
EGL_OPENGL_BIT from RenderableType and Conformant for _each_ config.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
This avoids generating fbconfigs whose winsys framebuffers will be
incomplete (see nouveau_check_framebuffer_complete).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Experimentally, this is required for glxgears and others to display the
proper colors. This is also what the code used to do before the
referenced commit.
Fixes: c703658b39 (mesa: Drop _EnabledUnits.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
NV34 and possibly other NV3x hardware has the capability of exposing the
NV25 graph class. This allows forcing nouveau_vieux to be used instead
of the gallium driver, primarily for testing purposes. (Among other
things, NV2x only ever came as AGP or inside an Xbox, never PCI/PCIe).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit 7413625ad3 flipped a few functions too many to use
pipe_shader_type. These functions actually take an integer that does not
correspond 1:1 with the enum.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Backing views/surfaces are used to handle the case when a resource is
bound both as a render target and as a sampler source (such as when
doing auto mipmap generation).
This patch fixes a bug where mapping a resource (to do a glReadPixels)
was reading the stale data in the original surface rather than the
backing surface which was rendered to.
We need to propagate the backing resource (which we rendered to) back
to the original resource before we read from it. The problem was the
svga_propagate_rendertargets() function was examining the wrong surface
views.
This fixes the "poc9" test described in VMware bug 1686661.
Also tested with Piglit, Cinebench, Lightsmark, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We need to set the need_convert flag with each loop iteration, not
just when the rgba pointer is null.
Bug reported by Markus Müller <mueller@imfusion.de> on mesa-users list.
Fixes new piglit arb_texture_float-get-tex3d test.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This should be all that is required for cull distances to work
on radeonsi.
v1.1: whitespace cleanup, add docs fix clipdist_mask usage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This keeps invalid surface states from leaking through and potentially
hanging the GPU. We shouldn't actually be hitting this on a regular basis,
but a helpful assert is better than a hang.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This allows us to use the actual render format as opposed to the texture
format. I don't know that the hardware actually cares in the case of fast
clears, but it certainly seems more correct.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At this point, blorp is completely driver agnostic and can be safely moved
into its own folder. Soon, we hope to start using it for doing blits in
the Vulkan driver.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This has been the only caller since we deleted the meta fast clear code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit switches all of blorp from taking a brw_context to taking a
blorp_context and, where useful, a void *batch. In the GL driver, we only
have one active batch at a time so the brw_context *is* the batch but in
Vulkan, batch will point to the anv_cmd_buffer in which we are building
instructions.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This gets rid of brw_context throughout the core of the state setup code.
Instead, it is replaced with blorp_batch which contains a pointer to the
blorp_context and a void* that the driver can use for its own blorp data.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously, we passed the buffer address (as per the latest offset from the
kernel) to ISL to use when it filled out the surface state. We then called
drm_intel_bo_emit_reloc() to add the relocation to the list. The newly
added blorp_surface_reloc helper adds the relocation to the list and then
writes the buffer address directly into the surface state.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This pulls most of the brw-specific bits into helpers with generic names.
Later, those will become the driver hooks for generic code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This sanitizes blorp's access to the i965 driver's shader cache by patching
it through the blorp_context. When we start using blorp in Vulkan, we will
simply have to implement such a caching interface in the Vulkan driver.
Note: In my first attempt at this, I simplified it down to a single
upload_shader entrypoint and implemented the caching inside of blorp. This
doesn't work, however, because the i965 driver will, on occation, dump its
entire cache and start over. When this happens, blorp needs to be able to
recompile its shaders and re-upload them. It's easiest to just expose the
caching interface.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Android porting of commit bebc1a1 "intel: Flatten the makefile structure"
Automake approach was followed, by moving makefiles a level up,
naming them Android.genxml.mk and Android.isl.mk,
performing the necessary adjustments to the paths,
adding src/intel/Android.mk and fixing mesa top level makefile.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Small code clean up that removes magic numbers where a TGSI
opcode has been defined.
No functional change expected as each opcode is unsupported on
the respective hardware.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: James Harvey <lothmordor@gmail.com>
As reported by Clang, TGSI_OPCODE_DFMA (defined magic number 118) is
currently initialized twice for Cayman and Evergreen.
When Jan Vesely added double precision FMA opcode it did make sense
to locate it immediately after TGSI_OPCODE_DMAD, although this is
out of order.
This change cleans up the prior magic number definition and ensures
any later reordering of this struct will not create problems.
Prior change was:
commit 015e2e0fce
Author: Jan Vesely <jan.vesely@rutgers.edu>
Date: Sat Jul 2 16:14:54 2016 -0400
r600g: Add double precision FMA ops
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96782
Fixes: 54c4d525da ("r600g: Enable FMA on chips that support it")
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: James Harvey <lothmordor@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Implement SCATTERPS as a dynamic loop based on mask set bits
instead of a static compile time loop.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- use per-primitive viewports throughout the pipeline.
- track whether all available scissor rects are tile aligned.
Causes failures, so not taken into account when choosing rasterizer yet.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
We were allocating global variables for the maximum LDS size
which made the compiler think we were using all of LDS, which
isn't the case.
Reviewed-By: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v1 → v2:
- Fixed indentation (noted by Brian Paul)
- Removed second assert from nouveau's switch statements (suggested by
Brian Paul)
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Use temporary buffers so that we don't read and write to the
same surface at the same time. We don't need to use linear
layout now.
v2: rebase the patch against reverted change
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This reverts commit 09dff7ae2e.
Turned out this can cause some artifacts in the output. Let's revert
it for now until we have sorted out all issues.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
This hack was introduced in commit 03ac2c7223:
i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs
Specifically to fixup the code we emitted to deal with gl_PointSize inputs
in dual instance mode, where we were emitting a MOV to copy the point
size from .w (where the hardware delivers it) to .x (because code will
expect this to be a float). This meant that we were emitting a MOV
to an ATTR destination that could have a width of 4 (in dual instanced
mode) so it was necessary to fix the execution size and regioning of the
instruction.
Fortunately, Ken fixed this in 67c5d00273:
i965/vec4/gs: Stop munging the ATTR containing gl_PointSize.
by using a WWWW swizzle instead of a MOV, and as the commit log in that
patch states, we no longer emit instructions with ATTR destinations, so
that makes the fixup code in the generator unnecessary.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
As retrieved from opengl.org and khronos.org. Maintained the APPLE hack
in GL/glext.h manually. Added gl32.h.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Dave Airlie <airlied@redhat.com>
This is identical to OES_texture_cube_map_array support. dEQP has tests
which use this extension. Also it is part of AEP.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This extension should just be available whenever ES 3.1 is available.
With the new extension verification infrastructure, it will only be
enable-able on a #version 310 es shader, rendering the original reason
for having a separate enable moot.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These are the only platforms that current expose OES_geometry_shader.
Once OpenGL ES 3.1 and OES_geometry_shader are enabled on Gen7, this
extension can be enabled there as well.
Gen6 will never get OpenGL ES 3.1, so it will never get this
extension... even though it has the desktop OpenGL extension. Alas.
NOTE: This causes a failure on Gen8+ platforms in
ES3-CTS.gtf.GL3Tests.texture_storage.texture_storage_texture_targets.
The test only fails because it doesn't know that 0x9009 is a valid
value when the extension exists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This has a separate enable flag because this extension also requires
OES_geometry_shader. It is possible that some drivers may support
OpenGL ES 3.1 and ARB_texture_cube_map but not support
OES_geometry_shader.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 can get this extension (and GL_OES_shader_io_blocks) as soon as the
rest of OpenGL ES 3.1 is enabled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When GL_OES_geometry_shader is enabled, this fixes
dEQP-GLES31.functional.shaders.linkage.geometry.uniform.rules.type_mismatch_1.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The checks in _mesa_has_shader_subroutine are slightly different than
_mesa_has_ARB_shader_subroutine, but they're not different in a way
that matters. The only way to have ctx->Version >= 40 is if
ctx->Extensions.ARB_shader_subroutine is set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
GL_DOT3_RGB_EXT and GL_DOT3_RGBA_EXT. are nearly identical to
GL_DOT3_RGB and GL_DOT3_RGBA. The only difference is the _EXT
versions do not apply the post-scale. Just smash logscale to 0 so
that RC_OUT_SCALE_1 is always used.
NOTE: I have not actually tested this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes long standing bug on NV10 and NV20 where using a non-1x RGB or A
post-scale with GL_DOT3_RGB or GL_DOT3_RGBA texture environment would
not work.
The old combiner math uses HALF_BIAS_NORMAL and HALF_BIAS_NEGATE. The
GL_NV_register_combiners defines these as
HALF_BIAS_NORMAL_NV max(0.0, e) - 0.5
HALF_BIAS_NEGATE_NV -max(0.0, e) + 0.5
In order to get the correct result from the dot-product, the
intermediate dot-product must be multiplied by 4. This is a literal
implementation of the GL_ARB_texture_env_dot3 spec. It also requires
using the register combiner post-scale. As a result, the post-scale
cannot be used for the post-scale set by the application.
The new combiner math uses EXPAND_NORMAL and EXPAND_NEGATE. The
GL_NV_register_combiners defines these as
EXPAND_NORMAL_NV 2.0 * max(0.0, e) - 1.0
EXPAND_NEGATE_NV -2.0 * max(0.0, e) + 1.0
Since this fully expands the value to [-1, 1] range, the intermediate
dot-product result is the desired value. This leaves the register
combiner post-scale available for application use.
NOTE: I have not actually tested this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Note that GL_KHR_blend_equation_advanced and
GL_KHR_blend_equation_advanced_coherent are done.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a direct port of Marek Olšáks patch
"radeonsi: increase performance for DRI PRIME
offloading if 2nd GPU is CIK or VI" to r600.
It uses SDMA for the detiling blit from renderoffload VRAM
to GTT, as SDMA is much faster for tiled->linear blits from
VRAM to GTT.
Testing on a dual Radeon HD-5770 setup reduced the time
for the render offload gpu to get its rendering into
system RAM from approximately 16 msecs for simple rendering
at 1920x1080 pixel 32 bpp to 5 msecs, a > 3x speedup!
This was measured using ftrace to trace the time the radeon kms
driver waited on the dmabuf fence of the renderoffload gpu to
complete.
All in all this brought the time for a flip down from 20 msecs
to 9 msecs, so the prime setup can display at full 60 fps instead
of barely 30 fps vsync'ed.
The current r600 implementation supports SDMA on Evergreen and
later, but not R600/R700 due to some bugs apparently present
in their SDMA implementation.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
For gen < 8, we can't sample from the stencil buffer, which is
required for the ARB_stencil_texturing extension. We'll make a copy of
the stencil data into a new texture that we can sample using the
R8_UINT surface type.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
thread_width_max in the GPGPU walker command limits us to a maximum of
64 threads.
This fixes a crash on Haswell in the OpenGLES 3.1 conformance test
suite which tests the advertised limits of the max invocation counts.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This file was supposed to be added with the previous "svga: add guest
statistic gathering interface" patch but went MIA for some reason.
Reviewed-by: Brian Paul <brianp@vmware.com>
SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.
$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...
$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...
Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS
With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS
The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.
SI will get this once we add SDMA blit support for it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This regression is caused because of commit 3190c7ee97
Regression caused by following OpenGL 4.4 spec rules relates to
GL_FRAMEBUFFER_SRGB in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes piglit spec/ext_transform_feedback/overflow-edge-cases segfaults
because the query's fence pointer was null.
Tested with Piglit, Sauerbraten, ETQW.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We don't want to flush the command buffer or sync on the fence when ending
a query (that kind of defeats the whole purpose of async queries). Do that
instead in get_query_result().
Tested with Piglit, arbocclude, Sauerbraten game, Nobel Clinician Viewer,
ETQW.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch avoid emitting redundant DXSetSamplers command.
Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf.
Reviewed-by: Brian Paul <brianp@vmware.com>
Put all the clearing related functions in svga_init_clear_functions()
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
define svga_init_clear_functions()
and svga_clear_texture as svga->pipe.clear_texture. This is part of
ARB_clear_texture extension
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
To clear texture this function can be used. This is part of
ARB_clear_texture extension. Basically this extension allows you to
clear texture with given color values.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Saving all blitter states will be done in begin_blit() so that
begin_blit() can be used before performing any blit operation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
With this patch, guest statistic gathering interface is added to
svga winsys interface that can be used to gather svga driver
statistic. The winsys module can then share the statistic info with
the VMX host via the mksstats interface.
The statistic enums used in the svga driver are defined in
svga_stats_count and svga_stats_time in svga_winsys.h
Reviewed-by: Brian Paul <brianp@vmware.com>
If the shader has indirect access to non-indexable temporaries,
convert these non-indexable temporaries to indexable temporary array.
This works around a bug in the GLSL->TGSI translator.
Fixes glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
on DX11Renderer.
Reviewed-by: Brian Paul <brianp@vmware.com>
From about kernel 4.9, GTT mmaps are virtually unlimited. A new
parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the
feature so query it and use it to avoid limiting tiled allocations to
only fit within the mappable aperture.
A couple of caveats:
- fence support is still limited by stride to 262144 and the stride
needs to be a multiple of tile_width (as before, and same limitation as
the current 3D pipeline in hardware)
- the max_gtt_map_object_size forcing untiled may be hiding a few bugs
in handling of large objects, though none were spotted in piglits.
See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION
to advertise unlimited mmaps").
v2: Include some commentary on mmap virtual space vs CPU addressable
space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was found by obs:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function main/program_resource.c:109
v2: Remove the ! on the string (Ian Romanick)
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We always use a coherent read, and ignore the "opt out" enable flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This adds the extension enable (so drivers can advertise it) and the
extra boolean state flag, GL_BLEND_ADVANCED_COHERENT_KHR, which can
be set to request coherent blending.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We'll do blending in the shader in this case, so just disable the
hardware blending.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
need to emulate it in the pixel shader. This lowering pass implements
all the necessary math for advanced blending. It fetches the existing
framebuffer value using the MESA_shader_framebuffer_fetch built-in
variables, and the previous commit's state var uniform to select
which equation to use.
This is done at the GLSL IR level to make it easy for all drivers to
implement the GL_KHR_blend_equation_advanced extension and share code.
Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
1. Hook up the fb_fetch_output variable
2. Implement BlendBarrier()
Then to get KHR_blend_equation_advanced, they simply need to:
3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
4. Call this lowering pass.
Very little driver specific code should be required.
v2: Handle multiple output variables per render target (which may exist
due to ARB_enhanced_layouts), and array variables (even with one
render target, we might have out vec4 color[1]), and non-vec4
variables (it's easier than finding spec text to justify not
handling it). Thanks to Francisco Jerez for the feedback.
v3: Lower main returns so that we have a single exit point where we
can add our blending epilogue (caught by Francisco Jerez).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will be used for emulating GL_KHR_advanced_blend_equation features
in shader code. We'll pass in the blending mode that's in use, and use
that in (effectively) a switch statement in the shader.
v2: Use the new _AdvancedBlendMode field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
I'm about to add more error conditions to this function, so I wanted to
move the current spec citation above the code that checks it. Indenting
it required reformatting, so I tried to move it to our newer style.
While there, I also decided to drop some GL type usage, and drop the
unnecessary "_mesa_" prefix on a static function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will be useful for a number of things:
- Checking the current advanced blending mode against the shader's
blend_support_* qualifiers.
- Disabling hardware blending when emulating advanced blending.
- Uploading the current advanced blending mode as a state var.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Don't allow them in glBlendEquationSeparate[i], though, as required
by the spec.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Since each qualifier represents a blending mode the shader can be used
with, we take the union of all possible modes when linking.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2 (Ken): Add a BLEND_NONE enum value (no qualifiers in use).
v3 (Ken): Rename gl_blend_support_qualifier to gl_advanced_blend_mode.
v4 (Ken): Mark map[] as static const (Ilia).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2 (Ken): Fix enum values, drop _mesa_BlendBarrierKHR stub as Curro has
already implemented it.
v3 (Ken): Rework for _mesa_BlendBarrierKHR -> _mesa_BlendBarrier rename.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Note that _mesa_BlendBarrierMESA is not currently hooked up in the
glapi XML, so we can just rename it. We'll hook it up for the
KHR_blend_equation_advanced extension shortly.
We may as well use the ES 3.2 core name with no suffixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We want to insert code in each of the predecessors of the end block.
This code includes a nir_if, which would split the block, altering
the set. To avoid that, I emitted a dead constant at the end of each
block before splitting it, so that the set of predecessors remained
unchanged. This was admittedly ugly.
Connor suggested instead saving a copy of the set, so we can iterate
it safely. This is also a little ugly, but a much better plan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We want to insert the code at the end of the program. Looping over
all the functions (of which there was only one) was the old way of doing
this, but now we have nir_shader_get_entrypoint(), so let's use it.
Suggested by Connor Abbott.
v2: Update for nir_shader_get_entrypoint API change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason suggested adding an assert(function->impl) here. All callers
of this function actually want ->impl, so I decided just to change
the API.
We also change the nir_lower_io_to_temporaries API here. All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally. Folding this change in
avoids the need to change it and change it back.
v2: Fix one call I missed in ir3_compiler (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function. The next patch will change
the API.
v2: Rebase after framebuffer fetch landed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The reason why it was safe for the scheduler to ignore the side
effects of framebuffer write instructions was that its side effects
couldn't have had any influence on any other instruction in the
program, because we weren't doing framebuffer reads, and framebuffer
writes were always non-overlapping. We need actual memory dependency
analysis in order to determine whether a side-effectful instruction
can be reordered with respect to other instructions in the program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We weren't checking the fs_inst::target field when comparing whether
two instructions are equal. For FB writes it doesn't matter because
they aren't CSE-able anyway, but this would have become a problem with
FB reads which are expression-like instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_set_dp_read_message() was setting the data cache as send message
SFID on Gen7+ hardware, ignoring the target cache specified by the
caller. Some of the callers were passing a bogus target cache value
as argument relying on brw_set_dp_read_message not to take it into
account. Fix them too.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is not enabled on the original Gen4 part because it lacks surface
state tile offsets so it may not be possible to sample from arbitrary
non-zero layers of the framebuffer depending on the miptree layout (it
should be possible to work around this by allocating a scratch surface
and doing the same hack currently used for render targets, but meh...).
On Gen9+ even though it should mostly work (feel free to force-enable
it in order to compare the coherent and non-coherent paths in terms of
performance), there are some corner cases like 1D array layered
framebuffers that cannot be handled easily by the non-coherent path
because of the incompatible layout in memory of 1D and 2D miptrees (it
should be possible to work around this too by doing state-dependent
recompiles, but it's hard to care enough since Gen9 has native support
for coherent render target reads...)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a no-op if the platform supports coherent framebuffer fetch,
-- If it doesn't we just need to flush the render cache and invalidate
the texture cache in order for previous rendering to be visible to
framebuffer fetch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This iterates over the list of attached render buffers and binds
appropriate surface state structures to the binding table block
allocated for shader framebuffer read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows the caller to bind a miptree using a texture target other
than the one it it was created with. The code should work even if the
memory layouts of the specified and original targets don't match, as
long as the caller only intends to access a single slice of the
miptree structure.
This will be exploited by the next commit in order to support
non-coherent framebuffer fetch of a single layer of a 3D texture
(since some generations lack the minimum array element control for 3D
textures bound to the sampler unit), and multiple layers of a 1D array
texture (since binding it as an actual 1D array texture would require
state-dependent recompiles because the same shader couldn't
simultaneously work for 1D and 2D array textures due to the different
texel fetch coordinate ordering).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit does three different things in a single pass in order to
keep the amount of churn low: Remove the for_gather boolean argument
which was unused, pass the isl_view argument by value rather than by
reference since I'll have to modify it from within the function, and
add a target argument to allow callers to bind textures using a target
other than the original. The prototype of the function now looks
like:
void brw_emit_surface_state(struct brw_context *brw,
struct intel_mipmap_tree *mt,
GLenum target, struct isl_view view,
uint32_t mocs, uint32_t *surf_offset, int surf_index,
unsigned read_domains, unsigned write_domains);
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The logic to calculate the right layout and dimensionality for a given
GL texture target is going to be useful elsewhere, factor it out from
intel_miptree_get_isl_surf().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is required because the sampler unit used to fetch from the
framebuffer is unable to interpret non-color-compressed fast-cleared
single-sample texture data. Roughly the same limitation applies for
surfaces bound to texture or image units, but unlike texture sampling,
non-coherent framebuffer fetch is by definition non-coherent with
previous rendering, so the brw_render_cache_set_check_flush() call can
be omitted except after resolve.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This gets rid of the duplication of logic between nir_setup_outputs()
and get_frag_output() by allocating fragment output temporaries lazily
whenever get_frag_output() is called. This makes nir_setup_outputs()
a no-op for the fragment shader stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location). nir_assign_var_locations is no
longer useful for fragment outputs.
Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.
The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by the
GLSL so this will cause all individual fragment outputs to have a size
of one regardless of the type).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most likely we had only ever used this macro on bitfields of less than
31 bits -- That's going to change shortly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm about to change how fragment shader output locations are
represented, so the generic nir_intrinsic_store_output implementation
that assumes that outputs are just contiguous elements in the big
nir_outputs array won't work anymore. This somewhat simplified
implementation of nir_intrinsic_store_output for fragment shaders
should be functionally equivalent to the current fall-back one.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be required for the next commit since the non-coherent path
makes use of the fragment coordinates implicitly, so they need to be
calculated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The result of a framebuffer fetch from a multisample FBO is inherently
per-sample, so the spec requires at least those sections of the shader
that depend on the framebuffer fetch result to be executed once per
sample.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.
The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some of the following changes in this series are specific to the
non-coherent path, so I need some way to tell whether the coherent or
non-coherent path is in use. The flag defaults to the value of the
gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
overridden easily on hardware that supports both framebuffer fetch
extensions in order to test the non-coherent path, like:
MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch
(Of course trying to force-enable the coherent framebuffer fetch
extension on hardware without native support won't work and lead to
assertion failures).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This boolean flag was being used for two different things:
- To set the brw_wm_prog_data::dual_src_blend flag. Instead we can
just set it based on whether the dual_src_output register is valid,
which will be the case if the shader writes the secondary blending
color.
- To decide whether to call emit_single_fb_write() once, or in a loop
that would iterate only once, which seems pretty useless.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires emitting a series of copies at the top of the program
from each output variable to the corresponding temporary. The initial
copy can be skipped for non-framebuffer fetch outputs whose initial
value is undefined, and the final copy needs to be skipped for
read-only outputs (i.e. gl_LastFragData), since it would be illegal to
emit a store output intrinsic for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics. The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need the source to be in r0-r3, so make a new register class for it.
It will be up to the surrounding passes to make sure that the r0-r3
allocation of its source won't conflict with anything other class
requirements on that temp.
Respect intel_miptree_slice::x_offset,y_offset and
intel_mipmap_tree::offset. All three may be non-zero when glReadPixels
is called on an EGLImage created from the non-base slice of a miptree.
Patch 2/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.
Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0
When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage
created from the non-base slice of a miptree,
intel_image_target_renderbuffer_storage() forgot to apply the intra-tile
offsets __DRIimage::tile_x,tile_y to the miptree layout.
This patch fixes the problem with a quick hack suitable for
cherry-picking. A proper fix requires more thorough plumbing in
intel_miptree_create_layout() and brw_tex_layout().
Patch 1/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.
Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b
This pulls isl and genxml into a single make file so that they can properly
build in parallel. This isn't terribly important now as genxml just
generates sources which happens serially first anyway but it will be more
important as we add more stuff to src/intel.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The tests assumed that isl would be in the include path but that usually
isn't the case. Instead, we usually have src/intel and you need to add an
"isl/" prefix.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the surface has a layout of GEN4_2D then we need to compute a normal 2D
alignment and not use the magic linewar 1D alignment.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The Sky Lake 1D layout is only used if the surface is linear. For tiled
surfaces such as depth and stencil the old gen4 2D layout is used.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
In some programs, we can have very deep dominance trees and the recursion
can cause us to risk stack overflows. Instead, we replace the recursion
with a pair of loops, one at the start and one at the end. This is
functionally equivalent to what we had before and it's actually a bit
easier to read in the new form without the recursion.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Prior to this commit rename_variables_block() is recursively called,
performing a depth-first traversal of the control flow graph. The
function uses a non-trivial amount of stack space for local variables,
which puts us in danger of smashing the stack, given a sufficiently deep
dominance tree.
XCOM: Enemy Within contains a shader with such a dominance tree (1574
nir_blocks in total, depth of at least 143).
Jason tells me that he believes that any walk over the nir_blocks that
respects dominance is sufficient (a DFS might have been necessary prior
to the introduction of nir_phi_builder).
In fact, the introduction of nir_phi_builder made the problem worse:
rename_variables_block(), walks to the bottom of the dominance tree
before calling nir_phi_builder_value_get_block_def() which walks back to
the top of the dominance tree...
In any case, this patch ensures we avoid that problem as well.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.
This fixes unflushed fences that can get stuck due to empty IBs.
Some front buffer rendering was in the wrong position. This included
scissored clears, glDrawPixels and glCopyPixels. The problem was the
y coordinate passed to putImage() didn't match the y coordinate passed
to getImage().
We fix this by setting xrb->map_y to the inverted coordinate in
swrast_map_renderbuffer() which is used later by the putImage() call.
Also pass xrb->map_y to getImage() to be symmetric.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97426
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always use 3 buffers when flipping. With only 2 buffers, we have to wait
for a flip to complete (which takes non-0 time even with asynchronous
flips) before we can start working on the next frame. We were previously
only using 2 buffers for flipping if the X server supports asynchronous
flips, even when we're not using asynchronous flips. This could result
in bad performance (the referenced bug report is an extreme case, where
the inter-frame stalls were preventing the GPU from reaching its maximum
clocks).
I couldn't measure any performance boost using 4 buffers with flipping.
Performance actually seemed to go down slightly, but that might have
been just noise.
Without flipping, a single back buffer is enough for swap interval 0,
but we need to use 2 back buffers when the swap interval is non-0,
otherwise we have to wait for the swap interval to pass before we can
start working on the next frame. This condition was previously reversed.
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The pipeline layout affects shader compilation because it is what
determines binding table locations as well as whether or not a particular
buffer has dynamic offsets. Since this affects the generated shader, it
needs to be in the hash. This fixes a bunch of CTS tests now that the CTS
is using a pipeline cache.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This option makes installed Vulkan ICD files contain only a driver library
name and not a path. This is intended for distros to help them work around
multi-arch issues.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This would cause gl_FragStencilRef to be counted as a color output
incorrectly during the precompile phase, which leads to unnecessary
recompilation on master and could trigger an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the set of shader outputs whose initial value is provided to
the shader by some external means when the shader is executed, rather
than computed by the shader itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since they cannot be written. This prevents adding fragment outputs
to the OutputsWritten set that are only read from via the
gl_LastFragData array but never written to.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently this pass can only handle elimination of a single built-in
fragment output array, so the presence of gl_LastFragData (which it
wouldn't split correctly anyway) could prevent it from splitting the
actual gl_FragData array. Just match gl_FragData by name since it's
the only built-in it can handle.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The EXT_shader_framebuffer_fetch extension defines alternative
language for GLES2 shaders where user-defined fragment outputs are not
allowed. Instead of using inout user-defined fragment outputs the
shader is expected to read from the gl_LastFragData built-in array.
In addition this allows using the same language on desktop GLSL
versions prior to 4.2 that support the deprecated gl_FragData built-in
in preparation for the MESA_shader_framebuffer_fetch desktop GL
extension.
Both legacy and user-defined inout outputs have a common
representation at the GLSL IR level, so it shouldn't make any
difference for optimization passes and back-ends whether the
application is using gl_LastFragData or user-defined outputs, all
they'll see is a variable dereference of a fragment output at a
certain interface location with the fb_fetch_output bit set to one.
v2: Don't define the built-in variable on GLSL versions for which
gl_FragData exists but is deprecated. (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
According to the EXT_shader_framebuffer_fetch extension the inout
qualifier can be used on ESSL 3.0+ shaders to declare a special kind
of fragment output that gets implicitly initialized with the previous
framebuffer contents at the current fragment coordinates. In addition
we allow using the same language to define FB fetch outputs in GLSL
1.3+ shaders in preparation for the desktop MESA_shader_framebuffer_fetch
extensions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL IR representation of framebuffer fetch amounts to a single
bit in the ir_variable object applicable to fragment shader outputs.
The flag indicates that the variable will be implicitly initialized to
the previous contents of the render buffer at the same fragment
coordinates and sample index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both MESA_shader_framebuffer_fetch_non_coherent and the non-coherent
variant of KHR_blend_equation_advanced will use this driver hook to
request coherency between framebuffer reads and writes. This
intentionally doesn't hook up glBlendBarrierMESA to the dispatch layer
since the extension isn't exposed to applications yet, see [1]
for more details.
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/124028.html
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can currently only give true as result since the only way you can
expose EXT_shader_framebuffer_fetch right now is by flipping the
MESA_shader_framebuffer_fetch bit, but that could potentially change
in the future, see [1] for an explanation.
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/124028.html
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows drivers to expose EXT_shader_framebuffer_fetch in GLES2+
contexts if desired. Note that this adds boolean flags for two MESA
extensions, but only the EXT GLES-only extension is exposed for the
moment, see the cover letter of this series [1] for the rationale.
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/124028.html
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Like Fermi, textures and samplers are aliased between 3D and compute,
especially the TIC_FLUSH/TSC_FLUSH methods and we have to re-validate
these resources when switching between the two pipelines.
This fixes a GPU hang with Elemental (and most likely with other UE4 demos).
Tested on GK107 and GM107.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
CC: <mesa-stable@lists.freedesktop.org>
Duplicate line is currently on 1535.
Identified by Clang, when run through Eric Anholt's Travis harness.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will hopefully fix wget from x.org (no real reason explained in
Travis CI bug reports), and may also mean that we can enable LLVM driver
builds.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Travis has been broken a couple of times by configure.ac updates. To make
it useful, auto-update the version necessary.
This could potentially be used for other dependencies, too, but those get
bumped less frequently.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
This happens when three byte "00 00 03" is partly loaded to
vlc->buffer, thus at the bottom of buffer with valid bits is
"00" or "00 00" and left like "00 03" or "03" in the data,
so that it will not be detected by three byte emulation check.
The reason for that is the escaped bit was set to 0 from the
rbsp init.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
drmPrimeHandleToFD() will return the same GEM handle every time the same
buffer is imported, even from a different prime FD. Since GEM handles
are not reference counted, we need to make sure that each GEM handle is
referenced only by one display target struct, by looking it up in
kms_sw->bo_list first and bumping the refcount of the found dt on hit
and falling back to creating a new dt only on miss.
v2: Split into separate function.
Use helper function for lookup.
v3 [Emil Velikov]:
Rename kms_sw_displaytarget_{lookup,find_and_ref} (Jordan)
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com> (v2)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As a preparation to use the lookup in more than once place, move the
code that looks up given KMS/GEM handle to a separate function. This
change should not introduce any functional changes.
v2: Split into separate patch.
Move lookup code into separate function.
v3 [Emil Velikov]:
Rename kms_sw_displaytarget_{lookup,find_and_ref} (Jordan)
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com> (v2)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently kms_sw_displaytarget_add_from_prime() allocates the struct and
fills in only some of the fields, resulting in a half-baked struct that
needs to be further completed by the caller. To make this a bit more
consistent, pass width, height and stride to this function and fill in
everything there, so that caller can take the returned struct as is.
v2: Split from one big patch into four fixing one thing at a time.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Currently the code creates a display target struct with refcount field
initialized to 1 and then the caller again increments it, leading to
a leaked reference. Let's remove the unnecessary increment.
v2: Split from one big patch into four fixing one thing at a time.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Both ARB_shader_subroutine and the GL core spec doesn't list any
error when the program is not linked.
We left a error generation for the uniform location, in order to be
consistent with other methods from the spec that generate them.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The label `out:` calls `destroy()` which dereferences `ctx`.
This is unnecessary as there is nothing to destroy.
Immediately return instead.
CovID: 1258255
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Before this commit, GetProgramInterfaceiv for pname ACTIVE_RESOURCES
and all the <shader>_SUBROUTINE_UNIFORM programInterface were
returning the count of resources on the shader program using that
interface, instead of the num of uniform resources. This would get a
wrong value (for example) if the shader has an array of subroutine
uniforms.
Note that this means that in order to get a proper value, the shader
needs to be linked, something that is not explicitly mentioned on
ARB_program_interface_query spec, but comes from the general
definition of active uniform. If the program is not linked we
return 0.
v2: don't generate an error if the program is not linked, returning 0
active uniforms instead, plus extra spec references (Tapani Palli)
Fixes GL44-CTS.program_interface_query.subroutines-compute
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Removes the following GCC warning:
../../../../../src/gallium/state_trackers/va/picture.c:542:17: warning:
unused variable 'coded_size' [-Wunused-variable]
unsigned int coded_size;
^~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Several fixes have been added as part of this as listed below:
1) Fix the mask and add disassembler handling for STATE_DS, STATE_HS
as the mask returned wrong values of the fields.
2) Fix the GEN_TYPE_ADDRESS/GEN_TYPE_OFFSET decoding - the address/
offset were handled the same way as the other fields and that gives
the wrong values for the address/offset.
3) Decode nested/recurssive structures - Many packets contain nested
structures, ex: 3DSATE_SO_BUFFER, STATE_BASE_ADDRESS, etc contain MOC
structures. Previously, the aubinator printed 1 if there was a MOC
structure. Now we decode the entire structure and print out its fields.
4) Print out the DWord address along with its hex value - For a better
clarity of information, it is helpful to print both the address and
hex value of the DWord along with the DWord count. Since the DWord0
contains the instruction code and the instruction length, it is
unnecessary to print the decoded values for DWord0. This information
is already available from the DWord hex value.
5) Decode the <group> and the corresponding fields in the group- The
<group> tag can have fields of several types including structures. A
group can contain one or more number of fields and this has be correctly
decoded. Previously, aubinator did not decode the groups or the
fields/structures inside them. Now we decode the <group> in the
instructions and structures where the fields in it repeat for any number
of times specified.
v2: Fix the formatting (per Matt)
Make the start and end pos calculation to extract fields from a DWord
more appropriate by moving %32 away from mask() method
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
The Aubinator tool is designed to help the driver developers in debugging
the driver functionality by decoding the data in the .aub files.
Primary Authors of this tool are Damien Lespiau <damien.lespiau at intel.com>
and Kristian Høgsberg Kristensen <krh at bitplanet.net>.
v2: Review comments are incorporated by Sirisha Gandikota as below:
1) Make Makefile.am more crisp, reuse intel_aub.h from libdrm (per Emil)
2) Aubinator will use platform name instead of GEN number (per Matt)
3) Disassmebler gets created based on pciid rather then GEN number (per Matt)
4) Other formatting comments (per Ken, Matt and Emil)
Signed-off-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In ca2a8e5628, we updated the format table to add more formats (most of
which are new on SKL) but accidentally marked some integer formats as
filterable. You can't filter an integer format.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Some hardware can't render to color/depth buffers of mixed bitness. When
that happens a fallback has to happen, but this allows the driver to
express that this isn't an optimal scenario. The purpose of this is to
remove such fbconfigs from the GLX/EGL config list.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some GPUs, notably nv3x/nv4x can't render to mismatched color/zs
framebuffer depths. Fallbacks can be done by the driver, with shadow
surfaces, but no reason to encourage applications to select non-matching
glx visuals.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can't actually clear these images normally because we can't render to
them. Instead, we have to manually unpack the rgb9e5 color value on the
CPU and clear it as R32_UINT. We still have a bit of work to do to clear
non-power-of-two images, but this should get all of the power-of-two clears
working on at least Haswell. This fixes three of the new Vulkan CTS tests
in the dEQP-VK.api.image_clearing.clear_color_image.* group.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
There were a lot of formats where support was added on Haswell or later but
we never updated the format table.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We have implicitly been not advertising these formats since we had them
turned off in the format capabilities table. We are about to update that
table and this prevents a change in behavior. The only change in behavior
created by this patch is that we no longer advertise support for
R16G16B16_FLOAT which means that it's now renderable which seems like a
bonus. Maybe someday we'll want to change things to start supporting
16-bit RGB formats natively but, at the moment, there's no need.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The whole point of using RGBX is so that we can render to it so if it isn't
renderable, that kind-of defeats the purpose. Some formats (one example is
R32G32B32X32_SFLOAT) exist in the format table but aren't actually
renderable. Eventually, we'd like to get away from RGBX entirely, but this
fixes hangs on BDW today.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
In the case where dri2_initialize is called with a TestOnly display,
the display is not actually initialized, so dri2_egl_display always
fails, and we cannot do any reference counting.
Fixes piglit spec@egl_khr_create_context@verify gl flavor (reproducible
with LIBGL_ALWAYS_SOFTWARE=1).
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This removes unnecessary error checks on return result of mtx_lock
and cnd_wait calls as in all other places in MESA source since there
is no chance that any of these functions return any of error codes
in current implementation.
This patch also removes a redundent _eglError call that follows
EGL_FALSE check in the bottom of dri2_client_wait_sync.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This seems to make sense, the image is bound to a subset of the buffer
so the image size should be from the bound size not the underlying
object.
This fixes:
GL44-CTS.shader_image_size.advanced-nonMS-fs-int
v2: get mininum of the two values, same as we write to the hw.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The only reason we should throw INITIALIZATION_FAILED is if we have found
useable intel hardware but have failed to bring it up for some reason.
Otherwise, we should just throw INCOMPATIBLE_DRIVER which will turn into
successfully advertising 0 physical devices
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This needs to set the src swizzle so it doesn't access the .zw
members ever when we are just emitting a 0 constant here.
This fixes:
vert-conversion-explicit-dvec3-bvec3.shader_test
and a bunch of other fp64 tests on softpipe and radeonsi.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We used to upload the indices when they changed, now we rely
on the drivers calling the correct hook to have the values
updated from the context storage.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
This plugs the subroutine index updates into the i965 backend,
where it loads constants.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Andres Gomez <agomez@igalia.com>
This writes the subroutine indicies to the program storage for
a stage. This API is intended to be used by drivers to update
the uniform storage before uploading to the hw.
This isn't the most thread safe effort, but it will be significantly
more multi-context safe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
One piece of ARB_shader_subroutine I ignored was the fact that it
needs to store the subroutine index data per context and not per
shader program.
There is one CTS test that tests this:
GL45-CTS.shader_subroutine.multiple_contexts
However the test only does a write to context and readback,
it never renders using the values, so this is enough to fix the
test however not enough to do what the spec says.
So with this patch the info is now stored per context, but
it gets updated into the program at UseProgram and when the
values are inserted into the context, which won't help if
multiple contexts are in use in multiple threads.
v1.1: cleanups and nit-picks (Andres)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Without this the following line will segfault and we don't get to
see the results of the validate_assert() above.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If we're going to skip setting up vertex input data in them, we should
probably not leave them as vertex inputs with a driver_location that
happens to alias to something else.
Fixes a regression in glsl-mat-attribute on vc4 when enabling GTN.
v2: Change commit message shortlog, lower the new globals away before
handing off to the driver.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Generally you'd see the gl_Color reference first and get some cursor set.
However, in piglit draw-pixel-with-texture we're now seeing the TexCoord
dereferenced first.
Reviewed-by: Rob Clark <robdclark@gmail.com>
In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I
was calling the "state uniforms" that I was putting into the NIR fighting
with its other lowering passes. Instead of using magic uniform base
numbers in the backend, follow the lead of load_user_clip_plane and just
define system values for them.
v2: Fix unintended change to channel_num, drop unspecified const_index
value on blend_const_color_r_float.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We let the user believe we support some transfer formats which we don't.
This can lead to crashes when actually trying to use those formats for
example on dEQP-VK.api.copy_and_blit.image_to_image.* tests.
Let all formats we can render to or sample from as meta implements transfers
using attachments.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Implementation previously used value itself as the key, however after
hash implementation change by ee02a5e we cannot use 0 as key.
v2: use constant pointer as the key and implement comparison
for contents (Eric Anholt)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97309
Needed to fix android build after commit 16a9fcb
which enabled genxml for gen{8,9} state setup
This is the last patch needed, android build tested successfully.
Needed to fix android build after commit e198983
which enabled genxml for gen{7,75} state setup
Android build fix for gen{8,9} will follow as incremental patch,
build tested successfully with all per-gen patches applied.
Needed to fix android build after commit c8bc1ae
where new per-gen genX_blorp.c source replaced gen6_blorp.c for gen6
Android build fixes for gen{7,75} and gen{8,9} will follow as incremental patches,
build tested successfully with all per-gen patches applied.
We're going to handle output qualifiers here too, and calling it "inout"
seems to be the going convention.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Not providing a path allows the ICD to work on multi-arch systems but
breaks it if you install anywhere other than /usr/lib. Given that users
may be installing locally in .local or similar, we probably do want to
provide a filename. Distros can carry a revert of this commit if they want
an intel_icd.json file without the path.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chad@kiwitree.net>
We only did depth clamp when the value was written from the fs.
This is very wrong both for d3d10 and GL, and only passed the
corresponding piglit test due to pure luck (it no longer does
with the enhanced test).
Also, interpolation clamped values to 1.0 always, which can legitimately
happen if depth clip is disabled, so fix that as well (untested).
There is one unresolved issue left, d3d10 always does depth clamping,
whereas GL does not (but does [0,1] clamp instead for fs depth outputs)
- this information isn't in any gallium state object, leave it as-is
for now (though it looks like llvmpipe misses the [0,1] clamp as well).
This (with the previous patch) fixes piglit depth-clamp-range test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This wasn't handled before (the result was that no matter what value got
clamped, it always ended up as the near value in this case) (if clamping
actually happened).
Fix this by using the util helper for that (the math is otherwise "mostly"
the same, mostly because there could actually be differences due to float
rounding, but I don't even know which one would be more correct).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Scheduling barriers are implemented by placing a dependence on every
node before and after the barrier. This is unnecessary as we can limit
the number of nodes we place dependencies on to those between us and the
next barrier in each direction.
Runtime of dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
is reduced from ~25 minutes to a little more than three.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94681
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
var_range_end(v, n) loops over the n components of variable number v and
finds the maximum value, giving the last use of any component of v.
Therefore it expects v to correspond to the variable associated with the
.x channel of the VGRF.
var_from_reg() however returns the variable for the first channel of the
VGRF, post-swizzle.
So, if the last register had a swizzle with y, z, or w in the swizzle
component, we would read out of bounds. For any other register, we would
read liveness information from the next register.
The fix is to convert the src_reg to a dst_reg in order to call the
dst_reg version of var_from_reg() that doesn't consider the swizzle.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Allows shader-db to work on vec4 programs (has been broken since
shader-db commit 646df5ca98b2 from April!)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous bit disables the whole clipper, including the regular
viewport-related clipping that would go on. The two new bits disable
near and far clipping (separately, as verified with the
depth-clamp-range piglit).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
vc4 wants to have per-scalar IO load/stores so that dead code elimination
can happen on a more granular basis, which it has been doing in the
backend using a multiplication by 4 of the intrinsic's driver_location.
We can represent it properly in the NIR using the first_component field,
though.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous nir_load_system_value(b, nir_intrinsic_load_whatever), 0) was
rather verbose, when system values should be easy to generate.
The index is left out because only one system value had an index included
in it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces the diff between GLSL-to-NIR and TGSI-to-NIR, and gives NIR
more optimization to work on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GLSL-to-NIR generates system value usage, and vc4/freedreno would both
like the system value instead of the varying, so switch this pass over to
it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch improves the performance of Vaapi Encode by enabling dual
instances encoding. flush function is not called after each end_frame
call. radeon/vce will do flush whenever 2 frames are submitted for
encoding. Implement sync surface function to flush only if the frame
hasn't been flushed yet.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Now that we're using genxml for everything, we no longer need the
hand-rolled state emit helpers.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The new helper emits surface states and the binding table in one go. It's
nice to have it pulled out of the main blorp_exec function.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At the moment, it's only used for gen6 but that will change soon. We use
the genX prefix for recompiled things in the Vulkan driver. It isn't
great, but it seems to have worked ok.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Blorp never uses points or lines and the default values of 0 are perfectly
fine. Explicitly setting them is just noise.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
All three go together on SNB so let's keep them together for gen7+ as well.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We're about to start replacing blorp state setup code with packing structs
and we want to feel free to delete files as we go.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
More than half of the stuff in intel_reg.h had nothing whatsoever to do
with registers and really belongs in brw_defines.h anyway.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The actual data storred is in float, UNORM24, or UNORM16 depending on the
actual depth format.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This uses the unblocked time of the exit assigned to each available
node to attempt to unblock exit nodes as early as possible,
potentially reducing the runtime of the shader when an exit branch is
taken. There is a natural trade-off between terminating the program
as early as possible and reducing the worst-case latency of the
program as a whole (since this will typically move exit-unblocking
nodes closer to its dependencies potentially causing additional stalls
of the execution pipeline), but in practice the bandwidth and ALU
cycle savings from terminating the program earlier tend to outweigh
the slight increase in worst-case program execution latency, so it
makes sense to prefer nodes likely to unblock an earlier exit
regardless of the latency benefits of other available nodes.
I haven't observed any benchmark regressions from this change after
testing on VLV, HSW, BDW, BSW and SKL. The FPS of the GfxBench
Manhattan benchmark increases by 10%-20% and the FPS of Unigine Valley
improves by roughly 5% depending on the platform and settings.
The change to the register pressure-sensitive heuristic is rather
conservative and gives precedence to the existing heuristic in order
to avoid increasing register pressure and causing spill count and SIMD
width regressions in shader-db. It may make sense to revisit this
with additional performance data.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds a bit of metadata to schedule_node that will be used to
compare available nodes in the scheduling heuristic code based on
which of them unblocks the earliest successor exit node. Note that
assigning exit nodes wouldn't be necessary in a bottom-up scheduler
because we could achieve the same effect by scheduling the exit nodes
themselves appropriately.
No shader-db changes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The critical path of each node is calculated by induction based on the
critical paths of its children, which can be done in a post-order
depth-first traversal of the dependency graph. The current code
implements graph traversal by iterating over all nodes of the graph
and then recursing into its children -- But it turns out that
recursion is unnecessary because the lexical order of instructions in
the block is already a good enough reverse post-order of the
dependency graph (if it weren't a reverse post-order some instruction
would have been located before one of its dependencies in the original
ordering of the basic block, which is impossible), so we just need to
walk the instruction list in reverse to achieve the same result more
efficiently.
No shader-db changes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ANY4H is more efficient than ANY8H and ANY16H because it makes sure
that whenever a whole subspan hits a discard statement it gets
disabled by the EU until the end of the program, regardless of whether
the discard condition is uniform across all channels of the SIMD8-16
thread. OTOH ANY8H/ANY16H would cause the rest of the program to be
executed for *all* channels if only one of the channels hadn't taken
the discard branch, potentially increasing the bandwidth and ALU usage
of the program unnecessarily.
This change increases the FPS by over 3x of a simple micro-benchmark
that discards a bunch of fragments and then does a single costly
texturing operation. I've just re-verified the FPS change on HSW and
SKL, but I expect all platforms from Gen6 up to get a similar benefit.
Note that we could potentially be more aggressive and use the NORMAL
predicate to discard individual channels, but that would need to
happen post-scheduling because the scheduler currently doesn't care to
reorder HALT instructions with respect to other instructions, and the
NORMAL predicate would cause the results of subsequent derivative
computations to become undefined -- If the scheduler didn't reorder
HALT instructions it would actually be safe to switch to NORMAL
because the behavior of derivative computations after a non-uniform
discard statement is undefined by the GLSL spec, but that would make
the optimization implemented by one of the following commits somewhat
more difficult.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This may have been the reason people ran into problems with
non-uniform HALT instructions and ended up using the inefficient
ANY16H/ANY8H predicates instead of ANY4H or NORMAL in order to prevent
non-uniform discard. The HALT instruction is able to handle
non-uniform execution masks just fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Valgrind complains with a "Conditional jump or move depends on
uninitialised value(s)" warning due to opaque being conditionally
initialized. However in the punchthrough_alpha == true case, it is
always initialized, so just flip the condition around to silence the
warning.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The max basevertex is already computed and added into max_index by the
caller, _tnl_draw_prims.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The vkCmdDbgMarker{Begin,End} symbols are exported, yet the json does no
advertise that the driver supports the extension. Furthermore the
functions are empty stubs.
Remove those until we get a proper implementation and json notation.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With version 1 of the Loader interface there is an internal/private symbol
(vk_icdGetInstanceProcAddr) which is used to retrieve all the API from the
Vulkan entrypoints from the ICD. Implying that exposing the Vulkan API is not
recommended.
Version 2 goes a step further explicitly forbiding the ICD from exposing Vulkan
symbols (and adding a negotiation API)
As a reference:
- Nvidia 367.35
Missing negotiation API - version 1.
Exposes only vk_icdGetInstanceProcAddr.
- AMD 16.30.3.306809
Have negotiation API - version 2,
Exposes vk_icdGetInstanceProcAddr.
Exposes a couple of Vulkan entry points - seems to be in violation with the spec.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Hide the internal symbols and annotate the vk_icdGetInstanceProcAddr as public
since the loader needs it (since v1 of the loader interface).
v2: Add VISIBILITY_CFLAGS to AM_CFLAGS (Ken)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave
differently. Having the logic spread out over three functions makes it
harder to decide where to put new logic, as well.
So, combine them all. It's a bit simpler because there's now only one
recursive function rather than a pair of mutually recursive functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, 'a@type' can only match if 'a' is produced by an ALU
instruction. This is rather limited - there are other cases we
can easily detect which we should handle.
Extending the code in-place would be fairly messy, so we introduce
a new src_is_type() helper.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The "Barrier Count" field goes in 14:9 of m0.2. The vec4 backend
correctly shifts by 9, but the scalar backend only shifted by 8.
It's not like this changed - I think I just made a typo when writing
the original scalar TCS backend code.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The first simply picks the bany_inequal[234] opcodes based on the SSA
def's number of components. The latter implicitly compares with zero
to achieve the same semantics of GLSL's any().
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
GL_EXT_packed_float, 2.1.B Unsigned 10-Bit Floating-Point Numbers:
0.0, if E == 0 and M == 0,
2^-14 * (M / 32), if E == 0 and M != 0,
2^(E-15) * (1 + M/32), if 0 < E < 31,
INF, if E == 31 and M == 0, or
NaN, if E == 31 and M != 0,
In the second case (E == 0 and M != 0), we were multiplying the mantissa
by 2^-20, when we should have been multiplying by 2^-19 (which is
2^(-14 + -5), or 2^-14 * 2^-5, or 2^-14 / 32).
The previous section defines the formula for 11-bit numbers, which is:
2^-14 * (M / 64), if E == 0 and M != 0,
In other words, we had accidentally copy and pasted the 11-bit code
to the 10-bit case, and neglected to change the exponent.
Fixes dEQP-GLES3.functional.pbo.renderbuffer.r11f_g11f_b10f_triangles
when run with surface dimensions of 1536x1152 or 1920x1080.
Cc: mesa-stable@lists.freedesktop.org
References: https://code.google.com/p/chrome-os-partner/issues/detail?id=56244
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
When viewport transform is disabled (ie. screen space coords are passed
in directly), the W component should be interpreted as RHW.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
This mega-commit pulls most of the i965-specific bits of blorp into the
brw_blorp.c/h files which now contain nothing but i965 wrappers around
"core blorp" calls. The "core blorp" api is moved into blorp.h and the
internal blorp data structures are moved into blorp_priv.h. The new file
blorp.c is created to house "core blorp" internals which are pulled from
the old brw_blorp.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The helpers are completely miptree-unaware and each fairly cleanly do a
single thing. This does come at the downside of not doing proper debug
reporting on whether or not we're doing replicated clears.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This pulls the mcs allocation into the if statement where we initially
determine that we are doing a fast clear and moves the programming of
wm_inputs and figuring out the fast clear rect into it's own if statement.
The next commit will put code inbetween the two.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The blorp_surface_info_init call above should set the format for us and
stomping it later does nothing whatsoever.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We had another inline copy of brw_meta_get_buffer_rect embedded in
get_fast_clear_rect for no good reason. This lets us get rid of the
gl_frameuffer parameter to get_fast_clear_rect.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We already have an inlined version of the function slightly higher up in
do_single_blorp_clear and all calling it does is stomp the values with the
same thing. We might as well just get rid of it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have the brw_blorp_surf struct, we can start to make bits of
blorp completely miptree-unaware. To start things off, we split the guts
of brw_blorp_blit_miptrees into a brw_blorp_blit function which knows
nothing about miptrees.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At the moment, this seems to make all of the interfaces messier rather than
clener. However, it does provide a representation of a surface that
simultaneously contains everything and is completely unaware of miptrees.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The isl_surf munging doesn't happen until fairly late in the blorp_blit
function. We can use the isl_surf for the vast majority if not all of our
params setup.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This keeps all of the nastyness of gen6 stencil on the i965 side of the API
line and lets us delete that nasty hand-rolled ISL-based offset path that
we were using for ALL_SLICES_AT_EACH_LOD.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit also adds support for an offset for aux surfaces. In GL, this
only gets used for HiZ on SNB at the moment. However, in Vulkan, all aux
surfaces are at a non-zero offset and that is likely to happen in GL
eventually.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit movies us from a miptree model to a surf+bo+offset model. In
the GL driver, miptrees are almost always at the start of the bo so the
offset is zero but we don't want to always make that assumption. In the
sort term, gen6 stencil and HiZ will be at an offset but, in the long term,
any Vulkan surface is liable to be at a non-zero offset.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The previous HiZ support was bogus because all of get_aux_isl_surf looked
at mt->mcs_mt directly. For HiZ buffers, you need to look at either
mt->hiz_buf or mt->hiz_buf->mt.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Otherwise, the clear color will get ignored. This prevents assertion
errors if clear color is set to something invalid and aux is not used.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In order for the calculations of things such as fast clear rectangles to
work, we need more details of the auxiliary surface to be correct. In
particular, we need to be able to trust the width and height fields.
(These are not necessarily what you want coming out of the miptree.) The
only values state setup really cares about are the row and array pitch and
those we can safely stomp from the miptree.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At one point, we were doing this correctly. It must have gotten lost in
one of the many rebases.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only reason why we need layer or level is that we need the z-offset for
3-D surfaces. Let's just have the one field for that.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The data comes in via ISL in a format that's almost directly usable by the
hardware so we can avoid some of the conversion headache.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that the generic blorp path uses base level/layer, there's no need to
make gen8 special.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Since the dawn of time, blorp has used offsets directly to get at different
mip levels and array slices of surfaces. This isn't really necessary since
we can just use the base level/layer provided in the surface state. While
it may have simplified blorp's original design, we haven't been using the
blorp path for surface state on gen8 thanks to render compression and
there's really no good need for it most of the time. This commit restricts
such surface munging to the cases of fake W-tiling and fake interleaved
multisampling.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The layer field is in terms of physical layers which isn't quite what the
sampler will want for 2-D MS array textures.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Multisample array surfaces on IVB don't support the minimum array element
surface attribute so it needs to come through the sampler message. We may
as well just pass it through everything.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At the moment, the minify operation does nothing because
params.depth.view.base_level is always zero. However, as soon as we start
using actual base miplevels and array slices, we are going to need the
minification. Also, we only need to align the surface dimensions in the
case where we are operating on miplevel 0. Previously, it didn't matter
because it aligned on miplevel 0 and, for all other miplevels, the miptree
code guaranteed that the level was already aligned.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The sampling hardware can handle them ok. It just looks at the tiling to
determine whether it's the new gen9 1-D layout or the old one. The render
hardware isn't so smart.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
In all three cases, we start with width and height taken from
isl_surf::phys_slice0_extent_sa which is already in samples. There is no
need to do the conversion and doing so gives us an incorrect value.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The helper does a full transformation on the surface to turn it into a new
2-D single-layer single-level surface representing the original layer and
level in memory.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
For the moment, we still call the old miptree function; we just assert that
the two are equal.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The function takes a logical array layer but was assuming it was a physical
array layer. While we'er here, we also make it not assert-fail on gen9 3-D
surfaces.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Eventually, this will be the actual view that gets passed into isl to
create the surface state. For now, we just use it for the format and the
swizzle.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously we multiplied full x/y offsets, resolved tile aligned buffer
offset and intra tile offset based on that. Now we let ISL to take into
account the msaa setting and we only multiply the resolved intra tile
offsets.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We put all of the code for fake IMS together. This requires moving a bit
of the program key setup code further down so that it gets the right values
out of the final surface.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we're carrying around the isl_surf, we can just modify it
directly instead of passing an extra bit around.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The alignment we use doesn't matter (see the comment) but it should at
least be an alignment we can represent with the enums.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It's only used to stomp the tiling to Y and it's only used by blorp so
there's no reason why blorp can't do it itself.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It's been in elements for a while but, for whatever reason, the parameter
names in the header file never got updated.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The lrint() and lrintf() functions are pretty slow and make some
texture transfers very inefficient. This patch makes a better effort
at using those intrisics for 32-bit gcc and MSVC.
Note, this patch doesn't address the use of SSE4.1 with MSVC.
v2: get rid of the ROUND_WITH_SSE symbol, per Matt.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The function was always returning false because of this typo.
Retested with piglit. There's some sRGB-related blit failures, but
that seems unrelated.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
No change except to the copyright symbol. The next patch will generate
this file with Python, and Unicode + Python = pure rage.
v2: Massive rebase... I guess a lot can change in a year.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is required by OpenGL. Our hardware supports this.
Example: Bind RGBA32F with offset = 4 bytes.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Vulkan doesn't do this. The reason may be that CB_COLOR1_INFO.SOURCE_FORMAT
from NI was moved to SPI_SHADER_COL_FORMAT for SI.
I asked CB guys about this 2 days ago and they still haven't replied.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Avoid building all those store 0 / store undef instruction pairs that
end up getting removed anyway.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also, prepare for using tgsi_array_info.
This also opens the door for properly handling allocation failures, but I'm
leaving that for a separate change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Doing the write-back of the temporary vector in radeon_llvm_emit_store makes
no sense.
This also allows us to get rid of get_alloca_for_array.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the alloca'd array case, no longer create redundant and unused allocas
for the individual elements; create getelementptrs instead.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ir_unop_fract already forbade integer types in ir_validate. ir_unop_rcp,
ir_unop_rsq, and ir_unop_sqrt should also forbid them in ir_validate.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some shaders include code that looks like:
uniform int i;
uniform vec4 bones[...];
foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);
CSE would do some work on this:
x = i * 3
foo(bones[x], bones[x + 1], bones[x + 2]);
The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like
x = i * 3
foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);
Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.
x = i * 48;
foo(bones[x], bones[x + 16], bones[x + 32]);
So, ~6 instructions becomes ~3.
Some individual shader-db results look pretty bad. However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking that change in
the generated code.
G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0
total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317
Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307
Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261
Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5
total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415
Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5
total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There's no guarantee that there is one, and we don't need one anyway.
Fixes piglit tests:
glx@glx-fbconfig-bad
glx@glx_ext_import_context@import context, multi process
glx@glx_ext_import_context@import context, single process
Fixes: 2e3f067458 ("glx: fix error code when there is no context bound")
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
It's fairly rare that the BB layout puts BBs after the exit block, which
is likely the reason these issues lingered for so long.
This fixes a fraction of issues with the giant pixmark piano shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
The current logic used to determine the execution size of sampler
messages was based on special-casing several argument and opcode
combinations, which unsurprisingly missed the possibility that some
messages could exceed the payload size limit or not depending on the
number of coordinate components present. In particular:
- The TXL, TXB and TEX messages (the latter on non-FS stages only)
would attempt to use SIMD16 on Gen7+ hardware even if a shadow
reference was present and the texture was a cubemap array, causing
it to overflow the maximum supported sampler payload size and
crash.
- The TG4_OFFSET message with shadow comparison was falling back to
SIMD8 regardless of the number of coordinate components, which is
unnecessary when two coordinates or less are present.
Both cases have been handled incorrectly ever since cubemap arrays and
texture gather were respectively enabled (the current logic used by
the SIMD lowering pass is almost unchanged from the previous no16
fall-back logic used pre-SIMD lowering times).
Fixes the following GL4.5 conformance test on Gen7-8 (the bug also
affects Gen9+ in principle, but SKL passes the test by luck because it
manages to use the TXL_LZ message instead of TXL):
GL45-CTS.texture_cube_map_array.sampling
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes it easier for the caller to find out how many scalar
components are actually read by the instruction. As a bonus we no
longer need to special-case BAD_FILE in the implementation of
fs_inst::regs_read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies the code slightly and will allow the SIMD lowering
pass to find out easily what the actual texturing opcode is in order
to determine the maximum execution size of texturing instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All three fields of the vbuffer_attrs[] array are assigned in the following
loop. The remaining elements of the array are not used.
Tested with full Piglit run, Heaven 4.0, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We don't need to loop over the max number of color buffers, just the
current number (which is usually one).
Tested with full Piglit run, Heaven 4.0, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes regression with team_fortress_2 trace.
This change has been in our in-house tree for some time.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Since clears are more or less just normal draws, there isn't that much
benefit in having hand-rolled clear path. Add support to use u_blitter
instead if gen specific backend doesn't implement ctx->clear().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Not (currently) state that is overwridden by u_blitter itself, but
drivers with custom blit/clear which are reusing part of the u_blitter
infrastructure will use it.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
eglMakeCurrent can also be used to change the active display. In that
case, we need to decrement ref_count of the previous display (possibly
destroying it), and increment it on the next display.
Also, old_dsurf/old_rsurf cannot be non-NULL if old_ctx is NULL, so
we only need to test if old_ctx is non-NULL.
v2: Save the old display before destroying the context.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97214
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reported-by: Alexandr Zelinsky <mexahotabop@w1l.ru>
Tested-by: Alexandr Zelinsky <mexahotabop@w1l.ru>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Apply the median and matrix filter before the compostioning
we apply the deinterlacing first to avoid the extra overhead
in processing the past and the future surfaces in deinterlacing.
v2: apply the filters on all the surfaces (Christian)
v3: use get_sampler_view_planes() instead of
get_sampler_view_components() and iterate over
VL_MAX_SURFACES (Christian)
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Ian recently changed the preprocessor to allow this in most GLSL
versions, but not GLSL ES 3.00+. This patch converts the existing
test that expects a failure to a #version 300 es shader, and adds
a #version 110 shader to make sure that it's allowed.
Fixes 'make check'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97307
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Found with valgrind:
==4841== Invalid read of size 4
==4841== at 0x56BDC80: dri2_initialize (egl_dri2.c:783)
==4841== by 0x56BAFE5: _eglMatchAndInitialize (egldriver.c:261)
==4841== by 0x56BB15E: _eglMatchDriver (egldriver.c:295)
==4841== by 0x56B58C9: eglInitialize (eglapi.c:480)
==4841== by 0x4F537DC: _glfwInitEGL (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x4F4BEFB: _glfwPlatformInit (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x4F46F40: glfwInit (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x402E59: main
==4841== Address 0x6a05824 is 148 bytes inside a block of size 480 free'd
==4841== at 0x4C2B680: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4841== by 0x56C2AAE: dri2_initialize_x11_swrast (platform_x11.c:1233)
==4841== by 0x56C2AAE: dri2_initialize_x11 (platform_x11.c:1493)
==4841== by 0x56BDCEB: dri2_initialize (egl_dri2.c:805)
==4841== by 0x56BAFAF: _eglMatchAndInitialize (egldriver.c:261)
==4841== by 0x56BB0C9: _eglMatchDriver (egldriver.c:292)
==4841== by 0x56B58C9: eglInitialize (eglapi.c:480)
==4841== by 0x4F537DC: _glfwInitEGL (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x4F4BEFB: _glfwPlatformInit (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x4F46F40: glfwInit (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x402E59: main
==4841== Block was alloc'd at
==4841== at 0x4C2A868: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==4841== by 0x56C2A47: dri2_initialize_x11_swrast (platform_x11.c:1171)
==4841== by 0x56C2A47: dri2_initialize_x11 (platform_x11.c:1493)
==4841== by 0x56BDCEB: dri2_initialize (egl_dri2.c:805)
==4841== by 0x56BAFAF: _eglMatchAndInitialize (egldriver.c:261)
==4841== by 0x56BB0C9: _eglMatchDriver (egldriver.c:292)
==4841== by 0x56B58C9: eglInitialize (eglapi.c:480)
==4841== by 0x4F537DC: _glfwInitEGL (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x4F4BEFB: _glfwPlatformInit (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x4F46F40: glfwInit (in /usr/lib64/libglfw.so.3.2)
==4841== by 0x402E59: main
Signed-off-by: Jan Ziak (http://atom-symbol.net) <0xe2.0x9a.0x9b@gmail.com>
Fixes: 9ee683f877 (egl/dri2: Add reference count for dri2_egl_display)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Some applications continue to use the Xlib client library and expect that
VK_KHR_xlib_surface will be available in the driver. Service these
applications by converting the Display pointer to xcb_connection_t and use
the existing xcb code in the driver.
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead of checking for mapped buffers in vbo_bind_arrays
do this check in api_validate.c. This additionally
enables printing the draw calls name into the error
string.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the function to check if all vao buffers are
unmapped into the vao implementation file.
Rename the function to _mesa_all_buffers_are_unmapped.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In array draw do not check if the vertex buffer object that
is used to implement immediate mode glBegin/glEnd is mapped.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are the same for a3xx and later. (a2xx could probably use them
too, but due to limited hw support and ancient downstream kernels, it
isn't so easy to test.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
For various tex fetch instructions, coord's get fixed up in different
ways. But modifying the array returned from get_src() has side-effects
if the same SSA src is used again.. the later instruction will see the
previous fixups.
Fix this, and const'ify things to prevent this sort of mistake in the
future.
Noticed by Varad when adding support for txf_ms.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For regular ast_add, we can implicitly change either a or b's type.
However in an assignment situation, the type of the lvalue is fixed. So
if the implicit conversion logic decides to change it, it means that the
rhs's type could not be converted to the lhs type.
Emit a specific error for this rather than the rather mysterious "is not
an lvalue" error that results from having a i2f or other operation as
the lvalue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96729
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The additional provision of GL_OES_copy_image is that it work for ETC.
However many desktop GPUs don't have native ETC support, so st/mesa does
the decoding by hand. Instead of discarding the compressed data, keep it
around in CPU memory. Use it when performing image copies.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
The GL spec is very unclear on this point. Apparently this is discussed
without resolution in the closed Khronos bugtracker at
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=7829 . The
recommendation is to allow dropping the [0] for looking up the bindings.
The approach taken in this patch is to instead tack on [0]'s for each
arrayness level of the output's type, and doing the lookup again. That
way, for
out vec4 foo[2][2][2]
we will end up looking for bindings for foo, foo[0], foo[0][0], and
foo[0][0][0], in that order of preference.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
The GL_BGR and GL_UNSIGNED_SHORT_5_6_5_REV are not defined anywhere in
OpenGL ES 3.2 (or earlier) specification, and there are no known extensions
in the Khronos registry that would add these enums as valid responses for
glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_TYPE) and
glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT) queries.
Note that this patch does not change the bit layout returned by the query. As
defined by the GL spec, the bit layout of GL_RGB + GL_UNSIGNED_SHORT_5_6_5 and
GL_BGR + GL_UNSIGNED_SHORT_5_6_5_REV are identical.
TEST=dEQP-GLES3.functional.state_query.integers.*
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Change-Id: I81bbc8ccdc7e125edaeae443baf6fa8fdefcc6b6
This is required following the change in 8X sample positions.
Fixes the recently modified multisample-scaled-blit piglit tests.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are no standard sample positions defined in OpenGL and OpenGL
ES specs. Implementations have the freedom to pick the positions
which give plausible results. But the Vulkan 1.0 spec does define
standard sample positions for different sample counts. Defined
positions in Vulkan for all the sample counts except 8X match with
the positions we set in i965. We have an upcoming plan to share the
blorp code between OpenGL and Vulkan driver in near future. Keeping
the 8X sample positions same on both the drivers will help us move
in that direction.
Here is an argument by Neil Roberts (from commit 20250e85) against
any advantage of current 8X sample positions over the new ones:
"The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.
arb_gpu_shader5-interpolateAtSample-different
Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern."
Observed no regressions in jenkins testing.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
because NewDriverState is filtered depending on active shader states,
while st->dirty isn't.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
At compile time, each shader determines which ST_NEW flags should be set
at shader bind time.
This just sets the new field for all shaders. The next commit will use it.
v2: small code unification
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
It always returns non-null, even if the number is an invalid enum.
Cc: Haixia Shi <hshi@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Change-Id: I26e8843c96130be972e66f48a49e362442e1bf97
The previous commit fixed xfb_buffer handling, which was largely copy
and pasted from the stream handling. The difference is that stream
was set in input_layout_mask, so it worked.
However, that's totally rubbish: stream is only valid on geometry shader
outputs. Presumably this was to hack around inout. Instead, apply the
solution I used in the previous fix.
Really, we just need to separate shader interface and parameter
qualifier handling so this isn't a mess, but this patch at least
tidies it slightly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
inout variables have q.in and q.out set. We were trying to set
xfb_buffer = 1 for shader output variables (and inadvertantly setting
it on inout parameters, too). But input_layout_mask doesn't have
xfb_buffer set, so it was seen as in invalid input qualifier.
This meant that all 'inout' parameters were broken.
Caught by running a WebGL conformance test in Chromium:
https://www.khronos.org/registry/webgl/sdk/tests/deqp/data/gles3/shaders/qualification_order.html?webglVersion=2
Fixes Piglit's tests/spec/glsl-4.40/compiler/inout-parameter-qualifier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Only include the ones that can be used by the shader.
This fixes texture coordinates, which were completely wrong,
because WPOS was included in the list of attribs. It also
increases performance noticeably.
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Section 3.4 (Preprocessor) of the GLSL ES 3.00 spec says:
It is an error to undefine or to redefine a built-in (pre-defined)
macro name.
The GLSL ES 1.00 spec does not contain this text.
Section 3.3 (Preprocessor) of the GLSL 1.30 spec says:
#define and #undef functionality are defined as is standard for C++
preprocessors for macro definitions both with and without macro
parameters.
At least as far as I can tell GCC allow '#undef __FILE__'. Furthermore,
there are desktop OpenGL conformance tests that expect '#undef
__VERSION__' and '#undef GL_core_profile' to work.
Fixes:
GL45-CTS.shaders.preprocessor.definitions.undefine_version_vertex
GL45-CTS.shaders.preprocessor.definitions.undefine_version_fragment
GL45-CTS.shaders.preprocessor.definitions.undefine_core_profile_vertex
GL45-CTS.shaders.preprocessor.definitions.undefine_core_profile_fragment
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
Builtins already have locations assigned so this shouldn't
change anything. We want to call it earlier so we can tranform
GLSL IR to NIR earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Here a new function link_varyings_and_uniforms() is created this
should help make it easier to follow the code in link_shader()
which was getting very large.
Note the end of the new function contains a for loop with some
lowering calls that currently don't seem related to varyings or
uniforms but they are a dependancy for converting to NIR ealier
so we move things here now to keep things easy to follow.
Reviewed-by: Eric Anholt <eric@anholt.net>
The pass isn't really control-flow aware and you can get into case where it
tries to combine instructions from different blocks. This can actually
lead to an assertion failure when removing unneeded instructions if part of
the vector is set in one block and part in another. This prevents
regressions in the next commit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Improves glretrace -b servo.trace (a trace of Mozilla's servo rendering
engine booting, rendering a page, and exiting) from 1.8s to 1.1s. It uses
a large uniform array of structs, making a huge number of separate program
resources, and the fixed-size hash table was killing it. Given how many
times we've improved performance by swapping the hash table to
util/hash_table.h, just do it once and for all.
This just rebases the old hash table API on top of util/, for minimal
diff. Cleaning things up is left for later, particularly because I want
to fix up the new hash table API a little bit.
v2: Add UNUSED to the now-unused parameter.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
I'm going to replace this hash table with util/hash_table.h, and the first
step is to compare things the same way.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Separated FE stats out into its own structure. There are 17 FE vs 3 BE
stat fields. Since there is only one FE thread per DC then we don't have
to loop over all threads and sum up FE stats over all the worker threads.
This also reduces size of DC since we only need to store one copy of the
FE stats and not one per worker. Finally, we can use the new FE callback
mechanism to update these.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
- Remove HYPERTHREADED_FE support
- Add threading info as optional data passed to SwrCreateContext.
If supplied this data will override any KNOB thread settings.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
This is a blocking call that waits until all FE work is complete.
This is useful for waiting for FE work to complete such as for streamout.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
enable/disable raster tile trivial accept test based on scissor enable trait.
Can be optimized further.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
1. SoWriteOffset is no longer treated as a stat
2. Added callback from core to update streamout write offset
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
This is the case when the "00 00 03" is very close to the beginning of
nal unit header
v2: move the check to rbsp init
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
If a shader has an output array, it will get treated as though it were
gl_FragData and rewritten into gl_out_FragData instances. We only want
this to happen on the actual gl_FragData and not everything else.
This is a small part of the problem pointed out by the below bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96765
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can help enable some blend optimizations (see the register spec).
Vulkan always sets this.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This is the proper fix for Overlord and Witcher 2 hangs.
The hang condition is that 1 app must write to MRT0 and MRT1 from a pixel
shader while MRT1 is disabled in CB_TARGET_MASK (does this generate
unflushable pixel quads? I don't know), and another app (e.g. Glamor)
must enable dual source blending in both MRT0 and MRT1. The hw gets
confused, which leads to corruption and hangs.
Cc: 12.0 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Since commit 6d7177f01b, radeonsi
would take a different path if info->indirect_params was not
initialized properly. Nine was not initializating this field.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
+23% Bioshock Infinite performance.
v2: - use the new fence_finish interface
- allow deferred fences with multiple contexts
- clear the ctx pointer after a deferred flush
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush
the context
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Some ideas copied from Jakob Sinclair's implementation, but the color
clearing is completely different.
v2: remove leftover code, disable conditional rendering
disable render condition cleanly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
On android platform, the width and height of a native window surface may
be updated after initialization. It is therefore necessary to query android
framework for the current width and height.
v2: remove Android specific #ifdef's and just implement the fallback directly
if the platform query_surface() callback is not provided.
TEST=dEQP-EGL.functional.resize.surface_size#* on cyan-cheets
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> (v1)
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I673f7d2f1d90c3bf572b30f63da537f2cae1496e
V1: Add multisample positions (Nanley)
V2: Fix 8x sample positions to match OpenGL (Anuj)
V3: Vulkan has standard sample locations. They need not be same as
in OpenGL. (Anuj)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
As requested with the initial creation of util/bitscan.h
now move other bitscan related functions into util.
v2: Split into two patches.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This enables GL_shader_draw_parameters and GL_ARB_indirect_parameters as well
as a properly accelerated implementation of GL_ARB_multi_draw_indirect.
Enabling the feature requires a sufficiently uptodate firmware -- those have
already been released a long time ago, although this does mean that the
feature only works with the amdgpu kernel module, since the radeon module
doesn't have a way to query the firmware version.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note that for indirect draws, the new MULTI firmware packets are required.
There's also no need to reset last_{start_instance,sh_base_reg}, since
resetting last_base_vertex is sufficient.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Byte indices don't need any alignment, so remove this assertion (it got moved
into a path where a piglit test hit it during the refactoring of
commit 64ff23a58c).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
MSVC doesn't support 64-bit enum values, at least not with C code.
The compiler was warning:
c:\users\brian\projects\mesa\src\mesa\state_tracker\st_atom_list.h(43) : warning
C4309: 'initializing' : truncation of constant value
c:\users\brian\projects\mesa\src\mesa\state_tracker\st_atom_list.h(44) : warning
C4309: 'initializing' : truncation of constant value
...
And at runtime we crashed since the high 32-bits of the 'dirty' bitmask
was always 0xffffffff and the 32+u_bit_scan() index went out of bounds of
the atoms[] array.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously we would not print a swizzle on ssa_52 when only its .x
component is used (as seen in the definition of ssa_53):
vec3 ssa_52 = fadd ssa_51, ssa_51
vec1 ssa_53 = flog2 ssa_52
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
But this makes the interpretation of the RHS of the definition difficult
to understand and dependent on the size of the LHS. Just print swizzles
when they are not the identity swizzle, so the previous example is now
printed as:
vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz
vec1 ssa_53 = flog2 ssa_52.x
vec1 ssa_54 = flog2 ssa_52.y
vec1 ssa_55 = flog2 ssa_52.z
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit a37e46323c.
It broke the game Overlord such that it hung a GCN GNU. While I don't know
how the hang happened because of its randomness and gfx corruption precedes
it, many of the shaders contain this:
out vec4 FragData[gl_MaxDrawBuffers];
This patch adds support for YV12 pixel format to the Android platform
backend. Only creating EGL images is supported, it is not added to the
list of available visuals.
v2: Use const array defined just for YV12 instead of trying to be overly
generic.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I4aeb2d67a95c5cdd10b530c549b23146c8f0b983
OpenGL 4.4 specifies that BlitFramebuffer should perform sRGB encode
and decode like ES 3.x does, but only when GL_FRAMEBUFFER_SRGB is
enabled. This is technically incompatible in certain cases, but is
more consistent across GL, ES, and WebGL, and more flexible.
The NVIDIA 367.35 drivers appear to follow this behavior.
For the awful spec analysis, please read Piglit's
tests/spec/arb_framebuffer_srgb/blit.c, which explains the differences
between GL 4.1, 4.2, 4.3 (2012), 4.3 (2013), and 4.4, and why this
is the right rule to implement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Just avoid whacking GL_FRAMEBUFFER_SRGB altogether, so we respect
the application's setting. This appears to work.
v2: Update one more comment (requested by Ian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
OpenGL 4.4 specifies that BlitFramebuffer should perform sRGB encode
and decode like ES 3.x does, but only when GL_FRAMEBUFFER_SRGB is
enabled. This is technically incompatible in certain cases, but is
more consistent across GL, ES, and WebGL, and more flexible.
The NVIDIA 367.35 drivers appear to follow this behavior.
For the awful spec analysis, please read Piglit's
tests/spec/arb_framebuffer_srgb/blit.c, which explains the differences
between GL 4.1, 4.2, 4.3 (2012), 4.3 (2013), and 4.4, and why this
is the right rule to implement.
Note that ctx->Color.sRGBEnabled is initialized to _mesa_is_gles(ctx),
and ES doesn't have enable/disable flags for GL_FRAMEBUFFER_SRGB, so
it's effectively on all the time. This means the ES behavior should
be unchanged.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This should have no effect, as all drivers which support BLORP also
support ES 3.0 - so ES 2.0 would be promoted and follow the ES 3 rules.
ES 1.0 doesn't have BlitFramebuffer.
This is purely to clarify the next patch a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 4e549ddb50,
dropping the hack from Gallium that I just deleted from i965.
See the previous commit for rationale.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I've never quite understood the purpose of this hack - supposedly,
doing resolves in the sRGB colorspace is slightly more accurate.
Currently, BlitFramebuffer() ignores sRGB encoding and decoding
on OpenGL, although it encodes and decodes in GLES 3.x.
The updated OpenGL 4.4 rules also allow for encoding and decoding
if GL_FRAMEBUFFER_SRGB is enabled, allowing the application to
control what colorspace blits are done in. I don't think this hack
makes any sense in such a world - the application can do what it
wants, and we shouldn't second guess them.
A related Piglit patch, "Make multisample accuracy test set
GL_FRAMEBUFFER_SRGB when resolving." makes the Piglit MSAA accuracy
test explicitly request SRGB encoding/decoding during resolves when
running "srgb" subtests. Without that patch, this commit will regress
those tests, but with it, they should continue to work just fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Modern OpenGL BlitFramebuffer require sRGB encode/decode when
GL_FRAMEBUFFER_SRGB is enabled. The blitter can't handle this,
so we need to bail. On Gen4-5, this means falling back to Meta,
which should handle it.
We allow sRGB <-> sRGB blits, as decode then encode ought to be a noop
(other than potential precision loss, which nobody wants anyway).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There are DRI_IMAGE_FOURCC macros, for which there are no corresponding
DRI_IMAGE_FORMAT macros. To support such formats we need to make the
lookup function take the native format directly. As a side effect, it
simplifies all existing calls to this function, because they all called
get_format() first to convert from native to DRI_IMAGE_FORMAT.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I4674000fb5ccfd02e38b8fa89bc567ac1d4fc16b
This patch splits current dri2_create_image_android_native_buffer() into
main entry point and two additional functions, one for creating an image
from flink name and one for handling prime FDs using the generic DMA-buf
path. This makes the code cleaner and also prepares for disabling flink
path more easily in the future.
v2: Split into separate patch.
Add error messages.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: Ifdfb5927399d56992fe707160423c29278f49172
Drivers can request different set of buffers depending on the buffer
mask they pass to the get_buffers callback. This patch makes
droid_image_get_buffers() respect this mask.
v2: Return error only in case of real error condition and ignore requests
of unavailable buffers.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: I6c3c4eca90f4c618579f6725dec323c004cb44ba
Fix compilation warnings due to unused variables left after some earlier
code changes.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Rob Herring <rob@kernel.org>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Change-Id: Iec09eb2a62887f3a38dff156756ed8385f3f3447
We've been setting it in gen7 forever but never in gen8; best to make it
consistent. This hasn't caused any problems yet because we don't advertise
support for statistics queries yet.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The multisample rasterization mode is computed based on this field,
3DSTATE_RASTER::DXMultisampleRasterizationMode (only for forced
multisampling), 3DSTATE_RASTER::APIMode, and the number of samples. There
are two tables in the SKL PRM that describe how the final multisample mode
is calculated: "Windower (WM) Stage >> Multisampling >> Multisample
ModeState >> Table 1" and the formula for "SF_INT::Multisample
Rasterization Mode".
The "DX Multisample Rasterization Enable" bit changes whether multisample
mode is set to OFF_PIXEL or ON_PATTERN in the samples > 1 case. In the
samples == 1 case, the bit has no effect. Since Vulkan has no concept of
disabling multisampling for samples > 1, we can just set the bit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This error condition is not implementable when using tessellation or
geometry shaders. The text was also removed from the ES 3.2 spec.
I believe the intended behavior is to remove the error condition
when either OES_geometry_shader or OES_tessellation_shader are
exposed.
v2: Quote a better part of issue 13 (suggested by Ian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mostly, I want to share the GLES 3 transform feedback handling,
though most of the rest of the code is identical as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also update _mesa_has_tessellation to know about the new extensions.
For now, these are dummy_false, to avoid turning on the extension
until everything's in place. Eventually, we'll move them over to
the "ARB_tessellation_shader" bit so that any drivers supporting
both the desktop extension and ES 3.1 get the feature.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The OES_tessellation_shader and EXT_tessellation_shader specifications
have suffixed names. These are identical to the core function, so just
alias them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prefer to use DRAW_(INDEX)_INDIRECT_MULTI when available in the firmware.
Versions for SI and CI already added as provided by the firmware team, but
keep in mind that they won't currently be used since the radeon kernel module
has no interface to query the firmware version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The radeon kernel module doesn't have the firmware query interface, so the
corresponding values will remain 0.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Overriding it is not allowed anyway, and actually lead to a crash when polygon
stippling was used with monolithic shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These are largely identical, except that the GS version has a few
extra error conditions. We can just pass in the stage and skip these.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We need to subtract VARYING_SLOT_PATCH0, not VARYING_SLOT_VAR0.
Since "patch" only applies to inputs and outputs, we can just handle
this once outside the switch statement, rather than replicating the
check twice and complicating the earlier conditions.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This assertion is bogus. Varying structs, and arrays of structs, are
allowed by GLSL, and we can see them here. While we currently don't
have any partial-variable support for those, simply returning false
and marking the entire thing as used is certainly legitimate.
I believe this is often swept under the rug by varying packing,
but that's disabled in certain tessellation situations.
Hit by 20 dEQP-GLES31.functional.tessellation.user_defined_io.* tests.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This better matches the grammar in section 4.3.9 of the GLSL 4.5 spec,
and also removes some redundant code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Similar to has_geometry_shader(), has_compute_shader(), and so on.
This will make it easier to add more conditions here later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
They can only be run manually as described in HOW_TO_RUN.
It should help catch suboptimal code generation.
Some of the tests already fail.
v2: rename the tests to *.glsl,
fix lit.cfg to find FileCheck
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
This will be used by GLSL lit tests.
For developers only. It shouldn't be distributable and it doesn't use
the Mesa build system.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will be used by: amdgcn_glslc -mcpu=[family]
It can also be used for shader-db if you want stats for a different family.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
On gl{Push,Pop}Attrib(GL_CLIENT_VERTEX_ARRAY_BIT) take
care that gl_vertex_array_object::VertexAttribBufferMask
matches the bound buffer object in the
gl_vertex_array_object::VertexBinding array.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Both the rgb9e5 and r11g11b10 formats are defined based on how they are
packed into a 32-bit integer. It makes sense that the functions that
manipulate them take an explicitly sized type.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There are a number of reasons for this refactor. First, format_rgb9e5.h is
not something that a user would expect to define such a generic union.
Second, defining it requires checking for endianness which is ugly. Third,
90% of what we were doing with the union was float <-> uint32_t bitcasts
and the remaining 10% can be done with a sinmple left-shift by 23.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The rgb9e5 format is a packed format defined in terms of slicing up a
single 32-bit value. The bitfields are far more confusing than simple
shifts and require that we check the endianness.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When an argument for a structure constructor or initializer doesn't
match the expected type, only Section 4.1.10 “Implicit Conversions”
are allowed to try to match that expected type.
From page 32 (page 38 of the PDF) of the GLSL 1.20 spec:
" The arguments to the constructor will be used to set the structure's
fields, in order, using one argument per field. Each argument must
be the same type as the field it sets, or be a type that can be
converted to the field's type according to Section 4.1.10 “Implicit
Conversions.”"
From page 35 (page 41 of the PDF) of the GLSL 4.20 spec:
" In all cases, the innermost initializer (i.e., not a list of
initializers enclosed in curly braces) applied to an object must
have the same type as the object being initialized or be a type that
can be converted to the object's type according to section 4.1.10
"Implicit Conversions". In the latter case, an implicit conversion
will be done on the initializer before the assignment is done."
v2: Remove also the now redundant constant conversion, the
constant_record_constructor helper and the replacement code
(Timothy).
Fixes GL44-CTS.shading_language_420pack.initializer_list_negative
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
v2: Refactor also the conversion to constant and replacement code
(Timothy).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Implicit conversions were added in the GLSL 1.20 spec version.
v2: Join the checks for GLSL 1.10 and ESSL (Timothy).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Previously, for every input, we moved the dispatch mask to the flag
register, then emitted two predicated PLN instructions, one with
centroid barycentric coordinates (for normal pixels), and one with
pixel barycentric coordinates (for unlit helper pixels).
Instead, we can simply emit a set of predicated MOVs at the top of
the program which copy the pixel barycentric coordinates over the
centroid ones for unlit helper pixel channels. Then, we can just
use normal PLNs.
On Sandybridge:
total instructions in shared programs: 7538470 -> 7534500 (-0.05%)
instructions in affected programs: 101268 -> 97298 (-3.92%)
helped: 705
HURT: 9 (all of which are SIMD16 programs)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Sync now uses a callback to ensure that it's called by the last
thread moving past a DC. This will help with the new counter
handling.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
By allocating and initializing the matrices at context creation, the OS
couldn't even overcommit the pages. This saves about 63k (out of 946k) of
maximum memory size according to massif on simulated vc4
glsl-algebraic-add-add-1. It also means we could potentially relax the
maximum stack sizes, but that should be a separate commit.
v2: Drop redundant Top update, explain why the stack is small at init
time.
Reviewed-by: Brian Paul <brianp@vmware.com>
It's only used for rarely-used deprecated GL features
(feedback/rasterpos), so we can skip the memory allocation and
initialization for it most of the time.
Saves about 659k (out of 1605k) of maximum memory size according to massif
on simulated vc4 glsl-algebraic-add-add-1
Reviewed-by: Brian Paul <brianp@vmware.com>
This works out to be a wash in terms of memory usage: We use more memory
to store the separate ALU instructions, but we optimize out a lot of code
as well. The main result, though, is that we do more of our work at link
time rather than draw time.
We don't want to bake the whole array into the FS key, because of the
hashing overhead. But we can keep a set of the arrays seen, and use a
pointer to the copy in as the array's proxy.
Between this and the previous patch, gl-1.0-blend-func now passes on
hardware, where previously it was filling the 256MB CMA area with shaders
and OOMing.
Drops 712 shaders from shader-db.
I found a shader in Tales of Maj'Eyal that contains:
if ssa_21 {
block block_1:
/* preds: block_0 */
...instructions that prevent the select peephole...
vec1 32 ssa_23 = imov ssa_4
vec1 32 ssa_24 = imov ssa_4.y
vec1 32 ssa_25 = imov ssa_4.z
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
vec1 32 ssa_26 = imov ssa_4
vec1 32 ssa_27 = imov ssa_4.y
vec1 32 ssa_28 = imov ssa_4.z
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec1 32 ssa_29 = phi block_1: ssa_23, block_2: ssa_26
vec1 32 ssa_30 = phi block_1: ssa_24, block_2: ssa_27
vec1 32 ssa_31 = phi block_1: ssa_25, block_2: ssa_28
Here, copy propagation will bail because phis cannot perform swizzles,
and CSE won't do anything because there is no dominance relationship
between the imovs. By making nir_opt_remove_phis handle identical moves,
we can eliminate the phis and rewrite everything to use ssa_4 directly,
so all the moves become dead and get eliminated.
I don't think we need to check "exact" - just the alu sources.
Presumably phi sources should match in their exactness.
On Broadwell:
total instructions in shared programs: 11639872 -> 11638535 (-0.01%)
instructions in affected programs: 134222 -> 132885 (-1.00%)
helped: 338
HURT: 0
v2: Fix return value to be NULL, not false (caught by Iago).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, we allocated a new VGRF for every undefined definition.
Instead, this patch makes us allocate a new VGRF for every use of an
undefined definition. This makes sure that undefined values are
fully independent of one another, and have live ranges limited to
their single use. This allows register coalescing to combine the
source and destination of MOVs from undefined sources, eliminating
the MOV altogether.
On Broadwell:
total instructions in shared programs: 11641187 -> 11640214 (-0.01%)
instructions in affected programs: 70199 -> 69226 (-1.39%)
helped: 213
HURT: 1
v2: Add a comment (based on Iago's suggested one).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Without this, the X server may accumulate stale Present event contexts
if a client performs several video decoding sessions using the same
window.
v2: Based on Chris Wilson's review:
* Use xcb_discard_reply() instead of free(xcb_request_check())
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Without this, the X server may accumulate stale Present event contexts
if a client ends up creating and destroying DRI drawables for the same
window.
v2: Based on Chris Wilson's review:
* Use xcb_present_select_input_checked so that protocol errors
generated by old X servers can be handled gracefully
* Use xcb_discard_reply() instead of free(xcb_request_check())
We were baking in the LOD of the source level to each shader. Instead,
pass it in as a uniform -- this requires storing it to a temp register,
but that's better than compiling a ton of separate shaders:
total instructions in shared programs: 115032 -> 115036 (0.00%)
instructions in affected programs: 96 -> 100 (4.17%)
LOST: 572
This helps in debugging memory pressure. It would be nice if we could
tell valgrind about it all the way from allocation time to destroy, but we
need a pointer to hand to VALGRIND_MALLOCLIKE_BLOCK.
The ranges are in units of bytes, not dwords. This wasn't caught by
piglit tests because ttn tends to make one big uniform file, so we only
had one UBO range with a src and dst offset of 0.
nir_opt_peephole_select has the job of removing IF statements with no side
effects. However, if the IF statement's successor didn't have any
instructions in it, we were skipping it, which occurred in mupen64 on vc4
with glsl_to_nir enabled:
instructions in affected programs: 6134 -> 4120 (-32.83%)
total uniforms in shared programs: 38268 -> 38219 (-0.13%)
No changes on Haswell shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size: 6667596 -> 6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We need to include mt->offset in the calculation of src pointer because its
value may be non-zero, for example in a cubemap texture.
Signed-off-by: Haixia Shi <hshi@chromium.org>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad@kiwitree.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change-Id: I461ad5b204626d5a1c45611fc6b63735dcf29f63
swr rasterizer contains numerous data transfers between vectors
and ordinary C types. Fixing for strict aliasing will take time.
Reviewed-by: Matt Turner <mattst88@gmail.com>
AST_NUM_OPERATORS stores the dimension of the ast_operators
enumeration but was not updated after its last modification.
This doesn't add any real modification for any code paths but it makes
sense for coherence.
v2 (Eric Engestrom): Just place the define at the end of the
enumeration, not below.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Once upon a time (commit 8313f44409) Paul added code for the unlit
centroid workaround (WaCopyUnlitCentroidBarys). His commit message
claims it fixed the EXT_framebuffer_multisample/interpolation {2,4}
{centroid-deriv,centroid-deriv-disabled} piglit tests but does not say
on which platform, though he cites the IVB PRM.
"3DSTATE_WM [DevIVB, DevHSW]" says
"[DevIVB]: Workaround: When Centroid Barycentric mode is required, HW
may produce incorrect interpolation results when a 2X2 pixels have
unlit pixels."
I later disabled it for Haswell (commit f6db414f3c) with no known ill
effects.
The Sandybridge page does not have this text, but the workarounds
database (see WaCopyUnlitCentroidBarys) says the issues applies *only*
to Sandybridge, and in fact in commit 1a2de7dce8 I note that disabling
the workaround on Sandybridge causes the tests Paul originally mentioned
to fail.
So this is, and always has been, a huge confusing mess.
Disabling the workaround indeed causes the tests Paul originally
mentioned to fail on Sandybridge but not on Ivybridge/Baytrail.
On Ivybridge:
total instructions in shared programs: 6914901 -> 6909599 (-0.08%)
instructions in affected programs: 106766 -> 101464 (-4.97%)
helped: 884
total cycles in shared programs: 70874764 -> 70813774 (-0.09%)
cycles in affected programs: 794144 -> 733154 (-7.68%)
helped: 688
HURT: 186
LOST: 1
GAINED: 6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves performance of OglBatch7 by 4.06851% +/- 1.17925% (n=169) on
Haswell, and cuts ~18k of .text:
text data bss dec hex filename
5824627 287816 29384 6141827 5db783 before/i965_dri.so
5806354 287816 29384 6123554 5d7022 after/i965_dri.so
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This can be used by the driver to get the command line which started
the process. Will be used by the VMware driver for extra logging.
For now, this is only implemented for Linux via /proc/self/cmdline
and Windows via GetCommandLine().
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch eliminates the redundant SetVertexBuffers and
SetIndexBuffer commands that are emitted for rebind purpose.
With this patch, the set commands will be skipped, but we will still
reference the associated resources to allow the kernel to
bring in the resources.
Tested with Lightsmark2008, Valley, MTT glretrace, piglit, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
There are cases where we hit u_vbuf path due to alignment or pitch-
alignment restrictions, but for an output-format that u_vbuf does not
support translating (yet the driver does support natively). In which
case we hit the memcpy() path and don't care that u_vbuf doesn't
understand it.
Fixes crash with debug build of mesa in:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.user_ptr_stride17_components2_quads1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95000
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
AFAICT, it's never been used.
It was briefly nudged in the right direction here:
commit 10e5ffd496
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Sat Jan 25 17:19:10 2014 +0000
gbm: do not export _gbm_mesa_get_device
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
The field is only read for printing today and
there it was probably a leftover.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
The way it is used today does not care about the
Enabled flag anymore.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of gl_client_array::Enabled inside a VAO,
directly use the gl_vertex_attrib_array::Enabled value
which is the origin of the above.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Only a debugging function, but move away from
gl_client_array and use the first order information
from the VAO.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Only a debugging function, but move away from
gl_client_array and use the first order information
from the VAO. Also make use of gl_vert_attrib_name.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Only a debugging function, but move away from
gl_client_array and use the first order information
from the VAO. Also make use of gl_vert_attrib_name.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Similarily to _mesa_all_varyings_in_vbos walk the VAO
to check if we have an illegal mapped buffer object
instead of walking all gl_client_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
In vbo_draw_transform_feedback we currently look at
exec->array.inputs to determine if all varying
vertex attributes reside in vbos. But the vbo_bind_arrays
call only happens past the vbo_all_varyings_in_vbos
query. Thus we may work on a stale set of client arrays.
Using the current VAOs content for this query feels much
more logical to me.
Additionally with this change mesa makes more use of the
information already tracked in the VAO instead of looping
across VERT_ATTRIB_MAX vertex arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Implement the equivalent of vbo_all_varyings_in_vbos for
vertex array objects.
v2: Update comment.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
When a vertex buffer object gets deleted, it is unbound
at the VAO. To do this use _mesa_bind_vertex_buffer instead
of plain unreferencing the buffer object. This keeps the VAOs
internal state consistent. In this case it showed up with
gl_vertex_array_object::VertexAttribBufferMask getting out of
sync.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was added with ARB_enhanced_layouts.
V2: Add an extra format specifier for the new qualifier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
See commit b0629e6894, where we discovered
that the SOL stage's "Rendering Disable" feature is a lot faster at
throwing away all geometry than the clipper's "reject all" mode.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, the bitshift would be performed on a simple int (32 bits on
most systems), overflow, and then be cast to 64 bits.
CovID: 1362461
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We need to emit RB_FRAME_BUFFER_DIMENSION once per batch.. tracking this
in fd_context is wrong when the gmem code executes asynchronously from
the flush_queue worker. But in fact we don't really need to track it at
all. We cannot assume previous value at the beginning of the batch
(because of other processes potentially using the GPU), so just drop the
tracking and emit it in _tile_init().
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is also used in gmem code, which executes from the "bottom half"
(ie. from the flush_queue worker thread), so it cannot be in fd_context.
Signed-off-by: Rob Clark <robdclark@gmail.com>
They weren't really used, and it gets somewhat more complicated to deal
with if batches are flushed asynchronously (on another thread). So just
drop them, and move _query_set_state(NULL) call into batch (so it is not
happening on background thread).
Signed-off-by: Rob Clark <robdclark@gmail.com>
With the state accessed from GMEM+submit factored out of fd_context and
into fd_batch, now it is possible to punt this off to a helper thread.
And more importantly, since there are cases where one context might
force the batch-cache to flush another context's batches (ie. when there
are too many in-flight batches), using a per-context helper thread keeps
various different flushes for a given context serialized.
TODO as with batch-cache, there are a few places where we'll need a
mutex to protect critical sections, which is completely missing at the
moment.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a bit of extra book-keeping about blits and back-blits (from
resource shadowing). If the app uploads all mipmap levels, as opposed
to uploading the first level and then glGenerateMipmap(), we can discard
the back-blit (as opposed to being naive and shadowing the resource for
each mipmap level). Also, after a normal blit, we might as well flush
the batch immediately, since there is not likely to be further rendering
to the surface.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Push query state down to batch, and use the resource tracking to figure
out which batch(es) need to be flushed to get the query result.
This means we actually need to allocate the prsc up front, before we
know the size. So we have to add a special way to allocate an un-
backed resource, and then later allocate the backing storage.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Switch to using a pipe_resource (rather than an fd_bo directly) for hw
query result buffers. This is first step towards making queries work
properly with reordered batches, since we'll need the additional
dependency tracking to know which batches to flush.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Basically, to "DCE" blits triggered by resource shadowing, in cases
where the levels are immediately completely overwritten. For example,
mid-frame texture upload to level zero triggers shadowing and back-blits
to the remaining levels, which are immediately overwritten by
glGenerateMipmap().
Signed-off-by: Rob Clark <robdclark@gmail.com>
To make batch re-ordering useful, we need to be able to create shadow
resources to avoid a flush/stall in transfer_map(). For example,
uploading new texture contents or updating a UBO mid-batch. In these
cases, we want to clone the buffer, and update the new buffer, leaving
the old buffer (whose reference is held by cmdstream) as a shadow.
This is done by blitting the remaining other levels (and whatever part
of current level that is not discarded) from the old/shadow buffer to
the new one.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Note that I originally also had a entry-point that would construct a key
and do lookup from a pipe_surface. I ended up not needing that (yet?)
but it is easy-enough to re-introduce later if we need it for the blit
path.
For now, not enabled by default, but can be enabled (on a3xx/a4xx) with
FD_MESA_DEBUG=reorder.
Signed-off-by: Rob Clark <robdclark@gmail.com>
To flush batches out of order, the gmem code needs to not depend on
state from fd_context (since that may apply to a more recent batch).
So this all moves into batch.
The one exception is the gmem/pipe/tile state itself. But this is
only used from gmem code (and batches are flushed serially). The
alternative would be having to re-calculate GMEM layout on every
batch, even if the dimensions of the render targets are the same.
Note: This opens up the possibility of pushing gmem/submit into a
helper thread.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Introduce the batch object, to track a batch/submit's worth of
ringbuffers and other bookkeeping. In this first step, just move
the ringbuffers into batch, since that is mostly uninteresting
churn.
For now there is just a single batch at a time. Note that one
outcome of this change is that rb's are allocated/freed on each
use. But the expectation is that the bo pool in libdrm_freedreno
will save us the GEM bo alloc/free which was the initial reason
to implement a rb pool in gallium.
The purpose of the batch is to eventually facilitate out-of-order
rendering, with batches associated to framebuffer state, and
tracking the dependencies on other batches.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This bug seems to have always been there. Applications changing shaders
but not textures between draw calls would have gotten undefined behavior.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This just needs to be done by st_validate_state.
v2: add "shaders_may_be_dirty" flags for not skipping st_validate_state
on _NEW_PROGRAM to detect real shader changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The goal is to do this in st_validate_state:
while (dirty)
atoms[u_bit_scan(&dirty)]->update(st);
That implies that atoms can't specify which flags they consume.
There is exactly one ST_NEW_* flag for each atom. (58 flags in total)
There are macros that combine multiple flags into one for easier use.
All _NEW_* flags are translated into ST_NEW_* flags in st_invalidate_state.
st/mesa doesn't keep the _NEW_* flags after that.
torcs is 2% faster between the previous patch and the end of this series.
v2: - add st_atom_list.h to Makefile.sources
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The pass I introduced in commit a2dc11a781
was entirely broken. A missing "break" made the load_interpolated_input
case always fall through to "default" and hit a "continue", making it
not actually move any load_interpolated_input intrinsics at all.
It would only move the simple load_barycentric_* intrinsics, which
don't emit any code anyway, making it basically useless.
The initial version I sent of the pass worked, but I apparently
failed to verify that the simplified version in v2 actually worked.
With the obvious fix applied (so we actually tried to move
load_interpolated_input intrinsics), I discovered a second bug: we
weren't moving the offset SSA def to the top, breaking SSA validation.
The new version of the pass actually moves load_interpolated_input
intrinsics and all their dependencies, as intended.
Papers over GPU hangs on Ivybridge and Baytrail caused by the
recent NIR FS input rework by restoring the old behavior.
(I'm not honestly sure why they hang with PLN not at the top.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Exported dmabufs can get imported by the same process, but the handle was
not getting added to the hash table on export. Add the handle to the hash
table on export.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We should use the persample_dispatch variable in prog_data.
Fixes all (~60) the DEQP sample shading tests. Many tests exited with
VK_ERROR_OUT_OF_DEVICE_MEMORY without this patch.
V2: Use the shader key bits set in brw_compile_fs (Jason)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
eglTerminate, followed by eglReleaseThread. A similar case is
observed in this bug: https://bugs.freedesktop.org/show_bug.cgi?id=69622,
where the test calls eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
With the current code, dri2_dpy structure is freed on eglTerminate
call, so the display is not initialized when eglReleaseThread calls
MakeCurrent with NULL parameters, to unbind the context, which
causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
either in glFlush or in a latter call.
eglTerminate specifies that "If contexts or surfaces associated
with display is current to any thread, they are not released until
they are no longer current as a result of eglMakeCurrent."
However, to properly free the current context/surface (i.e., call
glFlush, unbindContext, driDestroyContext), we still need the
display vtbl (and possibly an active dri dpy connection). Therefore,
we add some reference counter to dri2_egl_display, to make sure
the structure is kept allocated as long as it is required.
One drawback of this is that eglInitialize may not completely reinitialize
the display (if eglTerminate was called with a current context), however,
this seems to meet the EGL spec quite well, and does not permanently
leak any context/display even for incorrectly written apps.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The file was removed with earlier commit breaking 'make dist'.
Drop it from Makefile.sources since it's no longer around.
Fixes: 16985eb308 ("vc4: Switch to using the libdrm-provided
vc4_drm.h.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
When an application declares varying arrays but does not actually do any
indirect indexing, some array indices may end up unused in the consuming
shader, so the number of input slots that correspond to the array ends
up less than the array_size.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Without this GCC 4.8.x throws below error:
error: invalid initialization of non-const reference of type
'clover::llvm::compat::raw_ostream_to_emit_file {aka llvm::raw_svector_ostream&}'
from an rvalue of type '<brace-enclosed initializer list>'
v2: change commit title and add error message like Eric Engestrom requested
Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97019
[ Francisco Jerez: Trivial formatting fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These are only used by get_matching_input() which has been call
at this point so free the hash tables.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The number of outputs patch (limited to 255) has moved in the TCP
header, but blob seems to also set the old position. Also, the high
8-bits are now located inbetween the min/max parallel output read
address at position 20.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This cuts down the overhead of si_dump_shader when ddebug is capturing
shader logs, which is done for every draw call unconditionally (that's
quite a lot of work for a draw call).
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For good performance while being able to generate decent hang reports.
The report doesn't contain the parsed IB and the buffer list, but it
isolates the draw call and dumps shaders while not having to flush
the context.
This is for GPU hangs that are harder to reproduce and require interactive
playing for minutes or even hours.
dd_pipe.h explains some implementation details. Initializing, copying
(recording) and clearing states is most of the code.
The performance should be at least 50% of the normal performance depending
on the circumstances. (i.e. 50% is expected to be the worst case scenario,
not the best case) The majority of time is spent in
dump_debug_state(PIPE_DUMP_CURRENT_SHADERS) and that's after all
the optimizations in later patches. There is no obvious way to optimize
that further.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The pipelined hang detection mode will not want to dump everything.
(and it's also time consuming) It will only dump shaders after a draw call
and then dump the status registers separately if a hang is detected.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It is necessary to reuse existing BOs when dmabufs are imported. There
are 2 cases that need to be handled. dmabufs can be created/exported and
imported by the same process and can be imported multiple times.
Copying other drivers, add a hash table to track exported BOs so the
BOs get reused.
v2: Whitespace fixup (by anholt)
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We could use the nir_shader_gather_info() pass to update it after the
fact, but this is what glsl_to_nir and prog_to_nir do.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Shaders with shProg->Name == ~0 (aka 4294967295) are internal meta
shaders that we don't really want to capture.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Compilers are perfectly capable of generating efficient code for calls
like these to memcpy().
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I do appreciate the cleverness, but unfortunately it prevents a lot more
cleverness in the form of additional compiler optimizations brought on
by -fstrict-aliasing.
No difference in OglBatch7 (n=20).
Co-authored-by: Davin McCall <davmac@davmac.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that the logical_depth0 field is in number of 2D slices, we don't need
to be multiplying by 6 when creating the surface. It wasn't hurting
anything primarily because we get the actual length from the view which was
already handling it correctly.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
intel_mipmap_tree::logical_depth0 is now in 2-D slices so there is no need
for us to multiply by 6 when we go to fill out a blorp surface state.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When possible, do the memcpy on larger blocks. This reduces cycles
spent in _mesa_propagate_uniforms_to_driver_storage from
1.51 % to 0.62% according to perf during the Unigine Heaven benchmark.
It did not affect the framerate of the benchmark. The system used for
testing was an i5 6600K with a Radeon R9 380.
Piglit hangs randomly on this system both with and without the patch
so i could not make a comparison.
v2: fixed whitespace
Signed-off-by: Nils Wallménius <nils.wallmenius@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Enable H.264 VAAPI encoding through config. Currently only H.264 baseline is supported. Encode entrypoint is not accepted by driver.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Frame rate can be passed to driver either through VAEncSequenceParameterBufferType or VAEncMiscParameterTypeFrameRate. Previous code only implement the former one, which is used by Gstreamer-Vaapi. Now adding implementation for VAEncMiscParameterTypeFrameRate. Also adding default frame rate as 30 just in case application never provides frame rate information to driver.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Add environmental variable to disable interlace mode. At VAAPI decoding stage, driver can not distinguish b/w pure decoding case and transcoding case. And since interlace encoding is not supported, we have to disable interlace for transcoding case. The temporary solution is to use enviromental variable to disable interlace mode.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Add some hardcoded values hardware needs mainly for rate control purpose. With previously hardcoded values for OMX, the rate control result is not correct. This change fixed the rate control result by setting correct values for Vaapi.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Add necessary functions/changes for VAAPI encoding to buffer and picture. These changes will allow driver to handle all Vaapi encode related operations. This patch doesn't change the Vaapi decode behaviour.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Rate control method is passed from app to driver through config attrib list.
That is why we need to store this rate control method to config. And later
on, we will pass this value to context->desc.h264enc.rate_ctrl.rate_ctrl_method.
v2 (chk): fix broken build and commit message
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
For putimage call, if image format is yv12 (or IYUV with U V field swap) and
surface format is nv12, then we need to convert yv12 to nv12 and then copy
the converted data from image to surface. We can't use the existing logic
where surface is destroyed and re-created with yv12 format.
v2 (chk): fix some compiler warnings and commit message
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Add function to copy from yv12 image to nv12 surface for VAAPI putimage call.
We need this function in VaPutImage call where copying from yv12 image to nv12
surface for encoding. Existing function can't be used because it only work for
copying from yv12 surface to nv12 image in Vaapi.
v2: cleanup variable types and commit message
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
VAAPI passes PIPE_VIDEO_ENTRYPOINT_ENCODE as entry point for encoding case. We
will save this encode entry point in config. config_id was used as profile
previously. Now, config has both profile and entrypoint field, and config_id is
used to get the config object. Later on, we pass this entrypoint to
context->templat.entrypoint instead of always hardcoded to
PIPE_VIDEO_ENTRYPOINT_BITSTREAM for decoding case previously. Encode entrypoint
is not accepted by driver until we enable Vaapi encode in later patch.
v2 (chk): fix commit message to match 80 chars, use switch instead of ifs,
fix memory leaks in the error path, implement vlVaQueryConfigEntrypoints
as well, drop VAEntrypointEncPicture (only used for JPEG).
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
This fixes a bunch of multisample piglit tests on GM206, like
bin/arb_texture_multisample-texelfetch 2 -auto -fbo
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
It's illegal to have neg modifiers on both sources for OP_ADD,
and it's illegal to have OP_SUB with just src0 neg.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Previously we were only restricting based on ES/non-ES-ness and whether
the overall enable bit had been flipped on. However we have been adding
more fine-grained restrictions, such as based on compat profiles, as
well as specific ES versions. Most of the time this doesn't matter, but
it can create awkward situations and duplication of logic.
Here we separate the main extension table into a separate object file,
linked to the glsl compiler, which makes use of it with a custom
function which takes the ES-ness of the shader into account (thus
allowing desktop shaders to properly use ES extensions that would
otherwise have been disallowed.) We can also now use this logic to
generate #define's for all supported extensions automatically, removing
the duplicate (and often inaccurate) list in glcpp.
The effect of this change should be nil in most cases. However in some
situations, extensions like GL_ARB_gpu_shader5 which were formerly
available in compat contexts on the GLSL side of things will now become
inaccessible.
This regresses two ES CTS tests:
ES3-CTS.shaders.shader_integer_mix.define
ES31-CTS.shader_integer_mix.define
however that is due to them using #version 100 instead of 300 es. As the
extension is only defined for ES3, I believe this is the correct
behavior.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v2)
v2 -> v3: integrate glcpp defines into the same mechanism
We need "NULL" state to be a valid bit in the bitmask, because timestamp
queries are not restricted to draw/etc stages (ie. the only commands to
submit may just be to read the timestamp). And just because there are
no draws, isn't a reason to skip the flush and return zero.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Since commit d938b8c, the sample locations are no longer set unconditionally,
so we need to set the atom to dirty on all chips, not just Polaris.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
The regression was introduced by commit d938b8c. The problem here is that in
order to use the small primitive filter, we need to explicitly set the sample
locations to 0. But the DB doesn't properly process the change of sample
locations without a flush, and so we can end up with incorrect Z values.
Instead of doing a flush, just disable the small primitive filter when MSAA
is force-disabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96908
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
v2: no need for break after an unreachable (Matt Turner)
Signed-off-by: Francesco Ansanelli <francians@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
One of the WebGL 2.0 conformance tests is trying to call
glGenerateMipmaps with a width and height of 0. With the meta
implementation, this generates a "framebuffer attachment incomplete"
status, and falls back to the CPU path, calling MapTextureImage.
Except that there's no actual texture to map, and we assert fail.
There's no work to do in this case. The test expects it to succeed,
so just return early with no error and avoid hassling the driver.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96911
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For lod query instructions, we really don't care whether or not the sampler
is an array type because that doesn't factor into the LOD.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
On i965, we can't support coordinate offsets for texelFetch or rectangle
textures. Previously, we were doing this with a GLSL pass but we need to
do it in NIR if we want those workarounds for SPIR-V.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
While SPIR-V technically doesn't support "old style" shadow, the
shadow-compare gather instruction does return a vec4 so we need to be able
to set the old_style_shadow bit in NIR.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-dev@lists.freedesktop.org>
This allows the load-propagation pass to swap the sources in presence
of immediate values.
Maxwell (GM107):
total instructions in shared programs :1928187 -> 1927634 (-0.03%)
total gprs used in shared programs :330741 -> 330154 (-0.18%)
total local used in shared programs :28032 -> 28032 (0.00%)
local gpr inst bytes
helped 0 271 425 425
hurt 0 0 194 194
Fermi (GF114):
total instructions in shared programs :2334474 -> 2333829 (-0.03%)
total gprs used in shared programs :380934 -> 380215 (-0.19%)
total local used in shared programs :33304 -> 33264 (-0.12%)
local gpr inst bytes
helped 5 314 521 521
hurt 0 4 195 195
No regressions on GM107 and GF114 with full piglit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There are 2 uses:
- Asynchronous flushing for multithreaded drivers.
- Return a fence without flushing (mid-command-buffer fence). The driver
can defer flushing until fence_finish is called.
This is required to make Bioshock Infinite faster, which creates
1000 fences (flushes) per frame.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
subroutine variables are to be used just in the way functions are
called. Although the spec doesn't say it explicitely, this means that
these variables are not to be used in any other way than those left
for function calls. Therefore, a comparison between 2 subroutine
variables should also cause a compilation error.
From The OpenGL® Shading Language 4.40, page 117:
" To use subroutines, a subroutine type is declared, one or more
functions are associated with that subroutine type, and a
subroutine variable of that type is declared. The function
currently assigned to the variable function is then called by
using function calling syntax replacing a function name with the
name of the subroutine variable. Subroutine variables are
uniforms, and are assigned to specific functions only through
commands (UniformSubroutinesuiv) in the OpenGL API."
From The OpenGL® Shading Language 4.40, page 118:
" Subroutine uniform variables are called the same way functions
are called. When a subroutine variable (or an element of a
subroutine variable array) is associated with a particular
function, all function calls through that variable will call that
particular function."
Fixes GL44-CTS.shader_subroutine.subroutines_cannot_be_assigned_float_int_values_or_be_compared
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since 7f53fead5c we treat every location as using all
four components so we only need special handling for
doubles when they cross multiple locations.
This fixes a crash in GL45-CTS.enhanced_layouts.varying_locations
where the outputs array would overflow when a dmat2 was stored at
the max varying location i.e 30.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes a regression introduced in
1da704a94c because the offset has moved
from 0x180 to 0x1a0, and the macros have to be re-compiled.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes a regression introduced in
1da704a94c because the offset has moved
from 0x600 to 0x620, and the kernels used for reading MP perf counters
have to be re-assembled.
This also fixes amd_performance_monitor_measure piglit.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The GL_EXT_texture_format_BGRA8888 extension specification defines a
GL_BGRA_EXT unsized internal format (which is a little odd - usually
BGRA is a pixel transfer format). The extension is written against
the ES 1.0 specification, so it's a little hard to map, but I believe
it's effectively adding it to the table used here, so we should allow
it here as well.
Note that GL_EXT_texture_format_BGRA8888 is always enabled (dummy_true),
so we don't need to check if it's enabled here.
This fixes mipmap generation in Skia and ChromeOS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
References: https://bugs.chromium.org/p/chromium/issues/detail?id=630371
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reported-by: Stéphane Marchesin <marcheu@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
Commit 52e75dcb8c made nir_lower_io
start using nir_intrinsic_set_base instead of writing const_index[0]
directly. However, those intrinsics apparently don't /have/ a base,
so this caused assert failures.
However, the old code was happily setting non-existent const_index
fields, so it was pretty bogus too.
Jason pointed out that load_shared and store_shared have a base,
and that the i965 driver uses that field. So presumably atomics
should have one as well, so that loads/stores/atomics all refer
to variables with consistent addressing.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We always resort to the pull model for instanced GS inputs. So, we'd
better include the VUE handles, or else we can't actually pull anything.
Ian reports that on his branch with OES_geometry_shader enabled,
this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests::
- instanced.draw_2_instances_geometry_2_invocations
- instanced.draw_2_instances_geometry_8_invocations
- instanced.draw_4_instances_geometry_2_invocations
- instanced.draw_4_instances_geometry_8_invocations
- instanced.draw_8_instances_geometry_2_invocations
- instanced.draw_8_instances_geometry_8_invocations
- instanced.geometry_2_invocations
- instanced.geometry_32_invocations
- instanced.geometry_8_invocations
- instanced.geometry_max_invocations
- instanced.geometry_output_different_2_invocations
- instanced.geometry_output_different_32_invocations
- instanced.geometry_output_different_8_invocations
- instanced.geometry_output_different_max_invocations
- instanced.invocation_output_vary_by_attribute
- instanced.invocation_output_vary_by_texture
- instanced.invocation_output_vary_by_uniform
- query.primitives_generated_instanced
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Not sure why I forgot to add them to CXXFLAGS in commit f55c408067 or
commit 875458b778. Cuts about 1k of .text.
text data bss dec hex filename
5806354 287816 29384 6123554 5d7022 i965_dri.so before
5805497 287744 29384 6122625 5d6c81 i965_dri.so after
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
According to the referenced bug report, gcc-4.5 and newer do not inline
memcmp(). I see no difference in performance of ipers with llvmpipe on a
Sandybridge (which does not have "Enhanced REP MOVSB/STOSB") by removing
this flag.
I attempted to confirm the problem with gcc-4.4, but it fails to compile
for quite a few different reasons.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Based loosely on patches submitted ages ago by Thomas Helland.
v2: Add lots of missing data provided by Ilia. Fix sort order of
GL_ARB_sparse_texture extensions suggested by Ilia.
v3: Note that Dave Airlie has started work on GL_ARB_bindless_texture.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Here we create a new output_generic_reg array with the ability to
store the dst_reg for each component of user defined varyings.
This is needed as the previous code only stored the dst_reg based
on the varying location which meant packed varyings would overwrite
each other.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
For example where n=3 first_component=1 this will give us
0xE (WRITEMASK_YZW).
V2:
Add assert to check first component is <= 4 (Suggested by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This will be used to swizzle components to the beginning or end
of the vector based on the component layout qualifier and whether
we are doing a load or store.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This makes sure we give the correct driver location
for doubles when using component packing. Specifically
it handles packing a dvec3 with a double which is the
only packing scenario allowed which spans across two
locations.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Here we use the component qualifier (which is the first component)
as an offset when loading output varyings.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than trying to work out the total number of components
used at a location we simply treat all outputs as vec4s. This
removes the need for complex code looping over varyings to match
packed locations and the need for storing the total number of
components used at each location.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We will use this for output varyings. To make component
packing simpler we will just treat all varyings as vec4s.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The build was failing because the official CL headers have a few defines,
like: # define cl_khr_gl_sharing 1
Which have the same name as some class members of clang's OpenCLOptions class.
If we include the cl headers first, this breaks the build because the member
names of this class are replaced by the literal 1.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
clang commit r275822 removed unnecessary includes from header files,
so we now need to explicitly include clang/Lex/PreprocessorOptions.h
v2:
- Use <> instead of "" for the include path.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
TCS/TES/GS and now FS all handle these in stage-specific functions.
CS don't have inputs, so VS was the only one left using this code.
Move it to the VS-specific function for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We no longer use this message. As far as I can tell, it's fairly
useless - the equivalent information is provided in the payload.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This eliminates the need to walk the list of input variables, recurse
into their types (via logic largely redundant with nir_lower_io), and
interpolate all possible inputs up front. The backend no longer has
to care about variables at all, which eliminates complications from
trying to pack multiple variables into the same location. Instead,
each intrinsic specifies exactly what's needed.
This should unblock Timothy's work on GL_ARB_enhanced_layouts.
Each load_interpolated_input intrinsic corresponds to PLN instructions,
while load_barycentric_at_* intrinsics correspond to pixel interpolator
messages. The pixel/centroid/sample barycentric intrinsics simply refer
to payload fields (delta_xy[]), and don't actually generate any code.
Because we use a single intrinsic for both centroid-qualified variables
and interpolateAtCentroid(), they become indistinguishable. We stop
sending pixel interpolator messages for those, and instead use the
payload provided data, which should be considerably faster.
On Broadwell:
total instructions in shared programs: 9067751 -> 9067570 (-0.00%)
instructions in affected programs: 145902 -> 145721 (-0.12%)
helped: 422
HURT: 209
total spills in shared programs: 2849 -> 2899 (1.76%)
spills in affected programs: 760 -> 810 (6.58%)
helped: 0
HURT: 10
total fills in shared programs: 3910 -> 3950 (1.02%)
fills in affected programs: 617 -> 657 (6.48%)
helped: 0
HURT: 10
LOST: 3
GAINED: 3
The differences mostly appear to be slight changes in MOVs.
v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics
flag rather than passing it directly to nir_lower_io. Use the
unreachable() macro rather than assert in one place. (Review
feedback from Chris Forbes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, i965 interpolates all FS inputs at the top of the program.
This has advantages and disadvantages, but I'd like to keep that policy
while reworking this code. We can consider changing it independently.
The next patch will make the compiler generate PLN instructions "on the
fly", when it encounters an input load intrinsic, rather than doing it
for all inputs at the start of the program.
To emulate this behavior, we introduce an ugly pass to move all NIR
load_interpolated_input and payload-based (not interpolator message)
load_barycentric_* intrinsics to the shader's start block.
This helps avoid regressions in shader-db for cases such as:
if (...) {
...load some input...
} else {
...load that same input...
}
which CSE can't handle, because there's no dominance relationship
between the two loads. Because the start block dominates all others,
we can CSE all inputs and emit PLNs exactly once, as we did before.
Ideally, global value numbering would eliminate these redundant loads,
while not forcing them all the way to the start block. When that lands,
we should consider dropping this hacky pass.
Again, this pass currently does nothing, as i965 doesn't generate these
intrinsics yet. But it will shortly, and I figured I'd separate this
code as it's relatively self-contained.
v2: Dramatically simplify pass - instead of creating new instructions,
just remove/re-insert their list nodes (suggested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When working with a non-multisampled render target, asking for "sample"
interpolation locations doesn't make sense. We demote them to centroid.
In a couple of patches, brw_compute_barycentric_modes will begin looking
at these intrinsics to determine the barycentric modes. fs_visitor also
will use them to code-generate pixel interpolator messages or payload
references. Handling the "but what if it's not MSAA?" logic ahead of
time in a NIR pass simplifies things and prevents duplicated logic.
This patch doesn't actually do anything useful yet as we don't generate
these intrinsics. I decided to keep it separate as it's self-contained,
in the hopes of shrinking the "convert everything" patch for reviewers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now nir_lower_io can optionally produce load_interpolated_input
and load_barycentric_* intrinsics for fragment shader inputs.
flat inputs continue using regular load_input.
v2: Use a nir_shader_compiler_options flag rather than ad-hoc boolean
passing (in response to review feedback from Chris Forbes).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Backends can normally handle shader inputs solely by looking at
load_input intrinsics, and ignore the nir_variables in nir->inputs.
One exception is fragment shader inputs. load_input doesn't capture
the necessary interpolation information - flat, smooth, noperspective
mode, and centroid, sample, or pixel for the location. This means
that backends have to interpolate based on the nir_variables, then
associate those with the load_input intrinsics (say, by storing a
map of which variables are at which locations).
With GL_ARB_enhanced_layouts, we're going to have multiple varyings
packed into a single vec4 location. The intrinsics make this easy:
simply load N components from location <loc, component>. However,
working with variables and correlating the two is very awkward; we'd
much rather have intrinsics capture all the necessary information.
Fragment shader input interpolation typically works by producing a
set of barycentric coordinates, then using those to do a linear
interpolation between the values at the triangle's corners.
We represent this by introducing five new load_barycentric_* intrinsics:
- load_barycentric_pixel (ordinary variable)
- load_barycentric_centroid (centroid qualified variable)
- load_barycentric_sample (sample qualified variable)
- load_barycentric_at_sample (ARB_gpu_shader5's interpolateAtSample())
- load_barycentric_at_offset (ARB_gpu_shader5's interpolateAtOffset())
Each of these take the interpolation mode (smooth or noperspective only)
as a const_index, and produce a vec2. The last two also take a sample
or offset source.
We then introduce a new load_interpolated_input intrinsic, which
is like a normal load_input intrinsic, but with an additional
barycentric coordinate source.
The intention is that flat inputs will still use regular load_input
intrinsics. This makes them distinguishable from normal inputs that
need fancy interpolation, while also providing all the necessary data.
This nicely unifies regular inputs and interpolateAt functions.
Qualifiers and variables become irrelevant; there are just
load_barycentric intrinsics that determine the interpolation.
v2: Document the interp_mode const_index value, define a new
BARYCENTRIC() helper rather than using SYSTEM_VALUE() for
some of them (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gen7/7.5 call it "Rendering Disable" while Gen8/9 prefix it with "API".
Pick one for consistency, and so we can share code between generations.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The bulk of this is the same. There are just a couple fields that only
exist on one generation or another, and we can easily handle those with
an #ifdef.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Gen7/7.5 clip code used APIMODE_OGL, while the Gen8+ clip code used
APIMODE_D3D. The meaning hasn't changed, so one of these must be wrong.
It appears that the hardware documentation is completely wrong. It
claims that the "API Mode" bit means:
0h APIMODE_OGL NEAR_VP boundary == 0.0 (NDC)
1h APIMODE_D3D NEAR_VP boundary == -1.0 (NDC)
However, DirectX typically uses 0.0 for the near plane, while unextended
OpenGL uses -1.0. i965's gen6_clip_state.c uses APIMODE_D3D for the
GL_ZERO_TO_ONE case, so I believe the meanings are backwards from what
the documentation says.
Section 23.2 ("Primitive Clipping") of the Vulkan 1.0.21 specification
contains the following equations:
-w_c <= x_c <= w_c
-w_c <= y_c <= w_c
0 <= z_c <= w_c
This means that Vulkan follows D3D semantics.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Refactoring to leave existing simd_* intrinsics in "simdintrin.h" unchanged,
adding corresponding simd16_* intrinsics in "simd16intrin.h" on the side,
with emulation, that we can use piecemeal, rather than the all-or-nothing
approach to bring up avx512.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Add support for enhanced attribute swizzling. Currently supports constant
source overrides to handle PrimitiveID support. No support yet for input
select swizzling or wrap shortest. Removes obsoleted linkageMask and
associated code.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
Moved the setting into the existing component control code. Fixes bad
interaction between attribute/component setting for vertex/instance ID
and component packing.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
From the Sky Lake PRM:
"For SURFTYPE_CUBE: For Sampling Engine Surfaces and Typed Data Port
Surfaces, the range of this field is [0,340], indicating the number of
cube array elements (equal to the number of underlying 2D array elements
divided by 6). For other surfaces, this field must be zero."
In other words, the depth field for cube maps is in number of cubes not
number of 2-D slices so we need to divide by 6. ISL will do this correctly
for us assuming that we provide it with the correct array bounds which it
expects to be in 2-D slices. It appears as if we've been doing this wrong
ever since we first added cube map arrays for Sandy Bridge and the change
to ISL made things slightly worse. While we're at it, we now need to remoe
the shader hacks we've always done since they were only needed because we
were setting the depth field six times too large.
v2: Fix the vec4 backend as well (not sure how I missed this).
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
This matches what we do for cube maps where logical_depth0 is in number of
face-layers rather than number of cubes. This does mean that we will
temporarily be setting the surface bounds too loose for cube map textures
but we are already setting them too loose for cube arrays and we will be
fixing that in the next commit anyway.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "12.0 11.2 11.1" <mesa-stable@lists.freedesktop.org>
The GL API and mesa internals do this differently than we do. In GL, there
is no depth parameter for 1-D arrays and height is used. In the i965
miptree code we do the sane thing and make height == 1 and use depth for
number of slices. This makes for a mismatch every time we create a 1-D
array texture from GL. Instead of actually solving this problem, we just
said "1-D is hard, let's make sure it works no matter which way we pass the
parameters" and called it a day.
This commit fixes the one GL -> i965 transition point where we weren't
already handling 1-D array textures to do the right thing and then replaces
the magic fixup code with an assert that you're doing the right thing.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "12.0 11.2 11.1" <mesa-stable@lists.freedesktop.org>
This 'last' variable used in FindGLXFunction(...) may become negative,
but has been defined as unsigned int resulting in an overflow,
finally resulting in a segfault when accessing _glXDispatchTableStrings[...].
Fixed this by definining it as signed int. 'first' variable also needs to be
defined as signed int. Otherwise condition for while loop fails due to C
implicitly converting signed to unsigned values before comparison.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Stefan Dirsch <sndirsch@suse.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Current implementation of the DRI image loader does not free the images
created in get_back_bo() and so leaks memory. Moreover, it creates a new
image every time the DRI driver queries for buffers, even if the backing
native buffer has not changed. leaking memory again.
This patch adds missing call to destroyImage() in droid_enqueue_buffer()
and a check if image is already created to get_back_bo() to fix the
above.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It is much easier to debug issues when the application gives some
meaningful error messages. This patch adds few to the EGL Android
platform backend.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
As the spec allows for {server,client}_wait_sync to be called without
currently bound context, while our implementation requires context
pointer.
v2: Add a mutex and acquire it for the duration of
brw_fence_client_wait() and brw_fence_is_completed() as suggested
by Chad.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Without this, if a configuration is, say, available only on GLES2/3, but
not on GLES1, and is rejected by the dri module's bindContext call,
eglMakeCurrent fails with error "EGL_SUCCESS".
In this patch, we set error to EGL_BAD_MATCH, which is what CTS/dEQP
dEQP-EGL.functional.surfaceless_context expect.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
An earlier patch fixed the problem for classic drivers, however Gallium
was still left broken. This patch applies the same workaround to
Gallium, when compiled for Android. Following is a quote from the
original patch:
0cbc90c57c mesa: dri: Add shared glapi to LIBADD on Android
/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"
On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);
However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the case of building in out-of-tree fashion, while having generated
in-tree sources, the latter [likely stale] files will be used.
Flip the order to prevent that.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This allows Gallium drivers to advertise the subpixel precision
for floating point viewports bounds.
v2:
- Set ViewportSubpixelBits in st_init_limits.
Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
MS images have to be handled explicitly and I don't plan to implement
them for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
On Maxwell, images binding is slightly different (and much better)
regarding Fermi and Kepler because a texture view needs to be uploaded
for each image and this is going to simplify the thing a lot.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Currently, we can store 32 tex handles of 32-bits integer each and
that fits perfectly with the underlying hardware except on GM107+
which requires to upload a texture view for each images.
This patch increases the number of storable texture handles in the
driver constant buffer from 32 to 40 because we expose 8 images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For intrinsics we don't care about, just skip to the next loop iteration
and process the next instruction. We don't want to execute the rest of
the code.
This was a bug in commit cdfc05ea6e.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes a 10-20% performance regression in OglCSDof caused by commit
5a8c89038a, which made images (in the
image load/store sense) use BDW_MOCS_PTE instead of BDW_MOCS_WB.
This seems sketchy, as the default PTE value is supposed to be
WB LLC eLLC, which is the same as our MOCS WB setting. It's only
supposed to change when using a surface for display, which won't
ever happen for images. Something may be wrong in the kernel...
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I noticed this when I tried to do frexp(float(some_unsigned)) in the
ir_unop_find_lsb lowering pass. The code generated for frexp() uses
fabs, and this resulted in an extra instruction. Ultimately I ended up
not using frexp.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously SHADER_OPCODE_MULH could only exist on Gen7+, so the
assertion assumed the Gen7+ accumulator rules. A future patch will
allow this instruction on at least Gen6, so update the assertion.
v2: Use get_lowered_simd_width instead of open coding it. Suggested by
Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
v2: Retype LZD source as UD to avoid potential problems with 0x80000000.
Suggested by Matt. Also update comment about problem values with
LZD(abs(x)). Suggested by Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This uses one less instruction.
v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested
by Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
With the existing lowering passes, the functions from this extension
become a bunch of bit twiddling operations that have always been
supported.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This extension does not depend on the Gen. It only depends on the
availability of GLSL 1.30.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This isn't the lowering pass you want. Most GPUs that can support GLSL
1.30 have a multiply unit that can do something more interesting than
32x32->32. Many have 32x16->48. Any GPU that does, should do the
lowering in the backend. This is just the thing that will always work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Fix typo in #extension line noticed by Ken.
v3: Update spec status.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add entrypoint to distinguish H.264 decode and encode. For example, in patch
5/11 when is calling "VaCreateContext", "pps" and "sps" shouldn't be allocated
for H.264 encoding. So we need to use the entry_point to determine this is
H.264 decode or H.264 encode. We can use config to determine the entrypoint
since config_id is passed to us for VaCreateContext call. However, for
VaDestoyContext call, only context_id is passed to us. So we need to know the
entrypoint in order to not free the pps/sps for encoding case.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
didn't have the bind flags). This makes the state tracker pick a
different format when rendering is required, or mark the fb as
incomplete. This fixes:
bin/getteximage-formats init-by-clear-and-render -auto -fbo
bin/getteximage-formats init-by-rendering -auto -fbo
which previously ran into srgb-encoding differences.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
This is useful for pbo downloads, which are now accelerated with images.
BGRA8 is a moderately common format to do that in.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
At this point there is no reason not to be using the linked shaders,
using the linked shaders should be faster and will make things simpler
for upcoming shader cache work.
The previous variable name suggests the linked shaders were intended
to be used here anyway.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This reverts commit 82f8c23950.
KHR_texture_compression_astc_sliced_3d is not a requirement for
GLES 3.2.
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>\
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
This way we have unlimited UVD sessions.
v2: only enable it when kernel supports it as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Inspired by fix for mem leak of vdpau interop, resource_from_handle
set texture reference count, that need to be decreased and released,
recall there is a similar case for DRI3, that is with VA-API glx
extension, there is temporary TFP(texture from pixmap), we target it
through dma-buf. leak happens when without count down the reference.
Checked and found with mpv vo=opengl case, there only one static TFP,
the leak happens once, but for totem player using gstreamer VA-API glx,
the dynamic TFP for each frame, so leak quite a bit.
This fixes mem leak for mpv and totem.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We totally ignored this before because there were no piglit tests for
indirect loads in tessellation stages with doubles.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Our indirect URB read messages take both a direct and an indirect offset
so when we emit the second message for a 64-bit input load we can just
always incremement the immediate offset, even for the indirect case.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
pulls_bary should be set when the shader uses a pixel interpolator
message. So, setting it from the function that emits pixel interpolator
messages makes a lot of sense.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This patch makes emit_general_interpolation take a destination register
as an argument, and write directly to that. This is simpler than the
old approach of ralloc'ing a register, writing to that temporary, and
then making the caller emit per-component MOVs to copy it to the actual
destination.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The unlit centroid workaround starts being necessary on Gen6, which
is the first platform with multisampling. PLN exists on G45+, so all
platforms which need this workaround have PLN.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
glsl_to_nir always produces a system value for gl_FrontFacing, rather
than an input. So there should never be an input with this slot,
making this code dead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Likewise, rename the enum type to glsl_interp_mode.
Beyond the GLSL front-end, talking about "interpolation modes" seems
more natural than "interpolation qualifiers" - in the IR, we're removed
from how exactly the source language specifies how to interpolate an
input. Also, SPIR-V calls these "decorations" rather than "qualifiers".
Generated by:
$ find . -regextype egrep -regex '.*\.(c|cpp|h)' -type f -exec sed -i \
-e 's/INTERP_QUALIFIER_/INTERP_MODE_/g' \
-e 's/glsl_interp_qualifier/glsl_interp_mode/g' {} \;
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Dave Airlie <airlied@redhat.com>
The hardware can only do alphatest when using a blendable format. This
means that the various *16 norm formats didn't work with alphatest. It
appears that Talos Principle uses such formats, as well as alpha tests,
for some internal renders, which made them be incorrect. However this
does not appear to affect the final renders, but in a different game it
easily could.
The approach we take is that when alphatests are enabled and a suitable
format is used (which we anticipate is the vast minority of the time),
we insert code into the shader to perform the comparison and discard.
Once inserted, that code lives in the shader forever, and we re-upload
it each time the function changes with a fixed-up compare. To avoid
re-uploading too often, if we switch back to a blendable format, the
test is (effectively) disabled and the hw alphatest functionality is
used.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
In d035d50 this changed to 64b.. which I'm pretty sure was
unintentional. Revert it back to 32b so the entire state struct
is a nice round 64b.
(Note sure that it would actually be measurable, but I did notice
that check_state() was hot in some benchmarks.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Adds a second optional cleanup callback, called after the fence is
signaled. This is needed if, for example, the queue has the last
reference to the object that embeds the util_queue_fence. In this
case we cannot drop the ref in the main callback, since that would
result in the fence being destroyed before it is signaled.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
According to firmware guys, the new sequence that we added for Polaris should
work on all CIK parts, and should actually be faster on some parts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I recently refactored this to share code between load and atomic
lowering. loads used intrin->num_components, while atomics used
intrin->dest.ssa.num_components. They should be equivalent, but
Jason wanted me to use the latter. I missed applying his review.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is more readable and also offers assertions that protect against
setting const_index fields on the wrong kind of intrinsic.
Suggested by Jason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The original function was becoming a bit hard to read, with the details
of creating and filling out load/store/atomic atomics all in one
function.
This patch makes helpers for creating each type of intrinsic, and also
combines them with the *_op() helpers, as they're closely coupled and
not too large.
v2: Minor style nits from Jason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This can't happen, the caller asserts that mode is shader_out or shared.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Both loads and atomics had identical code to rewrite destinations,
and all cases had the same two lines to replace instructions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The load/store/atomic cases all duplicated the get_io_offset code, with
a few tiny differences: stores didn't bother checking for per-vertex
inputs, because they can't be stored to, and atomics didn't check at
all, since shared variables aren't per-vertex.
However, it's harmless to check, and allows us to share more code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Less typing and word wrapping issues than intrin->variables[0]->var.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than computing the barycentric mode each time we emit a LINTERP,
we can simply compute it once, as soon as we know we're doing non-flat
interpolation.
At that point, emit_linterp() doesn't do much, so fold it into the
call sites and drop it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
brw_wm_barycentric_interp_mode is wordy, brw_barycentric_mode is less
typing and suffers from fewer line wrapping problems.
The enum values themselves don't really benefit from "WM" in the name,
either. Put "BARYCENTRIC" first instead of at the end and drop "WM".
Generated by:
for file in *.c *.cpp *.h; do sed -i \
-e 's/brw_wm_barycentric_interp_mode/brw_barycentric_mode/g' \
-e 's/BRW_WM_\([A-Z_]*\)_BARYCENTRIC/BRW_BARYCENTRIC_\1/g' \
-e 's/BRW_WM_BARYCENTRIC_INTERP_MODE_COUNT/BRW_BARYCENTRIC_MODE_COUNT/g' \
$file;
done
with a few whitespace changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This consolidates a bunch of hacks in a single place - by setting
the interpolation modes and locations on variables appropriately,
we can simply trust them in the rest of the code. This avoids
having to handle INTERP_QUALIFIER_NONE, gl_Color overrides,
sample-shading overrides, and Gen4-5 centroid-overrides in a bunch
of places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The only useful thing left was gen6_init_vtable_surface_functions which we
can easily put in brw_wm_surface_state.c.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
THe offset type has special implications that it's intended to be some form
of aligned memory address. These assumptions allow it to handle the case
where there is some alignment requirement on the offset and the bottom bits
are used for other things. However, the offsets in the surface state field
are really just unsigned integers.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The old calculation, which used view->offset, encorporated buffer->offset
into the size calculation where it doesn't belong. This meant that, if
buffer->offset > buffer->size, you would always get a negative size. This
fixes 170 dEQP-VK.renderpass.attachment.* Vulkan CTS tests on Haswell.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We can totally do it, we were just only setting up one BLEND_STATE and, now
that the code is unified with gen8, we should be handling it correctly.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This renames BLEND_STATE to BLEND_STATE_ENTRY and adds an new struct
BLEND_STATE which is just an array of 8 BLEND_STATE_ENTRYs. This will make
it much easier to write gen-agnostic blend handling code.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary
miptree. However, if a single level is being selected, we can use the
existing miptree and force all the sampling to be from that particular
level.
This avoids a ton of software fallbacks in glGenerateMipmaps(), which uses
base levels in the blit implementation in gallium. Improves "glmark2 -b
terrain" from 2 fps to 3 (perhaps some more precision would be useful?),
and cuts its CPU usage during the benchmarking from ~30% to ~10% (total
CPU time from 8.8s to 7.6s).
The compiler uses the per-stage flags already, so it didn't need this.
vc4_uniforms was using it, so just replace it with both of the stage flags
for now.
If numSamples > 0, we can compute the size of the whole mipmapped texture.
That's the case for glTexStorage(GL_PROXY_TEXTURE_x).
Also, multiply the texture size by numSamples for MSAA textures.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
So that the function can work properly with glTexStorage(), where we know
how many mipmap levels there are. And so we can compute storage for MSAA
textures.
Also, remove the obsolete texture border parameter.
A subsequent patch will update _mesa_test_proxy_teximage() to use these
new parameters.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This avoids a failed assert(img->_BaseFormat != -1) in
init_teximage_fields_ms() because the internalFormat argument is GL_NONE.
This was hit when using glTexStorage() to do a proxy texture test.
Fixes a failure with the updated Piglit tex3d-maxsize test.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixed the remaining redundant SetRenderTargets command emission.
Tested with lightsMark2008, Heaven, mtt piglit, glretrace, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
Weak doesn't work the same on PE/COFF as on ELF, they are only weak
references. Specifically, since nothing else pulls in the object which
contains pthread_mutexattr_init() (and coming from the C library, that is
the only thing that object contains), means that it ends up as 0
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Commit 1f4869a2 unconditionally requires pthread-stubs. Unfortunately, the
cleverness that pthread-stubs is doesn't work with PE/COFF, and historically
Cygwin doesn't have a pthread-stubs.pc.
Don't require pthread-stubs on Cygwin.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Cygwin headers are now a bit more correct in handling feature test macros,
so use _GNU_SOURCE when building for Cygwin, as well.
(Notwithstanding f381c27c, we should probably have always been using
_GNU_SOURCE, since asprintf() is used by mesa in places)
Signed-off-by: Yaakov Selkowitz <yselkowi@redhat.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
This reverts commit 091f1da902 .
Although a user may specify a specfic tiling bit, ISL should still
prevent incompatible tiling/surface combinations.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
In the next patch, ISL will unconditionally perform verification of a
surface's tiling and usage. Since it will require that w-tiled images
be stencil buffers, create a stencil surface to copy from a
w-tiled/stencil surface.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If an internal user creates an image with Vulkan tiling VK_IMAGE_TILING_OPTIMAL
and an ISL tiling that isn't set, ISL will fail to create the image as
anv_image_create_info::isl_tiling_flags will be an invalid value.
Correct this by making anv_image_create_info::isl_tiling_flags an opt-in,
filtering bitmask, that allows the caller to specify which ISL tilings are
acceptable, but not contradictory to the Vulkan tiling.
Opt-out of filtering for vkCreateImage.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Set limits that are consistent with ISL's assertions in
isl_genX(buffer_fill_state_s)() and Anvil's format-DescriptorType
mapping in anv_isl_format_for_descriptor_type().
Fixes the following new crucible tests:
* stress.limits.buffer-update.range.uniform
* stress.limits.buffer-update.range.storage
These tests are in this patch: https://patchwork.freedesktop.org/patch/98726/
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A ternary is clearer because the range member is assigned one of two values
dependant on one condition.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Section 13.2.3. of the Vulkan spec requires that implementations be able to
bind sparsely-defined Descriptor Sets without any errors or exceptions.
When binding a descriptor set that contains a dynamic buffer binding/descriptor,
the driver attempts to dereference the descriptor's buffer_view field if it is
non-NULL. It currently segfaults on undefined descriptors as this field is never
zero-initialized. Zero undefined descriptors to avoid segfaulting. This
solution was suggested by Jason Ekstrand.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96850
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
in svga_init_shader_key_common(). Since the CSO module only tracks
sampler views for fragment shaders, the number of samplers and sampler
views can be mismatched for other types of shaders. This situation
triggered an assertion in Chrome with maps.google.com
This patch adds defensive code to handle that situation.
Fixes VMware bug 1694027
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The uninitialized list should be checked and returned.
Thank Julien for the notification and suggested fix.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This is way better than the stupid string approach especially since you
could overflow the string. Again, I thought I had something better at one
point but it obviously got lost.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
It was returning true if the function types have different lengths rather
than false. This was new with the SPIR-V to NIR pass and I thought I'd
fixed it a while ago but it may have gotten lost in rebasing somewhere.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Assign previously hardcoded values for OMX to newly defined
structure. As a result, OMX behaviour will not change at all.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Allow to specify more parameters in the encoding interface
which previously just hardcoded in the encoder
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
If a block might be entered from multiple locations, then the uniform
stream will (probably) be at different points, and we need to make sure
that it's pointing where we expect it to be. The kernel also enforces
that any block reading a uniform resets uniforms, to prevent reading
outside of the uniform stream by using looping.
With control flow, we can't be sure that we'll see the uses of a variable
before its def as we walk backwards. Given that NIR is eliminating our
long chains of dead code, a simple solution for now seems fine.
This slightly changes the order of some optimizations, and so an opt_vpm
happens before opt_dce, causing 3 dead MOVs to be turned into dead FMAXes
in Minecraft:
instructions in affected programs: 52 -> 54 (3.85%)
Previously, we could assume that a MOV from a temp was always an available
copy, because all temps were SSA in NIR, and their non-SSA state in QIR
was just due to the fact that they were from a bcsel or pack_unorm_4x8, so
we could use the current value of the temp after that series of QIR
instructions to define it.
However, this is no longer the case with control flow. Instead, we track
a new array of MOVs defined within the block that haven't had their source
or dest killed yet, and use that primarily. We fall back to looking
through the QIR defs array to handle across-block MOVs, but now require
that copies from the SSA defs have an SSA src as well.
v2 (Matt):
- Use brw_imm_df() as source argument of DIM instruction.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to HSW's PRM, vol02b, the DIM instruction has the following
restriction:
"Restriction : src0 must be immediate. src0 must specify the :f (F, Float)
type encoding but is an immediate 64-bit DF (Double Float) value. dst
must have type DF."
This commit allows to upload the immediate 64-bit DF value to the source
of a DIM instruction even when it is of float type encoding.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2 (Matt):
- Take a DF source argument for the DIM instruction emission
in the visitors.
- Indentation.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we initially dropped bpb in favor of bs, we accidentally didn't change
this one line properly. This brings it back to what it should be.
Reviewed-by: Chad Versace <chad.versace@intel.com>
A while ago we got rid of the bits-per-block because we thought we didn't
need it. We're about to introduce some very useful 1 and 2-bit formats so
we really should be able to handle them again.
Reviewed-by: Chad Versace <chad.versace@intel.com>
This is based on a very long set of discussions between Chad and myself
about how we should properly represent HiZ and CCS buffers. The end result
of that discussion was that a tiling actually has two different sizes, a
logical size in elements, and a physical size in bytes and rows. This
commit reworks ISL's pitch and size calculations to work in terms of these
two sizes.
Reviewed-by: Chad Versace <chad.versace@intel.com>
We helpfully inserted a PRM quotation about how we need to use
ARRAY_PITCH_SPAN_FULL and then set it to COMPACT. Oops...
Reviewed-by: Chad Versace <chad.versace@intel.com>
The row pitch already specifies the size of a row of elements.
Multiplying by the block height simply causes us to allocate as muc as 12
times more memory than needed for compressed textures.
Reviewed-by: Chad Versace <chad.versace@intel.com>
I have no idea why we were multiplying by 4 before. The offsets we get
from SPIR-V are in bytes and so is nir->num_uniforms so there's no need to
do any adjustment whatsoever.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Not sure if this is the right way to do it, but it seems to work.
v2: make it a no-op on LLVM <= 3.5
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fix compilation on Cygwin, since 50b22354, by adding c99_alloca.h include,
which should know how to portably make the alloc() prototype available.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Setup for pixel shader push constants is the same as for other
stages. Note that on gen8+ the if-else branches were identical
and the generation check for packet size redundant.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is partial revert of commit cc2d0e64.
It looks that even though blorp disables a stage the corresponding
3DSTATE_CONSTANT_XS packet is needed to be programmed. Hardware
seems to try to fetch the constants even for disabled stages.
Therefore care needs to be taken that the constant buffer is
set up properly. Blorp will continue to trash it into non-existing
such as before.
It is possible that this could be omitted on SKL where the
constant buffer is considered when the corresponding binding table
settings are changed. Bspec:
"The 3DSTATE_CONSTANT_* command is not committed to the shader
unit until the corresponding (same shader)
3DSTATE_BINDING_TABLE_POINTER_* command is parsed."
However, as CONSTANT_XS packet itself does not seem to stall on its
own, it is safer to emit the packets for SKL also.
Possible alternative to blorp trashing could have been to setup
defaults in the beginning of each batch buffer. However, hardware
doesn't seem to tolerate these packets being programmed multiple
times per primitive. Bspec for IVB:
"It is invalid to execute this command more than once between
3D_PRIMITIVE commands."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96878
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Coverity's analysis is too weak to understand that
r600_init_flushed_depth(_, _, NULL) only returns true when
flushed_depth_texture was assigned a non-NULL value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So that we can have gen7 split large writes produced by this lowering pass.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
So far we only used instructions with this size in situations where we
did not operate per-channel and we wanted to ignore the execution mask,
but gen7 fp64 will need to emit code with a width of 4 that needs
normal execution masking.
v2:
- Modify the assert instead of deleting it (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In fp64 we can produce code like this:
mov(16) vgrf2<2>:UD, vgrf3<2>:UD
That our simd lowering pass would typically split in instructions with a
width of 8, writing to two consecutive registers each. Unfortunately, gen7
hardware has a bug affecting execution masking and as a result, the
second GRF register write won't work properly. Curro verified this:
"The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1
(where QtrCtrl is the 8-bit quarter of the execution mask signals
specified in the instruction control fields) for the second
compressed half of any single-precision instruction (for
double-precision instructions it's hardwired to use NibCtrl+1,
at least on HSW), which means that the EU will apply the wrong
execution controls for the second sequential GRF write if the number
of channels per GRF is not exactly eight in single-precision mode (or
four in double-float mode)."
In practice, this means that we cannot write more than one
consecutive GRF in a single instruction if the number of channels
per GRF is not exactly eight in single-precision mode (or four
in double-float mode).
This patch makes our SIMD lowering pass split this kind of instructions
so that the split versions only write to a single register. In the
example above this means that we split the write in 4 instructions, each
one writing 4 UD elements (width = 4) to a single register.
v2 (Curro):
- Make explicit that the thing about hardwiring NibCtrl+1 for the second
compressed half is known to happen in Haswell and the issue with IVB
might not be exactly the same.
- Assign max_width instead of returning early so that we can handle
multiple restrictions affecting to the same instruction.
- Avoid division by 0 if the instruction does not write any registers.
- Ignore instructions what have WE_all set.
- Use the instruction execution type size instead of the dst type size.
v3 (Curro):
- Move the implementation down so it is not placed in the middle of another
workaround.
- Declare channels_per_grf as const.
- Don't break the loop early if we find a BAD_FILE source.
- Fix the number of channels that the hardware shifts for the second half
of a compressed instruction to be 8 in single precision and 4 in double
precision.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.
v2:
- Define setup_imm_df() as an independent function (Curro)
- Create a specific builder to get rid of some instruction field
assignments (Curro).
v3:
- Get devinfo from builder (Kenneth)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, there were occasionally NIR registers in our programs, but
they were always actually used SSA-only. Now that we're trying to support
control flow, we need to actually conditionally move to registers based on
whether channels are active or not.
We're already checking that branch instructions are within the
contents of the shader and the proper PROG_END sequence is present.
The other thing we need in the presence of branching is to verify that
the shader doesn't overflow past the end of the uniforms stream.
To do that, we require that at the start of any basic block reading
uniforms have the following instructions:
load_imm temp, <offset within uniform stream>
add unif_addr, temp, unif
The instructions are generated by userspace, and the kernel verifies
that the load_imm is of the expected offset, and that the add adds it
to a uniform. We track which uniform in the stream that is, and at
draw call time fix up the uniform stream to have the address of the
start of the shader's uniforms for that draw call.
Signed-off-by: Eric Anholt <eric@anholt.net>
This isn't used yet, it's just a first step toward loop validation.
During the main parsing of instructions, we need to know when we hit a new
basic block so that we can reset validated state.
Basically we just treat each block independently. The only inter-block
scheduling I can think of that would be be interesting would be to move
texture result collection to after a short loop/if block that doesn't do
texturing. However, the kernel disallows that as part of its security
validation.
We still decide which uniform to lower based on how many
instructions-that-need-lowering use that uniform, but now we emit a new
temporary uniform load in each of the basic blocks containing an
instruction being lowered.
This commit is best reviewed with diff -b.
We need to apply the peephole pass to each of the blocks in the program.
We don't do dataflow analysis for SF across blocks, but we also don't
generate code that would need us to do so.
The optimization passes and scheduling aren't actually ready for multiple
blocks with control flow yet (as seen by the "cur_block" references in
them instead of iterating over blocks), but this creates the structures
necessary for converting them.
We have the prior list_foreach() all over the code, but I need to move
where instructions live as part of adding support for control flow. Start
by just converting to a helper iterator macro. (The simpler
"qir_for_each_inst()" will be used for the for-each-inst-in-a-block
iterator macro later)
This avoids a bunch of code gen regressions when enabling loops in vc4.
Prior to that, the GLSL that would have generated these optimizable phi
nodes was being lowered to csels between either (undef, a) or (a, a), and
those were being dealt with by nir_opt_undef and nir_opt_algebraic.
The allocation has succeeded by that point, so it needs to be freed.
CovID: 1358929
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was appearing in vc4 VS/CS in mupen64, due to vertex attrib lowering
producing some constants that were getting compared.
total instructions in shared programs: 112276 -> 112198 (-0.07%)
instructions in affected programs: 2239 -> 2161 (-3.48%)
total estimated cycles in shared programs: 283102 -> 283038 (-0.02%)
estimated cycles in affected programs: 2365 -> 2301 (-2.71%)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Small api cleanup. Make all api functions call GetContext instead
of locally casting handle. Makes debugging easier by providing a
single point to track context changes.
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
with encode tunneling
The idea of encode tunneling is to use video buffer directly for encoder,
but currently the encoder doesn’t support interlaced surface, the OMX
decoder set progressive surface before on that purpose.
Since now we are polling the driver for interlacing information for
decoder, we got the interlaced as preferred as other APIs(VDPAU, VA-API),
thus breaking the transcode with tunneling.
The solution is when with tunnel detected, re-allocate progressive target
buffers, and then converting the interlaced decoder results to there.
This has been tested with transcode results bit to bit matching as before
with surface from progressive to progressive.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
[ Francisco Jerez: Use validate_build_common for error checking,
simplify control flow slightly and handle additional exception
types. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
header_map was the only definition left in compiler.hpp, move it into
program.hpp which is its only user in clover/core.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
[ Serge Martin: Fix inverted opts and log build ctor args.
Keep the log related to the build. Fix indentation ]
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This gets rid of the program::build_* query methods and replaces them
with the program::build() method that returns a single data structure
containing all parameters for the last build done on the given target
device (including build logs, options and the binary itself).
[ Serge Martin: Fix inverted opts and log build ctor args ]
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This partially reverts 7e0180d57d.
Having two different exception subclasses for compilation and linking
makes it more difficult to share or move code between the two
codepaths, because the exact same function under the same error
condition would need to throw one exception or the other depending on
what top-level API is being implemented with it. There is little
benefit anyway because clCompileProgram() and clLinkProgram() can tell
whether they are linking or compiling a program.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Return an API object from an intrusive reference to a Clover object,
incrementing the reference count of the object.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
[ Serge Martin: disable internalize pass when building a library.
Otherwise some functions may be inlined and removed ]
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Split the work previously done by compile_program_llvm() into
compile_program() (which simply runs the front-end and serializes the
resulting LLVM IR) and link_program() (which takes care of everything
else down to binary codegen).
[ Serge Martin: allow LLVM IR dump after compilation ]
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Drop a few include and using directives which are no longer necessary.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This is the common part of the code used to generate a clover::module
from LLVM bitcode, shared between the native and LLVM paths.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This switches compile_native() to the C++ API (which the rest of this
file makes use of anyway so there is little benefit from using the C
API), what should get rid of an amount of boilerplate and fix a leak
of the TargetMachine object in the error path.
v2: Additional fixes for LLVM 3.6.
v3: Update for the latest LLVM SVN changes.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This function was doing three separate things:
- Initializing and releasing the ELF parsing state (the latter can be
better done using RAII).
- Searching for the symbol table in the ELF file.
- Extraction of kernel symbol offsets from the symbol table.
Split each one into a separate function for clarity and clean up the
result slightly.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Some of these will be useful from a different compilation unit in the
same subtree so put them in a publicly accessible header file.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Most significant change is debugging flags are now a scoped enum and
all debugging helpers live in the debug namespace.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Some assorted and mostly trivial clean-ups for the source to bitcode
compilation path.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This allows simplifying the interface of compile_llvm() because it no
longer needs to read out and return the optimization level and address
space map from the compiler instance. Instead declare the compiler
instance at the top level so that both properties are available
directly.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This gets rid of most ifdef's from the invocation.cpp code -- Only a
couple of them are left which will be removed differently in the
following commits.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
This ifdef'ed out code was meant to handle compilation into TGSI, but
it doesn't seem likely that it will ever be useful even if the TGSI
back-end is resurrected because the TGSI bitcode can just be plumbed
through in ELF format and dealt with as a regular "native" back-end.
Reviewed-by: Serge Martin <edb+mesa@sigluy.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Probably a copy-paste from mesa_meta_pbo_GetTexSubImage where tex_image
may apparently be null.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Even when the backend driver does not support ETC formats, we handle the
decoding into an uncompressed backing texture. However as far as core
mesa is concerned, it's an ETC texture and we should return the relevant
ETC mesa format. This condition can get hit when using glTexStorage to
create the texture object.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
The size of the driver constant buffer for each stage should be 2048
and not 512 because it has been increased recently for buffers/images.
While we are at it, do the same change for indirect draws.
This fixes all ARB_shader_draw_parameters tests on GM107.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
This fixes the following piglits:
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index
arb_arrays_of_arrays-basic-imagestore-mixed-const-non-const-uniform-index2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
I have seen a hang during application shutdown that could be explained by the
following race condition which this patch fixes:
1. Worker thread enters util_queue_fence_signal, sets fence->signalled = true.
2. Main thread calls util_queue_job_wait, which returns immediately.
3. Main thread deletes the job and fence structures, leaving garbage behind.
4. Worker thread calls pipe_condvar_broadcast, which gets stuck forever because
it is accessing garbage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
The blitter sets mask == 1, which is fine since it doesn't use smoothing.
Fixes a regression introduced in commit 5bcfbf91.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
This partially reverts commit d41f5396f3.
That untested commit broke the tex-skipped-unit piglit test and the
arbvparray Mesa demo when run with indirect GLX.
state->array_state is used during initialization, so its assignment cannot be
moved to the end of the function.
The backtrace looked like:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff77c7a5c in __glXGetActiveTextureUnit (state=0x6270e0) at indirect_vertex_array.c:1952
1952 return state->array_state->active_texture_unit;
(gdb) bt
0 0x00007ffff77c7a5c in __glXGetActiveTextureUnit (state=0x6270e0) at indirect_vertex_array.c:1952
1 0x00007ffff77cbf62 in get_client_data (gc=0x626f50, cap=34018, data=0x7fffffffd7a0) at single2.c:159
2 0x00007ffff77cce51 in __indirect_glGetIntegerv (val=34018, i=0x7fffffffd830) at single2.c:498
3 0x00007ffff77c4340 in __glXInitVertexArrayState (gc=0x626f50) at indirect_vertex_array.c:193
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There is no draw arrays protocol support for multi-texture coordinate
arrays, so it is implemented by sending batches of immediate mode
commands from emit_element_none in indirect_vertex_array.c. This sends
the target texture unit (which has been previously setup in the
array_state header field), followed by the texture coordinates. But for
GL_DOUBLE coordinates the texture unit must be sent *after* the texture
coordinates. This is documented in the glx protocol description, and can
also be seen in the indirect.c immediate mode commands generated from
gl_API.xml. Sending the target texture unit in the wrong place can crash
the remote X server.
To fix this required some more extensive changes to
indirect_vertex_array.c and indirect_vertex_array_priv.h, in order to
remove the texture unit value out of the array_state "header" field, and
send it separately.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61907
For each indirect context the indirect vertex array state must be initialised
by __glXInitVertexArrayState in indirect_vertex_array.c. As noted in the
routine header it requires that the glx context has been setup prior to the
call, in order to test the server version and extensions.
Currently __glXInitVertexArrayState is called from indirect_bind_context in
indirect_glx.c, as follows:
state = gc->client_state_private;
if (state->array_state == NULL) {
glGetString(GL_EXTENSIONS);
glGetString(GL_VERSION);
__glXInitVertexArrayState(gc);
}
But, the gc context is not yet usable at this stage, so the server queries
fail, and __glXInitVertexArrayState is called without the server version and
extension information it needs. This breaks multi-texturing as
glXInitVertexArrayState doesn't get GL_MAX_TEXTURE_UNITS. It probably also
breaks setup of other arrays: fog, secondary colour, vertex attributes.
To fix this I have moved the call to __glXInitVertexArrayState to the end of
MakeContextCurrent in glxcurrent.c, where the glx context is usable.
Fixes a regression caused by commit 4fbdde889c. Fixes ARB_vertex_program
usage in the arbvparray Mesa demo when run with indirect GLX and also
the tex-skipped-unit piglit test when run with indirect GLX.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61907
It's actually not very clever to claim to support H.264
and then fail to create a decoder.
v2: prefix FW macro with UVD_.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Fixes the one of the piglit arb_copy_image-targets tests for 1D arrays.
Previously, we were applying the 1D array z/face adjustment twice.
Also simplify the copy_region_vgpu10() function. It never has to copy
multiple array layers/slices. The Mesa code for glCopyImageSubData does
the loop over slices/faces.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Currently when we rebind framebuffer resources at the beginning of
the command buffer, we use the color buffer surfaces saved in the context
hw clear state. But the surfaces could be different from the actual
emitted render target surfaces if any of the color buffer surfaces
is also used for shader resource, in that case, we create
a backed surface for the collided render target surface. So to rebind
the framebuffer resources correctly, use the render target surfaces saved
in the context hw draw state.
Tested with Heaven, Lightsmark2008, MTT piglit, glretrace, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, a guest-backed surface will be invalidated
using the SVGA_3D_CMD_INVALIDATE_GB_SURFACE command before
the surface is reused. This fixes the updating dirty image error
from the device when a surface is reused.
v2: Instead of invalidating the surface when it is reused,
send the invalidate command before the surface is put into
the recycle pool.
v3: (1) surface invalidate is a noop operation in Linux winsys, since
surface invalidation is not needed for DMA path.
(2) Instead of invalidating the surface content in
svga_screen_surface_destroy() when a surface is to be destroyed,
it is done in svga_screen_cache_flush() when the surface is
no longer referenced in a command buffer and is ready to
be moved to the unused list. At this point, the surface will
be moved to the invalidate list. When the surface invalidation
is submitted, the surface will be moved to the unused list.
Tested with piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
If the SVGA3D_DEVCAP_DX_PROVOKING_VERTEX query returns false, never
define rasterizer state objects with provokingVertexLast set. Despite
what the device reports, it may interpret the provokingVertexLast flag
anyway. This fixes an issue when using capability clamping.
Tested with piglit provoking-vertex and glsl-fs-flat-color tests.
VMware bug 1550143.
Reviewed-by: <charmainel@vmware.com>
Since pixel center lies at 0.5, add half_pixel to vtex
before adding offsets to it.
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We only have to stay single-threaded when debug output must be synchronous.
This yields better parallelism in shader-db runs for me.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Unlike SC, the small primitive filter does not automatically use center
locations in 1xAA mode, so this is needed to avoid artifacts caused by
the small primitive filter discarding triangles that it shouldn't.
As a side effect of how the effective number of samples is now calculated,
this patch also avoids submitting the sample locations for line/poly smoothing
when they're not really needed.
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
GL_OES_texture_float_linear marks R32F, RG32F, RGB32F, and RGBA32F
as texture filterable.
Fixes glGenerateMipmap GL errors when visiting a WebGL demo in Chromium:
http://www.iamnop.com/particles
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This hardware race condition has caused problems several times already
(see "i965: Fix cache pollution race during L3 partitioning set-up.",
"i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
"i965: intel_texture_barrier reimplemented"). The problem is that
whenever we attempt to both flush and invalidate multiple caches with
a single pipe control command the flush and invalidation happen in
reverse order, so the contents flushed from the R/W caches aren't
guaranteed to become visible from the invalidated caches after the
PIPE_CONTROL command completes execution if some concurrent rendering
workload happened to pollute any of the invalidated R/O caches in the
short window of time between the invalidation and flush.
This makes sure that brw_emit_pipe_control_flush() has the effect
expected by most callers of making the contents flushed from any R/W
caches visible from the invalidated R/O caches.
Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Review carefully, it sucks to have to keep track of the number of
command packet dwords emitted in the batch epilogue manually. The
MI_REPORT_PERF_COUNT_BATCH_DWORDS calculation was obviously wrong.
Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There were two places in the driver doing a pipe control VF cache
flush, one of them was missing this workaround, move it down into
brw_emit_pipe_control_flush to make sure we don't miss it again.
Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Shouldn't cause any functional changes at this point, but we have
forgotten to apply this workaround several times in the past, make
sure it doesn't happen again.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes a regression induced by commit a0674ce5c4:
When EGL_TEXTURE_FORMAT and EGL_TEXTURE_TARGET were both specified (and
both != EGL_NO_TEXTURE), an error was instantly triggered, before the
other one had even a chance to be checked, which is obviously not the
intended behaviour.
v2: Full commit hash, remove useless variables.
v3: [chadv] Add Fixes footers.
Fixes: piglit "spec/egl 1.4/eglcreatepbuffersurface and then glclear"
Fixes: piglit "spec/egl 1.4/largest possible eglcreatepbuffersurface and then glclear"
Signed-off-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Remove the two first level `if` as they will always be true, and
flatten the two remaining `if`.
No functional change.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
There are a few places in the code where clearing and reading are done on
incorrect buffers for GLES contexts. See comments for details. This
fixes 75 GLES3 dEQP tests on the surfaceless platform with no regressions.
v2: Corrected unclear comment
v3: Make the change in context.c instead of get.c
v4: Removed whitespace
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Since the function is exported like any other
public api function and put in the header
as if you could link against it, export it also
from shared objects.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
These workarounds are not required for HSW and above so stop
copying them at VS key generation which is called at draw time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were storing arrays in vectors, which was leading to some really bad
spill code for large arrays. allocas instructions are a better fit for
arrays and LLVM optimizations are more geared toward dealing with
allocas instead of vectors.
For arrays that have 16 or less 32-bit elements, we will continue to use
vectors, because this will force LLVM to store them in registers and
use indirect registers, which is usually faster for small arrays.
In the future we should use allocas for all arrays and teach LLVM
how to store allocas in registers.
This fixes the piglit test:
spec/glsl-1.50/execution/geometry/max-input-component
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Atm the actual rule will expand to foo.o which is used for static
libraries only.
Thus the automake manual recommendation [to use OBJEXT] won't help us,
since since we're working with a shared library.
Thus let's 'demote' the file and add it back to BUILT_SOURCES. This will
manage all the complexity for us, at the (existing expense) of working
only with the all, check and install targets.
The crazy (why the issue was hard to spot):
If the dependencies (.deps/*.Plo) are already created one can alter the
anv_device.$(OBJEXT) line and/or nuke it all together. That won't lead
to any warnings/issues, even though the Makefile is regenerated.
Moral of the story:
Always rm -rf top_builddir or don't resolve the dependencies manually
and use BUILT_SOURCES.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96825
Fixes: d7a604c3f7a ("anv: use cache uuid based on the build timestamp.")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
In bc4e0c4 (vbo: Use a bitmask to track the active arrays in vbo_exec*.)
we stopped looping over all the attributes and resetting all slots.
Which exposed an issue in vbo_exec_bind_arrays() for handling GENERIC0
vs. POS.
Split out a helper which can reset a particular slot, so that
vbo_exec_bind_arrays() can re-use it to reset POS.
This fixes an issue with 0ad (and possibly others).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Before, it would happily copy list_head next/prev (ie. pointer to the
*from* list_head), leaving things in a confused state and causing much
mayhem.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Be more consistent with the other u_inlines util_copy_xyz_state()
helpers and support NULL src.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If a depth/stencil texture has no mipmaps, we can always get a layout that is
compatible with DB and TC.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes a rare bug with stencil texturing -- seen on Polaris and Tonga,
though it's basically a function of the memory configuration so could affect
other parts as well.
Fixes piglit "unaligned-blit * stencil downsample" and various
"fbo-depth-array *stencil*" tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note that this has no effect yet. A case where can_sample_z/s can be false
in radeonsi will be added in a later patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a left-over of when I considered generalizing the separate stencil
support. I do prefer the new name since it emphasizes what flushing vs.
non-flushing means from a functional point-of-view, namely special handling
of the texture format.
v2: adjust r600_init_color_surface as well
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Height should be aligned with 2 macroblocks, thus making safer
for tiled mode
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
It has proven subtle to get it right both from the build side POV (see
commit list below) and builders due to their varying workflows.
Furthermore it does not fully fulfil the reason why it was enforced -
to detect uniqueness between different builds, in order to distinguish
and invalidate Vulkan/GL caches.
With that having a much better solution (previous commit) we can drop
this solution.
This effectively reverts the following commits:
359d9dfec3 ("mesa: automake: add directory prefix for git_sha1.h")
2c424e00c3 ("mesa: automake: ensure that git_sha1.h.tmp has the right
attributes")
b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building OOT")
8229fe68b5 ("automake: get in-tree `make distclean' working again.")
Cc: Timo Aaltonen <tjaalton@debian.org>
Cc: Haixia Shi <hshi@chromium.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Do not rely on the git sha1:
- its current truncated form makes it less unique
- it does not attribute for local (Vulkand or otherwise) changes
Use a timestamp produced at the time of build. It's perfectly unique,
unless someone explicitly thinkers with their system clock. Even then
chances of producing the exact same one are very small, if not zero.
v2: Remove .tmp rule. Its not needed since we want for the header to be
regenerated on each time we call make (Eric).
v3:
- Honour SOURCE_DATE_EPOCH, to make the build reproducible (Michel)
- Replace the generated header with a define, to prevent needless
builds on consecutive `make' and/or `make install' calls. (Dave)
v4:
- Keep the timestamp generation at make time. (Jason)
v5:
- Ensure that file is regenerated on incremental builds.
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Dave Airlie <airlied@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.
These uniform types don't use this storage.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Code was inspired from _mesa_update_shader_textures_used
However unlike _mesa_update_shader_textures_used that only check for a single
stage, it will check all stages.
It avoids to loop on all uniforms, only active samplers are checked.
For my use case: high FS frequency switches with few samplers.
Perf event (relative to nouveau_dri.so) goes from 5.01% to 1.68% for
the _mesa_sampler_uniforms_pipeline_are_valid function.
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This reverts commit 27d456cc87.
DOH, what seems right and what is right with fp64 are always
two different things.
This regressed:
spec@arb_gpu_shader_fp64@shader_storage@layout-std140-fp64-mixed-shader
on radeonsi
Reported-by: Michel Dänzer <michel@daenzer.net>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While we are at it, fix a typo inside the comment which describes
what those constants are for.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In presence of an indirect image access, the base offset should be
zeroed because the stride will be computed twice. This is a pretty
rare situation but it can happen when tex.r > 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
When emitting OP_SUB, the sign bit for FADD and FADD32I is not
at the same position. It's at position 45 for FADD but 51 for FADD32I.
This fixes the following piglit test:
tests/spec/arb_fragment_program/fdo30337b.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
ALU0 didn't have the _dest variant, and ALU2 didn't unset the def the way
ALU1 did. This should make the ALU[012] macros much clearer, by moving
most of their contents to vc4_qir.c
Now that we're about to start generating control flow in our NIR, we want
this in place. It optimizes things frequently in the CS, when the GL VS
has control flow that doesn't affect the vertex position.
Tiny change on shader-db currently, but it will be important when we start
emitting a lot of SFs from the same variable as part of control flow
support.
total instructions in shared programs: 89463 -> 89430 (-0.04%)
instructions in affected programs: 1522 -> 1489 (-2.17%)
total estimated cycles in shared programs: 250060 -> 250015 (-0.02%)
estimated cycles in affected programs: 8568 -> 8523 (-0.53%)
The DCE pass is going to change significantly to handle control flow,
while we don't really need to change it for the SF handling. We also need
to add some more SF peephole optimization for SF updates generated by
control flow support.
No change on shader-db.
I'm going to add an optimization for redundant SF update removal, which
will just remove the SF and leave us (in many cases) with an instruction
with a NULL destination and no side effects. Rather than teaching that
pass whether the whole instruction can be removed, leave that
responsibility to this pass.
We need to not DCE them even though they don't have a destination in QIR.
We also shouldn't relocate them in vc4_opt_vpm. Neither of these things
happen, but I'm about to make DCE consider instructions with a NULL
destination.
Noticed by code inspection. This hasn't been too big of a deal, because
our cond usages all start out as adder ops, either MOVs or the FTOI for Z
writes. MOVs *can* get converted to mul ops during scheduling, but
apparently we hadn't hit this.
Main shader parts and geometry shaders are compiled asynchronously
by util_queue. si_create_shader_selector doesn't wait and returns.
si_draw_vbo(si_shader_select) waits for completion.
This has the best effect when shaders are compiled at app-loading time.
It doesn't help much for shaders compiled on demand, even though
VS+PS compilation should take as much as time as the bigger one of the two.
If an app creates more shaders, at most 4 threads will be used to compile
them.
Debug output disables this for shader stats to be printed in the correct
order.
(We could go even further and build variants asynchronously too, then emit
draw calls without waiting and emit incomplete shader states, then force IB
chaining to give the compiler more time, then sync the compilation at the IB
flush and patch the IB with correct shader states. This is great for
compilation before draw calls, but there are some difficulties such as
scratch and tess states requiring the compiler output, and an on-disk shader
cache will likely be a much better and simpler solution.)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
to allow multiple shaders to be compiled simultaneously.
ALso, shader-db can again use all 4 cores.
v2: Remove the pipe_mutex_unlock call in the error path.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
The function interface is ready to be used by util_queue.
Also, si_shader_select_with_key can no longer accept si_context.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will be needed after some LLVM changes that haven't landed yet.
v2: - use LLVMIsConstant to fix an LLVM assertion failure.
LLVMSetMetadata doesn't work with constants.
- don't set float metadata as string
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Handle the bc_optimize SGPR bit if both CENTER and CENTROID are enabled.
This should increase the PS launch rate for big primitives with MSAA.
Based on discussion with SPI guys.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This should increase the PS launch rate for shaders using at least 2 pairs
of perspective (i,j) and same for linear.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This makes it possible to skip urb re-configuration if the
subsequent renders agree with the settings.
Also allows blorp to allocate the maximun amount of vs entries
available. Core upload logic already knows how to calculate this.
Helps one synthetic benchmark.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Packet 3DSTATE_CONSTANT_PS is still emitted explicitly as ps stage
itself is enabled and hardware may try to prefetch constants from
the buffer. From the BSpec: 3D Pipeline - Windower -
3DSTATE_PUSH_CONSTANT_ALLOC_PS
"Specifies the size of the PS constant buffer. This value will
determine the amount of data the command stream can pre-fetch
before the buffer is full."
This is not possible on gen6. From the BSpec about 3DSTATE_CONSTANT_PS:
"This packet must be followed by WM_STATE."
Binding table emissions for stages other than PS can be now dropped,
they were only needed for the 3DSTATE_CONSTANT_XS to be effective:
From the BSpec:
"The 3DSTATE_CONSTANT_* command is not committed to the shader unit
until the corresponding (same shader) 3DSTATE_BINDING_TABLE_POINTER_*
command is parsed."
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In preparation for loading as flat vertex input.
v2: Use LOAD_INPUT() macro
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In addition, as these are never used in parallel, add a few
assertions.
v2 (Jason): Skip some complexity by putting them into a union but
pad rectangle grid into a vec4 instead. Also keep the
LOAD_UNIFORM macro.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The image usage specified by the caller of vkCreateSwapchainKHR should be
passed onto the internal image creation. Otherwise the driver might later
crash when the user tries to use the image as a combined sampler even though
the creation was explicitly created with VK_IMAGE_USAGE_TRANSFER_SRC_BIT.
Leaving the previous VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT as this might be
expected even if the swapchain is created without any flag.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96791
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
On gen >= 8 one doesn't provide ending address but number of bytes
available. This is relative to the given offset.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Devices with smaller GMEM size need more tiles. On db410c at 2048x1152,
glmark2 shadow needed ~330 tiles for fullscreen. Lets bump it up to
512. (Maybe with MRT you could end up needing more, but at that point
things are probably going to be painfully slow.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.
So add driconf param to force uninitialized variables to default to zero.
This issue was observed with rust, from steam store. But has surfaced
elsewhere in the past.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For .vert/.frag, now multiple can be specified on the cmdline for
purposes of linking, and the last one specified is the one that is
fed into the ir3 backend (and dumped along the way if --verbose is
specified)
Without this, varyings in frag shaders would appear as undefined.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rather than doing a separate submit at context create, move these cmds
to before first tile, as is done on a3xx/a4xx. Otherwise state can
be overwritten by other contexts.
Signed-off-by: Rob Clark <robdclark@gmail.com>
gcc6 does not like the trick where we point to one entry before the
array start and then start a while with a pre-increment.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
These are all new false positives with gcc6.
In nouveau_compiler.c: gcc6 no longer assumes that passing a pointer
to a variable into a function initialises that variable.
In nv50_ir_from_tgsi.cpp op and mode are not set if there are 0
enabled dst channels, this never happens, but gcc cannot know this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
In order to implement get_work_dim() the driver may need to know the
clEnqueueNDRangeKernel() work_dim parameter, so pass it to the driver.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Add a new WORK_DIM SV type, this is will return the grid dimensions
(1-4) for compute (opencl) kernels.
This is necessary to implement the opencl get_work_dim() function.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For the case (both src or dst) where we had a texobject, but the
texobject target was not the same that the method target, this spec
paragraph was appplied:
/* Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core
* Profile spec says:
*
* "An INVALID_VALUE error is generated if either name does not
* correspond to a valid renderbuffer or texture object according
* to the corresponding target parameter."
*/
But for that case, the correct spec paragraph should be:
/* Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core
* Profile spec says:
*
* "An INVALID_ENUM error is generated if either target is
* not RENDERBUFFER or a valid non-proxy texture target;
* is TEXTURE_BUFFER or one of the cubemap face selectors
* described in table 8.18; or if the target does not
* match the type of the object."
*/
specifically the last sentence: "or if the target does not match the
type of the object".
This patch fixes the error returned (s/INVALID/ENUM) for that case,
and moves up the INVALID_VALUE spec paragraph, as that case (invalid
texture object) was handled before.
Fixes:
GL44-CTS.copy_image.target_miss_match
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Immediates are stored into a separate table, and are
consolidated, so if we get an immediate we don't need
to offset it as the index it has is correct.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Basically we just have to scale up the coordinates and then add the
relevant sample offset. The code to handle this was already largely
present from Christoph's earlier attempts to pipe images through back in
the dark ages, this just hooks it all up.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes a regression introduced by commit 42624ea83 which triggered
an assertion in
dEQP-GLES2.functional.texture.completeness.cube.not_positive_level_0
While stImage must have a non-zero size as verified by the caller, we also
look at the size of the base image in an attempt to make a better guess at
the level0 size (this is important when the base image size is odd). However,
the base image may have a zero size even when it exists.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96629
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
use bicubic filtering as high quality scaling L1.
v2: fix a typo and add a newline to code
v3: -render the unscaled image on a temporary surface (Christian)
-apply noise reduction and sharpness filter on
unscaled surface
-render the final scaled surface using bicubic
interpolation
v4: support high quality scaling
v5: set dst_area and dst_clip in bicubic filter
v6: set buffer layer before setting dst_area
v6.1: add PIPE_BIND_LINEAR when creating resource
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This is a shader based bicubic interpolater which uses cubic
Hermite spline algorithm.
v2: set dst_area and dst_clip during scaling (Christian)
v3: clear the render target before rendering
v4: intialize offsets while initializing shaders
use a constant buffer to send dst_size to frag shader
small changes to reduce calculation in shader
v5: send half pixel offset instead of sending dst_size
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fix this build error with GCC 4.4.
CC state_tracker/st_nir_lower_builtin.lo
In file included from state_tracker/st_nir_lower_builtin.c:61:
state_tracker/st_nir.h:34: error: redefinition of typedef ‘nir_shader’
../../src/compiler/nir/nir.h:1830: note: previous declaration of ‘nir_shader’ was here
Suggested-by: Rob Clark <robdclark@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96235
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
silent, flush, incomplete_tex and incomplete_fbo flags were not
documented (see src/mesa/main.debug.c for more info).
FP is not checked anymore.
v2 (Brian Paul):
* MESA_DEBUG accepts a comma-separated list of parameters.
* Clarify how MESA_DEBUG behaves with mesa debug and release builds.
* Updated wording.
v3: Better wording for one paragraph (Brian Paul)
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes:
GL44-CTS.texture_barrier_ARB.same-texel-rw-multipass
On Haswell, Broadwell and Skylake (note that in order to execute that
test, it is needed to override GL and GLSL versions).
On gen6 this test was already working without this change. It keeps
working after it.
This commit replaces the call to brw_emit_mi_flush for gen6+ with two
calls to brw_emit_pipe_control_flush:
* The first one with RENDER_TARGET_FLUSH and CS_STALL set to initiate
a render cache flush after any concurrent rendering completes and
cause the CS to stop parsing commands until the render cache
becomes coherent with memory.
* The second one have TEXTURE_CACHE_INVALIDATE set (and no CS stall)
to clean up any stale data from the sampler caches before rendering
continues.
Didn't touch gen4-5, basically because I don't have a way to test
them.
More info on commits:
0aa4f99f5672473658c5
Thanks to Curro to help to tracking this down, as the root case was a
hw race condition.
v2: use two calls to pipe_control_flush instead of a combination of
gen7_emit_cs_stall_flush and brw_emit_mi_flush calls (Curro)
v3: no need to const cache invalidation (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The output of draw requires a null viewport transform, which the regular
code is ill-equiped to do. Reinstate the original settings in the render
path, and add setting of the viewport clip polygon based on fb
width/height (as that is all taken care of by draw).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This aligns the 4-element color float array to 16 byte boundaries. This
should allow compiler vectorizers to generate better optimizations.
Also fixes broken vectorization generated by Intel compiler.
v2: Fixed indentation and added a lengthy comment explaining the
reason for the alignment.
Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Encapsulate the test for which flags are needed to get a compiler to
support certain features. Along with this, give various options to try
for AVX and AVX2 support. Ideally we want to use specific instruction
set feature flags, like -mavx2 for instance instead of -march=haswell,
but the flags required for certain compilers are different. This
allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
while the Intel compiler which doesn't support those flags can fall
back to using -march=core-avx2.
This addresses a bug where the Intel compiler will silently ignore the
AVX2 instruction feature flags and then potentially fail to build.
v2: Pass preprocessor-check argument as true-state instead of
false-state for clarity.
v3: Reduce AVX2 define test to just __AVX2__. Additional defines suchas
__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
w.r.t thier availability.
v4: Fix C++11 flags being added globally and add more logic to
swr_require_cxx_feature_flags
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Tested-by: Tim Rowley <timothy.o.rowley@Intel.com>
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
The linker deals with atomic counters in terms of uniforms but the
data structure are called after the atomic counters.
Renamed the data structures used in the linker for disambiguation.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Currently the linker uses the uniform count for the total number of
atomic counters. However uniforms don't include the innermost array
dimension in their count, but atomic counters are expected to include
them.
Although the spec doesn't directly state this, it's clear how offsets
will be assigned for arrays.
From OpenGL 4.2 (Core Profile), page 98:
" * Arrays of type atomic_uint are stored in memory by element
order, with array element member zero at the lowest offset. The
difference in offsets between each pair of elements in the
array in basic machine units is referred to as the array
stride, and is constant across the entire array. The stride can
be queried by calling GetIntegerv with a pname of
ATOMIC_COUNTER_- ARRAY_STRIDE after a program is linked."
From that it is clear how arrays of atomic counters will interact with
GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE.
For other kinds of uniforms it's also clear that each entry in an
array counts against the relevant limits.
Hence, although inferred, this is the expected behavior.
Fixes GL44-CTS.arrays_of_arrays_gl.AtomicDeclaration
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
So that we do copies host-side rather than in the guest with map/memcpy.
Tested with piglit arb_copy_buffer-subdata-sync test and new
arb_copy_buffer-intra-buffer-copy test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
With host-side buffer copies (via SVGA3D_vgpu10_BufferCopy()) we have
to make sure any pending map-write operations are completed before reading
if the buffer is dirty. Otherwise the ReadbackSubResource operation could
get stale data from the host buffer.
This allows the piglit arb_copy_buffer-subdata-sync test to pass when
we start using the SVGA3D_vgpu10_BufferCopy command.
v2: check the sbuf->dirty flag in the outer conditional, per Charmaine.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We previously could do blits with util_resource_copy_region() when doing
'loose' format checking. Also do blits with util_resource_copy_region()
when the blit src/dst formats (not the underlying resources) exactly
match. Needed for GL_ARB_copy_image.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
v2: remove extra svga_define_texture_level() call, per Charmaine.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Do texture->texture copies host-side with this command when possible.
Use the previous software fallback otherwise.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We don't normally support rendering to SNORM surfaces, but with
GL_ARB_copy_image we can copy to them if we treat them as typeless
and use a UNORM surface view.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We previously handled the case of a RGBX sampler view of a RGBA surface.
Add the reverse case too. For GL_ARB_copy_image.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
For GL_ARB_copy_image we may be asked to create an RGBA view of
a RGBX surface. Use an RGBX view format for that case.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We want to be able to copy between different 32-bit, 3-channel surface
formats for GL_ARB_copy_image but since we don't support R32G32B32_FLOAT
for textures (it's not blendable and wouldn't work for render to texture)
we can't support 32-bit, 3-channel integer formats.
The state tracker will choose 4-channel formats instead.
Fixes the piglit arb_copy_image-format test for several cases.
Note: This change may need to be revisited if/when the texture_view exension
is enabled in driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows us to do copies between different, but compatible, surface
formats such as RGBA8_UNORM, RGBA8_SINT, RGBA8_UINT, etc. for
GL_ARB_copy_image.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Since only the src box can have negative dims for flipping, just
comparing the src/dst box sizes is enough to detect flips.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Pulled out of the util_try_blit_via_copy_region() function. Subsequent
changes build on this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
SVGA_3D_CMD_DX_GENRATE_MIPMAP & SVGA_3D_CMD_DX_SET_PREDICATION commands
are not presents in fedora 24 kernel module. Because of this
reason application like supertuxkart are not running.
v2: Add few comments and code modifications suggested by Brian P.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Set when the stw_dev object's initialization is completed. We test
for this in the window callback function to avoid potential crashes
on start-up in multi-threaded applications.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Split the old stw_framebuffer_reference() function into two new
functions: stw_framebuffer_reference_locked() which increments
the refcount and stw_framebuffer_release_locked() which decrements
the refcount and destroys the buffer when the count hits zero.
Original patch by Jose. Modified by Brian (clean-ups, lock assertion
checks, etc).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Otherwise we were leaking DC GDI objects and if wglBindTexImageARB()
was called enough we'd eventually hit the GDI limit of 10,000 objects.
Things started failing at that point.
v2: also release DC if we return early, per Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This reverts commit 644e015f0b.
PrimitiveMode from the program doesn't always hold a valid value that
is neither of GL_TRIANGLES, GL_QUADS nor GL_ISOLINES when reaching
this code. This caused regressions in the following CTS tests:
GL44-CTS.stencil_texturing.functional
GL44-CTS.shading_language_420pack.binding_images
GL44-CTS.shading_language_420pack.binding_samplers
GL44-CTS.shading_language_420pack.binding_uniform_single_block
GL44-CTS.shading_language_420pack.implicit_conversions
GL44-CTS.shading_language_420pack.initializer_list
GL44-CTS.shading_language_420pack.length_of_vector_and_matrix
GL44-CTS.shading_language_420pack.line_continuation
Hence, we rather take it from the linked shader.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
There are two distinctly different uses of this struct. The first
is to store GL shader objects. The second is to store information
about a shader stage thats been linked.
The two uses actually share few fields and there is clearly confusion
about their use. For example the linked shaders map one to one with
a program so can simply be destroyed along with the program. However
previously we were calling reference counting on the linked shaders.
We were also creating linked shaders with a name even though it
is always 0 and called the driver version of the _mesa_new_shader()
function unnecessarily for GL shader objects.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
_mesa_write_shader_to_file() is only used to print gl shader objects
so Program should never be set as it only gets set for linked shaders.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
This will allow us to split gl_shader into two different structs, one for
shader objects and one for linked shaders.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Rather than passing in gl_shader we now pass in the IR. This will
allow us to later split gl_shader into two structs. One for use
as a linked per stage shader struct and one for use as a GL shader
object.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Khronos recommends that the GLES 3.1 library also be called libGLESv2.
It also requires that functions be statically linkable from that
library.
NOTE: Mesa has supported the EGL_KHR_get_all_proc_addresses extension
since at least Mesa 10.5, so applications targeting Linux should use
eglGetProcAddress to avoid problems running binaries on systems with
older, non-GLES 3.1 libGLESv2 libraries.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Cc: Mike Gorchak <mike.gorchak.qnx@gmail.com>
Reported-by: Mike Gorchak <mike.gorchak.qnx@gmail.com>
Acked-by: Chad Versace <chad.versace@intel.com>
This is unusual. Usually IDs listed on early stages of platform
definition are kept there as reserved for later use.
However these IDs here are not listed anymore in any of steppings
and devices IDs tables for Kabylake on configurations overview
section of BSpec.
So it is better removing them before they become used in any
other future platform.
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
DRI3:
- Only slows clears can enable it for the first frame.
- A good PS/draw ratio can enable it for other frames.
DRI2:
- Only slows clears can enable it for a frame.
- Page-flipped color buffers are unref'd at the end of each frame,
so it can't be enabled in any other way.
- Relying on slow clears is sufficient for our synthetic benchmarks.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
DCC for displayable surfaces is allocated in a separate buffer and is
enabled or disabled based on PS invocations from 2 frames ago (to let
queries go idle) and the number of slow clears from the current frame.
At least an equivalent of 5 fullscreen draws or slow clears must be done
to enable DCC. (PS invocations / (width * height) + num_slow_clears >= 5)
Pipeline statistic queries are always active if a color buffer that can
have separate DCC is bound, even if separate DCC is disabled. That means
the window color buffer is always monitored and DCC is enabled only when
the situation is right.
The tracking of per-texture queries in r600_common_context is quite ugly,
but I don't see a better way.
The first fast clear always enables DCC. DCC decompression can disable it.
A later fast clear can enable it again. Enable/disable typically happens
only once per frame.
The impact is expected to be negligible because games usually don't have
a high level of overdraw. DCC usually activates when too much blending
is happening (smoke rendering) or when testing glClear performance and
CMASK isn't supported (Stoney).
v2: rename stuff, add assertions
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Replace some open-coded ioctls with intel_get_param().
This is just a cleanup. No change in behavior.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Replace the function's __DRIscreen parameter with struct intel_screen.
The callsites feel more natural that way.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
AA lines are not completely correct (see TODO), but everything else
should be.
+ 3 linestipple piglits
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Commit 87d062a940 ("i965: Fix shared local memory size for Gen9+.")
added u_math.h include which broke the Android build:
In file included from external/mesa3d/src/intel/isl/isl_storage_image.c:25:
In file included from external/mesa3d/src/mesa/drivers/dri/i965/brw_compiler.h:29:
external/mesa3d/src/mesa/main/macros.h:35:10: fatal error: 'util/u_math.h' file not found
^
Add the missing include paths for libmesa_isl.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Kenneth Garunke <kenneth@whitecape.org>
With commit fb9fe35, we start using transfer_inline_write
for memcpy of TexSubImage. But SurfaceDMA command does not work
well with texture array. This patch forces direct map when
transfering multiple slices of a texture array.
Fixes piglit regression "texelFetch fs sampler1DArray"
Tested with MTT piglit, glretrace, conform.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Rely on the existence of a second destination when emitting a setcond
flag is dangerous, because this doesn't mean that the flag has been
correctly set. Instead rely on flagsDef like what emitX() does
for flagsSrc.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Remove 'reg' option that does not actually exist, elaborate more about
'sync' and add the missing options.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The call for vl_video_buffer_adjust_size is with wrong order of
arguments, apparently it will have problem when interlaced false;
The size of OMX result buffer depends on real size of clips, vl buffer
dimension is aligned with 16, so 1080p(1920*1080) video will overflow
the OMX buffer
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
Make pipe_loader_sw_probe_kms take ownership of the passed in fd,
like pipe_loader_drm_probe_fd does.
The only caller is dri_kms_init_screen which passes in a dupped fd,
just like dri2_init_screen passes in a dupped fd to
pipe_loader_drm_probe_fd.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This file still only has my name on the copyright notice even though
most of the code (likely more than 90% of it) was authored by various
contributors -- It doesn't seem right to have the whole file
attributed to myself.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Serge Martin <edb+mesa@sigluy.net>
This was useful when debugging the previous commit's issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
emit_urb_writes() contains code to emit an EOT write with no actual
data when there are no output varyings. This makes sense for the VS
and TES stages, where it's called once at the end of the program.
However, in the geometry shader stage, emit_urb_writes() is called once
for every EmitVertex(). We explicitly emit a URB write with EOT set at
the end of the shader, separately from this path. So we'd better not
terminate the thread. This could get us into trouble for shaders which
do EmitVertex() with no varyings followed by SSBO/image/atomic writes.
It also caused us to emit multiple sends with EOT set, which apparently
confuses the register allocator into not using g112-g127 for all but
the first one. This caused EU validation failures in OglGSCloth
shaders in shader-db. (The actual application was fine, but shader-db
thinks there are no outputs because it doesn't understand transform
feedback.)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The only part of an ir_texture which can be an array is the
offsets array in textureGatherOffsets() calls. We don't want
to lower those, because they're required to remain constants.
Fixes textureGatherOffsets with Gallium drivers such as llvmpipe,
which commit ef78df8d3b regressed.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
These need to be passed from the host in caps structure if they
are larger, this fixes a bunch of tests on Intel hw, that I'd
put the limits too high for.
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Call to handle_table_get in vlVaDestroySurfaces can
return NULL on failure.
CID: 1243522
Signed-off-by: Gurkirpal Singh <gurkirpal204@gmail.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
The intent was to continue down the indirect chain, not to call ourselves
with unchanged input arguments. Found by code inspection, and comparison
to copy_prop_alu_src().
We haven't hit this because callers of NIR's copy prop are doing so in
SSA, before indirect variable dereferences have been lowered to registers.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On at least Kepler hardware, the units differ based on RT format. Emit a
properly scaled value for Z16 depth buffers vs other formats, to help
out st/nine.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
It defaults to true so default behavior doesn't change but it allows you to
do NIR_VALIDATE=false if you don't want validation. Disabling validation
can substantially speed up shader compiles so you frequently want to turn
it off if compiler invariants aren't in question.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
offset_units_unscaled enables proper support
for depth bias for gallium nine. Use it
if available.
Solves issues with some games using depth bias.
For example:
https://github.com/iXit/Mesa-3D/issues/220
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Empirical tests show that the polygon offset
behaviour is entirely determined by the content of
the PA_SU_POLY_OFFSET states, and not by the depth buffer
format bound.
PA_SU_POLY_OFFSET seems to directly set the parameters of
the polygon offset formula, and setting 0 for
PA_SU_POLY_OFFSET_DB_FMT_CNTL (ie setting the unorm depth
bias behaviour with a scale of 2^0 = 1.0f) gives the unscaled
behaviour.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Empirical tests show that the polygon offset
behaviour is entirely determined by the content of
the PA_SU_POLY_OFFSET states, and not by the depth buffer
format bound.
PA_SU_POLY_OFFSET seems to directly set the parameters of
the polygon offset formula, and setting 0 for
PA_SU_POLY_OFFSET_DB_FMT_CNTL (ie setting the unorm depth
bias behaviour with a scale of 2^0 = 1.0f) gives the unscaled
behaviour.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pa_su_poly_offset_db_fmt_cntl usages were removed in
previous patches.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emit PA_SU_POLY_OFFSET_DB_FMT_CNTL with the other poly_offset states.
This will be useful to implement
PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED.
v2: Increase the num_dw field for the poly offset atom
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emit PA_SU_POLY_OFFSET_DB_FMT_CNTL with the other poly_offset states.
This will be useful to implement
PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED.
v2: Increase the num_dw field for the poly offset atom
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emit PA_SU_POLY_OFFSET_DB_FMT_CNTL with rasterizer poly_offset states.
This will be useful to implement
PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
D3D9 has a different behaviour for depth bias.
For OGL/D3D1X, the depth bias unit is the
minimal resolvable value for the depth buffer,
which depends on the format (and has different
behaviour for float depth buffers).
For D3D9, the depth bias unit is 1.0f.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We recently had a mistake where we emitted SEND instructions with EOT
set, but from g107 rather than g112-g127. Adding validation code should
prevent these sorts of problems from slipping back in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These only exist post-Sandybridge, and always use send-from-GRF.
So inst->base_mrf will be -1, and we will have already returned 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On MRF platforms, we need to set base_mrf to the first MRF value we'd
like to use for the message. On send-from-GRF platforms, we set it to
-1 to indicate that the operation doesn't use MRFs.
As MRF platforms are becoming increasingly a thing of the past, we've
forgotten to bother with this. It makes more sense to set it to -1 by
default, so we don't have to think about it for new code.
I searched the code for every instance of 'mlen =' in brw_fs*cpp, and
it appears that all MRF-based messages correctly program a base_mrf.
Forgetting to set base_mrf = -1 can confuse the register allocator,
causing it to think we have a large fake-MRF region. This ends up
moving the send-with-EOT registers earlier, sometimes even out of
the g112-g127 range, which is illegal. For example, this fixes
illegal sends in Piglit's arb_gpu_shader_fp64-layout-std430-fp64-shader,
which had SSBO messages with mlen > 0 but base_mrf == 0.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The old return type of GLuint was wonky - it should have been bool.
But nothing actually uses the return value anyway, so we can just drop
that and make it a void function.
In theory, it might make sense to ask whether the texture validated
successfully, but just checking intel_obj->mt != NULL works for that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_tex.c is a tiny file containing a single function. It's closely
tied to the validation logic in intel_tex_validate.c, so it makes sense
to put both in the same file.
While we're at it, update the function to our modern style.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Under some circumstances, the driver may choose to return a temporary
surface instead of a pointer to the original. Make sure to pass the
actual view volume to be mapped to the transfer function rather than
adjusting the map pointer after-the-fact.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
- make sure FP32 denormals will stay disabled in LLVM in the future
(the current default is disabled)
- tell LLVM that FP64 denormals are enabled
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
v2: Merge with PIPE_SHADER_CAP_DOUBLES
Add CHIP_HEMLOCK
v3: only set the instruction on EG and CM
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Checking "signalled" is first done without a mutex, then with a mutex.
Also, checking without waiting doesn't lock the mutex. This is racy, but
should be safe.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Will be needed for resolving auxiliary surfaces.
I didn't add anv_render_pass_attachment::stencil_store_op, as the driver
would likely never use it, as stencil surfaces never have auxiliary
surfaces.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Clean up misrepetitions ('if if', 'the the' etc) found throughout the
comments. This has been done manually, after grepping
case-insensitively for duplicate if, is, the, then, do, for, an,
plus a few other typos corrected in fly-by
v2:
* proper commit message and non-joke title;
* replace two 'as is' followed by 'is' to 'as-is'.
v3:
* 'a integer' => 'an integer' and similar (originally spotted by
Jason Ekstrand, I fixed a few other similar ones while at it)
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
We currently use CL_INVOCATION_COUNT for the GL_PRIMITIVES_GENERATED
query, which involves passing all primitives to the clipper. When
rasterizer discard is enabled, we program the clipper in REJECT_ALL
mode, rather than using the SOL stage's "Rendering Disable" feature.
See commit f09b91f782 for an explanation
of why we implement GL_PRIMITIVES_GENERATED this way.
Apparently the SOL stage's "Rendering Disable" feature is a lot faster
than having the clipper reject all primitives. It's safe to use when
no GL_PRIMITIVES_GENERATED query is active, as we don't care about
CL_INVOCATION_COUNT incrementing.
This patch makes us use SO_RENDERING_DISABLE when no query is active,
but continues falling back to the clipper in REJECT_ALL mode when the
queries are enabled. It brings back the perf_debug for the clipper
case (which I removed in commit 1f9445ff57, thinking it wasn't useful).
Improves performance in Gl32GSCloth by 84.8303% +/- 2.07132% (n = 10)
on my Broadwell GT2 laptop.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Constant propagation on arrays doesn't make a lot of sense. If the
array is only accessed with constant indexes, then opt_array_splitting
would split it up. Otherwise, we have variable indexing. If there's
multiple accesses, then constant propagation would end up replicating
the data.
The lower_const_arrays_to_uniforms pass creates uniforms for each
ir_constant with array type that it encounters. This means that it
creates redundant uniforms for each copy of the constant, which means
uploading too much data. It can even mean exceeding the maximum number
of uniform components, causing link failures.
We could try and teach the pass to de-duplicate the data by hashing
constants, but it makes more sense to avoid duplicating it in the first
place. We should promote constant arrays to uniforms, then propagate
the uniform access.
Fixes the TressFX shaders from Tomb Raider, which exceeded the maximum
number of uniform components by a huge margin and failed to link.
On Broadwell:
total instructions in shared programs: 9067702 -> 9068202 (0.01%)
instructions in affected programs: 10335 -> 10835 (4.84%)
helped: 10 (Hoard, Shadow of Mordor, Amnesia: The Dark Descent)
HURT: 20 (Natural Selection 2)
loops in affected programs: 4 -> 0
The hurt programs appear to no longer have a constarray uniform, as
all constants were successfully propagated. Apparently before this
patch, we successfully unrolled a loop containing array access, but
only after promoting constant arrays to uniforms. With this patch,
we unroll it first, so all array access is direct, and the array
is split up, and individual constants are propagated. This seems
better.
Cc: mesa-stable@lists.freedesktop.org
Reported-by: Karol Herbst <nouveau@karolherbst.de>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
There's really no point in looking at ir_dereference_array of a
constant. It also misses cases like:
(assign () (var_ref tmp) (constant (array ...) ...))
No changes in shader-db, but keeps it working after the next commit.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The scalar backend currently doesn't support variable indexing on
temporary arrays, but it does support it on uniform arrays, and
some stages support it for input arrays. Make sure these are
propagated through before exploding indirects into piles of
if-ladders unnecessarily.
On Broadwell, no instruction count change in shader-db.
total cycles in shared programs: 80675652 -> 80674928 (-0.00%)
cycles in affected programs: 649972 -> 649248 (-0.11%)
helped: 386
HURT: 165
This will help avoid code quality regressions in a future commit.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Previously, we failed to split constant arrays. Code such as
int[2] numbers = int[](1, 2);
would generates a whole-array assignment:
(assign () (var_ref numbers)
(constant (array int 4) (constant int 1) (constant int 2)))
opt_array_splitting generally tried to visit ir_dereference_array nodes,
and avoid recursing into the inner ir_dereference_variable. So if it
ever saw a ir_dereference_variable, it assumed this was a whole-array
read and bailed. However, in the above case, there's no array deref,
and we can totally handle it - we just have to "unroll" the assignment,
creating assignments for each element.
This was mitigated by the fact that we constant propagate whole arrays,
so a dereference of a single component would usually get the desired
single value anyway. However, I plan to stop doing that shortly;
early experiments with disabling constant propagation of arrays
revealed this shortcoming.
This patch causes some arrays in Gl32GSCloth's geometry shaders to be
split, which allows other optimizations to eliminate unused GS inputs.
The VS then doesn't have to write them, which eliminates the entire VS
(5 -> 2 instructions). It still renders correctly.
No other change in shader-db.
v2: Drop !AOA check and improve a comment (feedback from Tim Arceri).
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
opt_constant_propagation.cpp contains constant folding code which can
actually do constant propagation in some cases. It was happily
propagating constants into the left-hand-side of assignments.
For example,
(assign () (var_ref temp) (constant ...))
would brilliantly be turned into:
(assign () (constant ...) (constant ....))
This is a bigger hammer than necessary - it prevents propagation
into the left-hand-side altogether. We could certainly do better
someday. Notably, the constant propagation pass itself already
takes this approach - it's just the constant propagation pass's
built-in constant folding code (which actually propagates, too)
that was broken.
No change in shader-db, but prevents regressions after future commits.
It seems plausible that this could be hit today, but I haven't seen it
happen.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Without vertex elements originating directly from vertex fetcher
are not passed to wm-state correctly.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Never be dependent on "draw 0", instead have a bool that makes the draw
dependent on the previous draw or not dependent at all.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Handle SGV stores separate from the stream fetch code.
Because of this change, there is a potential to jit an extra unused store.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Was trying to store an extra uninitialized component.
Only affects component packing, which isn't enabled (yet).
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Earlier MSVC 2013 releases have troubles compiling some of our C99 code,
so make sure we have Update 4 to avoid confusion.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
With commit fb9fe35, we start using transfer_inline_write
for memcpy TexSubImage path, but that triggers a regression with
texture array in the svga driver.
With this patch, the direct map code will update the texture array
correctly.
Fixes VMware bug 1679293.
Tested with MTT piglit, glretrace, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently with the SetVertexBuffers optimization, we avoid emitting
redundant DXSetVertexBuffers commands. However, these buffers surfaces
will still need to be referenced, otherwise, in the case of linux,
the subsequent surface discard map will map to the existing mob instead
of a new one, causing rendering artifacts.
With this patch, we'll call resource_rebind() to reference the resources
even if we are avoiding the actual set command. This fixes the
rendering artifacts in the window title area running with unity in
Ubuntu 14.04
Tested with piglit, glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
This patch fixes three issues with vertex buffer references:
(1) Instead of copy the vertex buffer resource handles to the hw state
in the context structure, use pipe_resource_reference to properly
reference the vertex buffer resources in the context.
(2) Make sure to unbind those unused vertex buffer resources.
(3) Force to rebind the vertex buffer resources at the first draw of each
command buffer to make sure the vertex buffer resources are paged in.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of copy the index buffer resource handle to the hw state in
the context structure, use pipe_resource_reference to properly reference
the index buffer resource in the context.
Reviewed-by: Brian Paul <brianp@vmware.com>
We already store these in gl_shader and gl_program here we
remove it from gl_shader_program and just use the values
from gl_shader.
This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We already store this in gl_shader and gl_program here we
remove it from gl_shader_program and just use the values
from gl_shader.
This will allow us to keep the shader cache restore code as
simple as it can be while making it somewhat clearer where these
values originate from.
V2: remove unnecessary NULL check
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral <itoral@igalia.com>
This solves a race condition where we can end up having different stages
stomp on each other because they're all trying to scratch in the same BO
but they have different views of its layout.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
While we're here, we also fixup MEDIA_VFE_STATE and rename the field in
3DSTATE_VS on gen6-7.5 to be consistent with the others.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
The pack header generation scripts can't handle the case where you have
two addresses in the same dword; they just take whatever is the last one.
This meant that the MCS address wasn't properly getting handled. Since we
don't care about append counters, we can just re-arrange the XML for now.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
ISL was being a bit too clever for its own good and lowering the format for
us. This is all well and good *if* we always want to lower it. However,
the GL driver selectively lowers the format depending on whether the
surface is write-only or not.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This field is ignored by the hardware in this case and, on very large 1-D
textures, it can end up being larger than the maximum allowed value.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This matches better what happens on gen8 where the "Tiled Surface" and
"Tile Walke" bits are combined into a single two-bit value. This is also
more consistent with what the GL driver does.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
For depth/stencil 1-D textures on SKL, we want them layed out in the old
format that has been used since gen4. In order for the surface state
fill-out code to handle, this it needs to distinguish based on layout
rather than just dimensionality.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
The docs specify that this only matters for render targets and surfaces
used with typed dataport messages. On some platforms (gen4-6) the Depth
field has more bits than RenderTargetViewExtent so we can have textures
with more levels than we can render to.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
According to the PRM, you can't set SurfaceArray for 3D or buffer textures.
There doesn't seem to be a good reason not to set it when we can. On the
other hand, if we don't set it we can end up getting strange results for
1-layer array textures such as textureSize() returning the wrong results.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We already set the bit in the few cases where it's required by the docs so
there's no need to set it all the time. This has no noticable perf impact
for Dota 2 on Vulkan with the time demo I have.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This commit switches clear colors to use #if's instead of a C if. This
lets us properly handle SNB where the clear color field doesn't exist.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This moves the #if's around so that halign and valign have different sets
of #if conditions. This also prepares us for SNB because isl_to_gen_halign
is not defined at all on gen6.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
While designated initializers are nice, they also force us to put some
things in the initializer and some things later. Surface state setup is
complicated enough that this really hurts readability in the long run.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Otherwise, we end up with a bogus value in the third component. On gen6-7
where we always use 2D textures, this can cause problems if the
SurfaceArray bit is set in the SURFACE_STATE.
Acked-by: Chad Versace <chad.versace@intel.com>
There's no real reason why we shouldn't set this bit. It does affect how
the sampler operates a bit but since you can have a 2D non-array view of a
2D_ARRAY texture that distinction is very weak. Also, this is what ISL
will do and we would like this change to be isolated from using ISL.
Reviewed-by: Chad Versace <chad.versace@intel.com>
The PRM states that the values put in Width, Height, and Depth should be
various bits from the value size - 1. We seem to have done this wrong
more-or-less from the start.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Previously, we were incrementing length but not actually putting anything
in the Y coordinate. This meant that 1-D TXF operations had a garbage
array index. If the surface is emitted as 1-D non-array, the coordinate
gets discarded and it works fine. If it happens to be bound as an array
surface, it may count as an out-of-bounds array access and you get zero.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
The RenderTargetViewExtent field of RENDER_SURFACE_STATE is supposed to be
set to the depth of a 3-D texture when rendering. Unfortunatley, that
field is only 9 bits on Sandy Bridge and prior so we can't actually bind
a 3-D texturing for rendering if it has depth > 512. On Ivy Bridge, this
field was bumpped to 11 bits so we can go all the way up to 2048. On Iron
Lake and prior, we don't support layered rendering and we use OffsetX/Y
hacks to render to particular layers so 2048 is ok there too.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Our previous code worked for desktop GL, and ES without geometry or
tessellation shaders. But those features require fancier point size
handling. Fortunately, we can use one rule for all APIs.
Fixes a number of dEQP tests with EXT_tessellation_shader enabled:
dEQP-GLES31.functional.tessellation_geometry_interaction.point_size.*
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This avoids costly address recomputations, function overhead, and may trigger
large copy optimizations.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's special logic around finding gl_FragData. It latches onto any
array with FRAG_RESULT_DATA0. However gl_SecondaryFragDataEXT[], added
by GL_EXT_blend_func_extended, fits those parameters as well. The real
frag data array should have index 0 though, so we can use that to
distinguish them.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96617
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ever since c2581a9375, the binding table layout has depended on the
pipeline. This means that whenever we change pipelines we also need to
re-emit binding tables for the new layout.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
SPIR-V treats it as an input but NIR wants the system value. This
shouldn't have been too much of a surprise given that we have to do the
same conversion in the GLSL IR to NIR pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This reduces time spend in glGenerateMipmap by a half.
v2: don't decompress the levels to be overwritten
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
for pipe_context::generate_mipmap
first move some of the blit code from util_blitter_blit_generic
to a separate function, then use it from util_blitter_generate_mipmap
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Whenever a draw happens or some other function call might change the result
of future glReadPixels calls, we must invalidate the cache.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As far as I can tell, a sequence of glBitmap followed by texture functions
that refer to a texture bound as the framebuffer is well within what should
be allowed.
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the unlikely case that a program uses glBitmap to render to a framebuffer
whose texture is bound in a compute shader.
Found by inspection.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is more consistent with what we do elsewhere and will allow
us to only cache one of the values in the shader cache.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cherryview and Broxton don't support DW x DW multiplication. We have
piles of code to handle this, but apparently weren't retyping in the
immediate case.
For example,
tests/spec/arb_tessellation_shader/execution/dvec3-vs-tcs-tes
makes the simulator angry about instructions such as:
mul(8) r18<1>:D r10.0<8;8,1>:D 0x00000003:D
Just retype to W or UW. It should be safe on all platforms.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Just setting builder->exact isn't sufficient because that only applies to
instructions that are built with the builder but instructions created
manually and only inserted using the builder are left alone.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This pass is similar to propagate_invariance in the GLSL compiler. The
real "output" of this pass is that any algebraic operations which are
eventually consumed by an invariant variable get marked as "exact".
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
While mathematically correct, these two optimizations result in an
expression with substantially lower precision than the original. For any
positive finite floating-point value, log2(x) is well-defined and finite.
More precisely, it is in the range [-150, 150] so any sum of logarithms
log2(a) + log2(b) is also well-defined and finite as long as a and b are
both positive and finite. However, if a and b are either very small or
very large, their product may get flushed to infinity or zero causing
log2(a * b) to be nowhere close to log2(a) + log2(b).
This imprecision was causing incorrect rendering in Talos Principal because
part of its HDR rendering process involves doing 8 texture operations,
clamping the result to [0, 65000], taking a dot-product with a constant,
and then taking the log2. This is done 6 or 8 times and summed to produce
the final result which is written to a red texture. In cases where you
have a region of the screen that is very dark, it can end up getting a
result value of -inf which is not what is intended.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96425
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Apparently, these are deprecated. There's some AutoUpgrade feature which
is supposed to promote these to cmp/select, which apparently doesn't work
with jit code. It is possible it's not actually even meant to work (see
the bug filed against llvm which couldn't provide an answer neither)
but in any case this is meant to be only temporary unless the intrinsics
are really illegal. So, just use the fallback code (which should be cmp/select,
we're actually doing cmp/sext/trunc/select, but in any case llvm 3.9 manages
to optimize this back to pmin/pmax in the end).
This addresses https://llvm.org/bugs/show_bug.cgi?id=28176
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Aaron Watry <awatry@gmail.com>
This makes the check match up what we do on nv50 as well - there's no
point in switching over the push path if everything's in managed
buffers. This can happen when a shader uses a vertex without an enabled
array - we end up passing it a constant attribute.
This also has the effect of "fixing" some flickering in Talos. I have no
idea why. I've stared at the push logic forwards, backwards, and
sideways. By always forcing the push path (which is slow), the
flickering also goes away, but other rendering is still wrong
(specifically draw 383068 as identified in the bug). However by not
switching over to the push path, draw 383068 is correct.
Note that other flickering remains in Talos, like the red/green
walls/floors. This takes care of the shadow flickering though.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90513
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If we have a loop, instructions before the tex might be added as tex
uses, and those may in fact dominate all other uses of the tex results.
This however doesn't mean that we don't need a texbar after the tex.
Only check if uses dominate each other they are dominated by the tex.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96565
Fixes: 7752bbc44 (gk104/ir: simplify and fool-proof texbar algorithm)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Make sure to pass the requisite information in draws, blits, and clears
that work on the context's draw buffer.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This says how many window rectangles are supported by the
implementation, although it may not exceed PIPE_MAX_WINDOW_RECTANGLES.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Window rectangles apply to all framebuffer operations, either in
inclusive or exclusive mode. They may also be specified as part of a
blit operation.
In exclusive mode, any fragment inside any of the specified rectangles
will be discarded.
In inclusive mode, any fragment outside every rectangle will be
discarded.
The no-op state is to have 0 rectangles in exclusive mode.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is pretty useful for debugging purposes and those should
not be omitted.
Fixes: 517a93b3 ("nvc0: add ARB_shader_draw_parameters support")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A pipe pointer in the screen allows for access to current device context
in flush_frontbuffer and resource_destroy. This wasn't tracking current
context in multi-context situations.
v2: More caffeine. Corrected compare, removed unnecessary set of
screen-pipe in create_context, and added a few comments.
To match what's done in the automake build.
v2: Use git rev-parse to get a 10-character hash ID
Fix Python imports
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
From the Cherryview's PRM, Volume 7, 3D Media GPGPU Engine, Register Region
Restrictions, page 844:
"When source or destination datatype is 64b or operation is integer DWord
multiply, indirect addressing must not be used."
v2:
- Fix it for Broxton too.
v3:
- Simplify code by using subscript() and not creating a new num_components
variable (Kenneth).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the Cherryview PRM, Volume 7, 3D Media GPGPU Engine,
Register Region Restrictions:
"When source or destination is 64b (...), regioning in Align1
must follow these rules:
1. Source and destination horizontal stride must be aligned to
the same qword.
(...)"
v2:
- Fix it for Broxton too.
v3:
- Remove inst->regs_written change as it is not necessary (Ken)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I recently experimented with performing rasterizer discard in the SOL
unit instead of the clipper, and as far as I can tell, it's basically
the same performance. The clipper comes directly after SOL anyway,
and setting the clipper to REJECT_ALL should be pretty darn cheap.
Keep the perf_debug on Sandybridge, where the GS actually does work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
There are almost no tests in any test suite, but what little I've found
seems to work. Ilia believes everything is in place.
v2: Predicate the enable on ES 3.1 being available (Gen8+) and also
ARB_compute_shader being available (requested by Ilia).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The use of a bitmask makes functions iterating only active
attributes less visible in profiles.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The use of a bitmask makes functions iterating only active
attributes less visible in profiles.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
The bitmask used here for iteration is a combination
of different enabled masks present for texture units.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces an iterate and test bit in a bitmask loop by a
loop only iterating over the bits set in the bitmask.
The bitmask used here for iteration is a combination
of different enabled masks present for texture units.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces a loop that iterates all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces a loop that iterates all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces loops that iterate all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces a loop that iterates all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replaces loops that iterate all lights and test
which of them is enabled by a loop only iterating over
the bits set in the enabled bitmask.
v2: Use _mesa_bit_scan{,64} instead of open coding.
v3: Use u_bit_scan{,64} instead of _mesa_bit_scan{,64}.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The aim is to replace the CoordReplace array by
a bitfield. Until all drivers are converted,
establish the bitfield in parallel to the
CoordReplace array.
v2: Fix bitmask logic.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead use the internal gl_shader_stage enum everywhere. This
makes things more consistent and gets rid of unnecessary
conversions.
Ideally it would be nice to remove the Type field from gl_shader
altogether but currently it is used to differentiate between
gl_shader and gl_shader_program in the ShaderObjects hash table.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the log file specified by the GALLIUM_LOG_FILE begins with '+', open
the file in append mode. This is useful to log all gallium output for
an entire piglit run, for example.
v2: put GALLIUM_LOG_FILE support inside an #ifdef DEBUG block.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
anv_pipeline_binding::index is a uint8_t, but some code assigned to it
UINT16_MAX.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewd-by: Jason Ekstrand <jason@jlekstrand.net>
The expected stride calculation is completely wrong. It should
ultimately be multiplying cpp and width rather than dividing. The width
also needs to be aligned to the tiling width first before converting to
stride bytes.
The whole stride check here is possibly pointless. Any buffers which
were allocated outside of vc4 may have strides with larger alignment
requirements.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We still need to recompile the passthrough shader when this value
changes, as it also affects the output vertex count. But otherwise,
we can eliminate recompiles on Gen8+.
We probably want to do this for Gen7 as well, but that requires
rewriting the input release code to use a loop, which is a trade-off
I'd need to consider in more detail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes three GL44-CTS.tessellation_shader subtests:
- max_patch_vertices
- single.max_patch_vertices
- tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
These use gl_PatchVerticesIn in the TES, but don't link against a
TCS (which would allow the linker to lower it to a constant). We had
no handling for the system value in the backend, so it would just
assert fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
i965 has no special hardware for this, so we need to pass this value in
as a uniform (unless the TES is linked against a TCS, in which case the
linker can just replace this with a constant).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Found by -fsanitize=undefined. Note that this should be a harmless issue in
practice because the inst->op check always dominates anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
type_size_vec4_times_4() was introduced as a fix in 8dcf807cb4
however since 3810c1561 we can just use type_size_scalar() and
get the actual number of outputs we need.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've had some trouble in the past with copying integers around via
float pointers, as the C compiler sometimes uses x87 floating point
registers to load values on 32-bit systems. Passing the
gl_constant_value union should be safer.
To avoid churn, this patch creates a "GLfloat *value" variable so
existing uses can stay the same.
Not observed to fix anything, but I was in the area adding more integer
state vars, and thought it'd be wise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
We could also do MSAA resolve in a compute shader like Vulkan and remove
these workarounds.
v2: comment the magic numbers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This change enables the creation of pbuffer
surfaces on the surfaceless platform.
v3: Going back to single-buffered pbuffer
plus additional code review changes
Reviewed-by: Chad Versace <chad.versace@intel.com>
The previous assertions required for texture sizes smaller than block_size
that src_box.x + src_box.width still be block size.
(e.g. for a texture with width 3, and src_box.x = 0, src_box.width would
have to be 4 to not assert.)
This caused some assertions with some other state tracker.
It looks though like callers aren't expected to round up widths to block sizes
(for sizes larger than block size the assertion would still have verified it
wouldn't have been rounded up) so we simply shouldn't use a minify which
rounds up to block size.
(No piglit change with llvmpipe.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The gallium contract would be that bind flags must indicate all possible
bindings a resource might get used, but fact is the mesa state tracker does
not set bind flags correctly, and this is more or less unfixable due to GL.
This caused a bug with piglit arb_uniform_buffer_object-rendering-dsa
since 6e6fd911da - the commit is correct,
but it caused us to miss updates to fs UBOs completely, since the
corresponding buffer didn't have the appropriate bind flag set (thus we
wouldn't check if it is indeed currently bound).
See the discussion about this starting here:
https://lists.freedesktop.org/archives/mesa-dev/2016-June/119829.html
So, update the bind flags when we detect such usage.
Note we update this value for now only in places which matter for us - that
is creating sampler/surface view, or binding constant buffer. There's plenty
more places (setting streamout buffers, vertex/index buffers, ...) where
things can be set with the wrong bind flags, but the bind flags there never
matter.
While here also make sure we only set dirty constant bit when it's a fs
constant buffer - totally doesn't matter if it's vs/gs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Possibly this should move into an fd2 wrapper fxn, similar to the
texture state tracking done for fd3/fd4 (clamp emulation, etc)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The images struct is an uninitialized local variable on the stack. If the
callback returns 0, the struct might not have been updated and so should
be considered uninitialized. Currently the code ignores the return value,
which (depending on stack contents) might end up in reading a non-zero
value from images.image_mask and dereferencing further fields.
Another solution would be to initialize image_mask with 0, but checking
the return value seems more sensible and it is what Gallium is doing.
v2: fix typos in commit message,
fix indentation,
remove unnecessary parentheses and pointer dereference to keep line
length reasonable.
Cc: 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This makes sure that dri_set_tex_buffer2 -> dri_drawable_validate_att
will re-create the front left attachment buffer after the drawable got
invalidated.
Fixes window contents not updating until the window is resized when
using DRI2 PRIME.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Built-in variable "MaxCombinedShaderStorageBlocks" was added to GLSL 4.40
revision 9.
Section "1.2.1 Changes since revision 8 of GLSL version 4.40",
page 3 of the PDF states:
"Bug 11734: Add gl_MaxCombinedShaderOutputResources and mark
gl_MaxCombinedImageUnitsAndFragmentOutputs as deprecated."
Fixes: GL44-CTS.shader_image_load_store.basic-glsl-const
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to do zero-copy between two different devices
the memory should not be tiled.
Tested with GStreamer on a laptop that has 2 GPUs:
1- gstvaapidecode:
HW decoding and dmabuf export with nouveau driver on Nvidia GPU.
2- glimagesink:
EGLImage imports dmabuf on Intel GPU.
TEST: DRI_PRIME=1 gst-launch vaapidecodebin ! glimagesink
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This replaces the current bash generator with a python based generator
using mako. It's quite fast and works with both python 2.7 and python
3.5, and should work with 3.3+ and maybe even 3.2.
It produces an almost identical file except for a minor layout changes,
and the addition of a "generated file, do not edit" warning.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The functions are also useful for mesa.
Introduce src/util/bitscan.{h,c}. Move ffs function
implementations from src/mesa/main/imports.{h,c}.
Move bit scan related functions from
src/gallium/auxiliary/util/u_math.h. Merge platform
handling with what is available from within mesa.
v2: Try to fix MSVC compile.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Avoid ASan new-delete-type-mismatch when Function::domTree is created as
DominatorTree in Function::convertToSSA but destroyed only as base
Graph in ~Function.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Pulling DF uniforms from pull constant buffer generates messages like:
send(4) g12<1>DF g12<0,1,0>F
sampler ld SIMD4x2 Surface = 1 Sampler = 0 mlen 1 rlen 1
which produces GPU hangs in Cherryview/Braswell:
"For 64-bit Align1 operation or multiplication of dwords in CHV,
source horizontal stride must be aligned to qword."
This seems to be documented in the Cherryview PRM, Volume 7, Page 843:
"When source or destination datatype is 64b or operation is integer
DWord multiply, regioning in Align1 must follow these rules:
1. Source and Destination horizontal stride must be aligned to the
same qword."
We should set the destination type to UD, D, or F so that
the register stride checker doesn't notice. The destination type of
send messages is basically irrelevant anyway.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Pulling DF inputs from the URB generates messages like:
send(8) g23<1>DF g1<8,8,1>UD
urb 3 SIMD8 read mlen 1 rlen 2 { align1 1Q };
which makes the simulator angry:
"For 64-bit Align1 operation or multiplication of dwords in CHV,
source horizontal stride must be aligned to qword."
This seems to be documented in the Cherryview PRM, Volume 7, Page 823:
"When source or destination datatype is 64b or operation is integer
DWord multiply, regioning in Align1 must follow these rules:
1. Source and Destination horizontal stride must be aligned to the
same qword."
Setting the source horizontal stride to QWord is insane, as it's the
message header containing 8 URB handles in a single 32-bit DWord.
Instead, we should whack the destination type to UD, D, or F so that
the register stride checker doesn't notice. The destination type of
send messages is basically irrelevant anyway.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cherryview/Broxton annoyingly have a minimum number of VS URB entries
of 34, which is not a multiple of 8. When the VS size is less than 9,
the number of VS entries has to be a multiple of 8.
Notably, BLORP programmed the minimum number of VS URB entries (34), with
a size of 1 (less than 9), which is invalid.
It seemed like this could be a problem in the regular URB code as well,
so I went ahead and updated that to be safe.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.
v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end. Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly. Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I haven't found any mention of this in the hardware docs, but
experimentally what seems to be going on is that when the per-thread
scratch slot size is changed between two pipelined draw calls, shader
invocations using the old and new scratch size setting may end up
being executed in parallel, causing their scratch offset calculations
to be based in a different partitioning of the scratch space, which
can cause their thread-local scratch space to overlap leading to
cross-thread scratch corruption.
I've been experimenting with alternative workarounds, like emitting a
PIPE_CONTROL with DC flush and CS stall between draw (or dispatch
compute) calls using different per-thread scratch allocation settings,
or avoiding reuse of the scratch BO if the per-thread scratch
allocation doesn't exactly match the original. Both seem to be as
effective as this workaround, but they have potential performance
implications, while this should be basically for free.
Fixes over 40 failures in our CI system with spilling forced on
(including CTS, dEQP and Piglit failures) on a number of different
platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test
seems to be able to reproduce this bug consistently in the vertex
shader on at least Gen4, Gen8 and Gen9 with spilling forced on.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to find out what per-thread slot size a previously
allocated scratch BO was used with in order to fix a hardware race
condition without introducing additional stalls or memory allocations.
Instead of calling brw_get_scratch_bo() manually from the various
codegen functions, call a new helper function that keeps track of the
per-thread scratch size and conditionally allocates a larger scratch
BO.
v2: Handle BO allocation manually instead of relying on
brw_get_scratch_bo (Ken).
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bitwise arithmetic trick used in brw_get_scratch_size() to clamp
the scratch allocation to 1KB has the unintended side effect that it
will cause us to allocate 2x the required amount of scratch space if
the original per-thread scratch size happened to be already a power of
two. Instead use the obvious MAX2 idiom to clamp the scratch
allocation to the expected range.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Two dEQP tests expect INVALID_VALUE errors for negative width/height
parameters, but get INVALID_OPERATION because they haven't actually
created a destination image. This is arguably not a bug in Mesa, as
there's no specified ordering of error conditions.
However, it's also really easy to make the tests pass, and there's
no real harm in doing these checks earlier.
Fixes:
dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_width_height
dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.texsubimage3d_neg_width_height
v2: Drop redundant check (caught by Anuj Phogat).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
To cope with copies of compressed images which are not multiples of
the block size. Suggested by Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@sroland@vmware.com>
In the Vulkan driver, we have the generation number (a compile time
constant) but not necessarily the brw_device_info struct. I meant
to rework the function to take a generation number instead of a
brw_device_info pointer to accomodate this. But I forgot, and left
it taking a brw_device_info pointer, while making Vulkan pass the
generation number (8, 9, ...) directly. This led to crashes.
Brown paper bag fix for commit 87d062a940.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add guards to prevent dereferencing NULL dynamic pipeline state. Asserts
of pCreateInfo members are moved to the earliest points at which they
should not be NULL.
This fixes a segfault seen in the McNopper demo, VKTS_Example09.
v3 (Jason Ekstrand):
- Fix disabled rasterization check
- Revert opaque detection of color attachment usage
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
To reduce confusion, clarify that the state being copied is not dynamic.
This agrees with the Vulkan spec's usage of the term. Various sections
specify that the various pipeline state which have VkDynamicState enums
(e.g. viewport, scissor, etc.) may or may not be dynamic.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Removes the need to set LIBVA_DRIVER_NAME=gallium for supported targets and is
consistent with vdpau and general gallium drivers.
Note: some versions of libva can detect the gallium name and use the
backend. Although that behaviour seems inconsistent since it only works
for some platforms/backends.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:
external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;
HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Turn off warnings for -Wpointer-arith, -Wno-missing-field-initializers,
-Wno-initializer-overrides, and -Wno-mismatched-tags. These are all deemed
pointless, on purpose or no plans to fix.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Inline the function into it's only caller. This way it's more obvious
how the classic and gallium drivers (st/mesa) use _mesa_initialize_context.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It has been unused for a long time, plus makes the gallium dri modules
require an extra glapi symbol relative to their classic counterparts.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The actual code of the function print_table_stats() is guarded
by a ifdef GET_DEBUG, which was not been defined in years.
The last fix in 2013 (7db6b5aa91) indicates that it's rarely
used/tested. Since the issue has gone unnoticed for a whole year
(broken with 2ad4a47547).
Let's remove it for now. We can always revive it at a later stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All of the functions and related data is internal, so there's no point
if using the GL types.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... and associated file(s).
No longer needed since commit 057259655e ("i965: Don't link libmesa or
libdri_test_stubs into tests")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When building from a release tarball (where the generated/built files
are in srcdir) in an OOT fashion we need to have both builddir and
srcdir in the includes list.
Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With earlier commit we've handled the `make distclean' out of tree
build, yet we failed to attribute that for in-tree builds the test
condition will return 1. Thus effectively the target will be considered
as "failed".
Fixes: b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building
OOT")
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead of changing the format on the existing template
which makes error handling not nice and confuses coverity.
CoverityID: 1337953
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
pipe_resource_reference(&res, NULL) will decrement reference counting,
i.e. p_atomic_dec(res->count). But the va surface still has the initial
reference since it has created the resource. So calling vaDestroyImage
on a derived image calls VaDestroyBuffer but the decrementation won't
reach 0. It is just wrong for vlVaDestroyBuffer to rely on the
export_refcount flag. Finally the vaapi intel driver has the same logic.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This change makes sure to remove arrays when checking if type
is a double.
The check for the end of the first slot of a multi-slot double
is also fixed by bumping the check to 4 rather than 3.
Previously we were we not reserving the last component.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since this extension allows more than one varying to share a single
location we can't just count the number of slots a varying takes and
add it to the total.
Instead we now reuse the reserved varyings bitfield to determine how
many slots are reserved for explicit locations instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.
Thanks to Curro and Jordan for helping track this down!
On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.
On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
0.255654% (n=25).
On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/-
0.432771% (n=64).
On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
by roughly 1.03x.
- Improves performance in Synmark's G/43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.
We should really fix this properly. However, the pitfall has existed
for ages, so for now we should at least detect it.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB. But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.
This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.
Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now. A subsequent commit will take care of that.
Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Curro figured this out by investigating the simulator. Apparently
there's also a workaround in the Windows driver. I'm not sure it's
actually documented anywhere.
We were underallocating the scratch buffer by a factor of 128/70.
v2: Rename threads_per_subslice to scratch_ids_per_subslice
(suggested by Jordan Justen).
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We were allocating enough space for the number of threads per subslice,
when we should have been allocating space for the number of threads in
the entire GPU.
Even though we currently run with a reduced thread count (due to a bug),
we might still overflow the scratch buffer because the address
calculation is based on the FFTID, which can depend on exactly which
threads, EUs, and threads are executing. We need to allocate enough
for every possible thread that could run.
Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+.
Earlier platforms need additional bug fixes.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was disabled due to occasionally incorrect behavior when trying to
upload data. It later became apparent that nvc0 also had a similar but
slightly different issue, which was resolved in commit e50c01d5. This
takes the same logic as nvc0 and applies it to nv50 (which has somewhat
different interfaces).
Unfortunately I did not note down precisely what was broken with UBOs
when removing the support from nv50, but I've tested a bunch of local
traces, and none of them appear to regress. This should hopefully
improve performance when UBOs are used, but this was not directly
verified.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was fixed in revision 47 of the ARB_dsa spec in Oct 22, 2015. Since
it's horrible to have differing APIs across library versions, we should
attempt to minimize the impact by backporting it as far as possible and
hope no one notices.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
This brings in the fixed glClearNamedFramebufferfi definition, as well
as a lot of GLsizei -> GLsizeiptr changes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
This basically disallows all 8-bit x 3 and 16-bit x 3 formats for
textures and render targets. Some 3-component formats were already
disallowed before. This avoids problems with GL_ARB_copy_image.
v2: the previous version of this patch disallowed all 3-component formats
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Mesa and gallium don't have a complete set of matching 3-component
texture formats. For example, 8-bit sRGB unorm. To fully support
the GL_ARB_copy_image extension we need to have support for all of
these formats: RGB8_UNORM, RGB8_SNORM, RGB8_SRGB, RGB8_UINT, and
RGB8_SINT using the same component order. Since we don't have that,
disable the 3-component formats for now.
v2: Simplify 3-component format check, per Marek.
Also check that target != PIPE_BUFFER.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
1. Try to choose R8G8B8A8 unorm/srgb formats before others in an
effort to try to match component ordering for UINT/SINT/etc.
2. If we can't get a format such as PIPE_FORMAT_A16_UNORM, try
PIPE_FORMAT_R16G16B16A16_UNORM before shallower formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just makes some generic code that currently emits double
suitable for emitting 64-bit values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently this just doubles, but we'll convert users to this
so making adding 64-bit integers easier.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reworks the #if guards a bit. When Emil originally wrote them, he
just guarded everything. However, part of what anv_entrypoints_gen.py
generates is a hash table for looking up entrypoints based on their name.
This table *cannot* get out of sync between C and python regardless of
preprocessor flags. In order to prevent this, this commit makes us use
void pointers in the dispatch table for those entrypoints which aren't
available. This means that the dispatch table size and entry order is
constant and it should never get out-of-sync with the python.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
When an applications specifies mip levels _before_ setting a mipmap texture
filter, we will initially guess a single texture level. When the second level
image is created, we try to allocate the full texture -- however, we get the
base level size guess wrong if that size is odd. This leads to yet another
re-allocation of the texture later during st_finalize_texture.
Even worse, this re-allocation breaks a (reasonable) assumption made by
st_generate_mipmaps, because the re-allocation in the finalization call will
again allocate a single-level pipe texture (based on the non-mipmap texture
filter!). As a result, mipmap generation fails in interesting ways.
All of this can be avoided by just using the fact that we already know the
size of the base level.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95529
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
At this point, the limits are probably more-or-less correct. If there is
an invalid limit, that's a bug not a FINSHME.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This way the the bind map (which we're caching) is mostly independent of
the pipeline layout. The only coupling remaining is that we pull the array
size of a binding out of the layout. However, that size is also specified
in the shader and should always match so it's not really coupled. This
rendering issues in Dota 2.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Since applications are allowed to specify some set of bindings which need
not be dense they also need not be in order. For most things, this doesn't
matter, but it could result getting the wrong dynamic offsets. This adds a
quick-and-dirty sort to ensure that everything is always in increasing
order of binding index.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
With glx of gstreamer-vaapi, the temporary pixmap for front buffer gets
renewed in each frame, so when we receive a new pixmap, should get a new
front buffer for it.
This also fixes Totem player playback corruption.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
From original commit, the macro "if HAVE_DRI3" was in Makefile.sources,
this file is shared with SCons, SCons is not able to parse this marco,
the SCons build failed. Jose quickly gave two approaches and quick fix
with his second approach, thanks Jose for the solutions and fixes.
This patch is Jose's first approach, and it's more proper, because the
dri3 c file should not be included to build when DRI3 is not enabled.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't
handle llvm.fmuladd and instead it fall backs to a C function. Which in
turn causes a segfault on Windows.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As suggested by Roland Scheidegger.
Use the same logic as f16c, since fma requires VEX encoding.
But disable FMA on LLVM 3.3 without MCJIT.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.
To illustrate suppose we have two graphics IB's 1 and 2, which are submitted in
that order. Furthermore suppose IB 1 does not use CE ram, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.
The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not load into CE RAM.
Fix this by always restoring the entire CE RAM.
v2: - Just load all descriptor set buffers instead of load and store the entire
CE RAM.
- Leave the ce_ram_dirty tracking in place for the non-preamble case.
v3: - Fixed parameter alignment.
- Rebased to master (Nicolai's descriptor series).
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The reality is that this doesn't matter, because we manually emit the
ARL to the sampler reladdr, and those arguments don't get an extra load
later, so it's effectively just a boolean. However having the types be
wrong is confusing and could trigger very odd bugs should usage change
down the line.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Adding 64-bit integers support was going to make this file worse,
just remove the tabs from it now.
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
An update in graphics specs has deleted the halign and valign fields
from XY_FAST_COPY_BLT command. See mesa commit 97f0f91.
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The extension is already advertised in compatibility profile, but
the _mesa_has_compute_shaders only returns true in core profile.
If we advertise it, we should allow it to work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
v2: only load the clip vertex once
v3: fix clip enable logic, add cullDistance
v4: remove duplicate fields in vs jit key, fix test of clip fixup needed
v5: fix clipdistance linkage for slot!=0,4
v6: support clip+cull; passes most piglit clip (failures understood)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
GLX documentation states:
glXCreateNewContext can generate the following errors: (...)
GLXBadFBConfig if config is not a valid GLXFBConfig
Function checks if the given config is a valid config and sets proper
error code.
Fixes currently crashing glx-fbconfig-bad Piglit test.
v2: coding style cleanups (Emil, Topi)
use DefaultScreen macro (Emil)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Apply the luma key filter to the YCbCr values during the CSC conversion
in video buffer shader. The initial values of max and min luma are set
to opposite values to disable the filter initially and will be set when
enabling it.
Add extra parmeters min and max luma for the luma key filter in
vl_compositor_set_csc_matrix in va, xvmc. Setting them
to opposite value 1.f and 0.f respectively won't effect the CSC
conversion
v2: -Squash 1,2 and 3 into one patch to avoid breaking build of
other components. (Christian)
-use ureg_swizzle. (Christian)
-change name of the variables. (Christian)
v3: -Squash all patches in one to avoid breaking of build. (Emil)
-wrap functions properly. (Emil)
-use 0.0f and 1.0f instead of 0.f and 1.f respectively. (Emil)
v4: -Divide it in two patches one which introduces the functionality
and assigs dummy values to the changed functions and second which
implements the lumakey filter. (Christian)
-use ureg_scalar instead ureg_swizzle. (Christian)
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8
but not for gen7 or earlier. It all works, we just need to emit the states
for the extra planes.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
When calling virgl_fence_wait() with timeout=0,
virgl_{drm,vtest}_resource_is_busy() is called. However, it returns TRUE
for a busy resource, whereace virgl_fence_wait() should return TRUE for
a completed (non-busy) resource.
This fixes running supertuxkart in a VM (I could not reproduce locally
with vtest though there is a similar fix)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the future int64 support will have the same requirements.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This uses the new types interfaces to check for 64-bit types,
as futureproofing against int64 support.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just some better type safety that I noticed while working
on 64-bit integer support.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves to the new interfaces in advance of int64.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for int64 support, changing
places where 64-bit matters no doubles.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves code to the new check in advance of int64 support.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds an inline and type query for if a type is 64-bit.
Fow now this is equivalent to double, but int64 will change
this.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Coverity complains that the computed sizes can lead to negative lengths
passed to memcpy. If that happens we've been handed invalid arguments
anyway, so just bomb out.
The funky "0%s" is because the size string for the variable-length part
of the request is of the form "+ safe_pad() ...", and a unary + would
coerce the result to always be positive, defeating the overflow check.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was for some old unsupported LLVM version.
Only si_create_context creates the target machine now.
r600g doesn't use this function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The width0/height0/depth0 on stObj may not have been set at this point.
Observed in a trace that set up levels 2..9 of a 2d texture, and set the base
level to 2, with height 1. This made the guess logic always bail.
Originally investigated by Ilia Mirkin, this patch gets rid of the somewhat
redundant storage of width0/height0/depth0 and makes sure we always compute
pipe texture sizes that are compatible with the base level image of the
GL texture.
Fixes the gl-1.2-texture-base-level piglit test provided by Brian Paul.
v2:
- try to re-use an existing pipe texture when possible
- handle a corner case where the base level is not level 0 and it is of
size 1x1x1
v3:
- ptHeight = ptWidth in cube map 1x1 case (suggested by Brian)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.
These uniform types don't use this storage.
Reviewed-by: Dave Airlie <airlied@redhat.com>
We were previously unconditionally doing this for arrays and ubo's, and
ignoring texture/storage/atomic buffers. Instead use the usage history
to determine which atoms need to be revalidated.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put
<option name="precise_trig" value="true"/>
in the proper drirc file.
V2: Make option name more generic
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If first_level > 0 and DCC is disabled for that level, let's skip DCC
reads entirely.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We want to keep DCC enabled to save bandwidth. It was a bad idea to disable
it here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will get more complicated with mipmapped DCC or when DCC is enabled
after allocation.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
R9G9B9E5 is the only uncompressed one hopefully.
This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use rasterizer provoking vertex API.
Fix rasterizer provoking vertex for tristrips and quad list/strips.
v2: make provoking vertex tables static const
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
A texture may be redefined with _NEW_TEXTURE, which might have been
bound to a shader image slot. We have to revalidate the image atoms to
pick up on the new resource.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Sometimes a register source can actually be double- or even quad-wide.
We must make sure that the inserted texbars take that width into
account.
Based on an earlier patch by Samuel Pitoiset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Reduces CPU load for draw calls that change none or few of the descriptors.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So that callers outside of si_descriptors.c need to worry less about the
details of descriptor handling.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that we emit guards for everything, we can just generate the files and
trust build flags to keep us safe. This should also fix the tarball
problems.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
To avoid blocking other EGL calls, release the display mutex before
we enqueue buffer to android frameworks and re-acquire the mutex
upon return.
v2: moved lock/unlock inside droid_window_enqueue_buffer().
TEST=verify pinch zoom in Photos app no longer causes hangs
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the case of out-of-tree (OOT) builds, in particular when building
from tarball, we'll end up with the file in both srcdir and builddir.
We want the former to remain intact (since we need it on rebuild) while
the latter should be removed otherwise `make distclean' gets angry at
us.
Ideally there'll be a solution that feels a bit less of a hack. Until
then this does the job exactly as expected.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
... when copied from git_sha1.h.
As the latter file can we lacking the write attribute, one should set it
explicitly. Otherwise we'll get a warning/failure at cleanup stage.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise the build will assume that we've talking about builddir, which
is not the case in the else statement.
Here the file is already generated and is part of the tarball.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With earlier commit we introduced support for render_node devices, which
was couples with the use of the image loader extension.
As the work was inspired by egl/wayland we (erroneously) added the
extension for the !render_node path as well.
That works for wayland, as the implementations of the DRI2 and IMAGE
loader extensions converge behind the scenes. As that is not yet
the case for Android we shouldn't expose the extension.
Fixes: 34ddef39ce ("egl: android: add dma-buf fd support")
Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We don't import textures with DCC now, but soon we will.
v2: if we can't disable DCC for image writes, at least decompress DCC
at bind time
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Could cause issues if you tried to read from an uninitialised pointer.
This just initalises the pointer to null to avoid that being a problem.
Discovered by Coverity.
CID: 1343616
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We've had a FINISHME here since Eric originally wrote the code in 2011.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are basically a wash:
No change in instruction counts.
total cycles in shared programs: 78685980 -> 78680730 (-0.01%)
cycles in affected programs: 2102646 -> 2097396 (-0.25%)
helped: 48
HURT: 83
I figured if we're going to do this for one copy propagation pass,
we may as well do it in both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've had a FINISHME here since Eric originally wrote the code in 2010.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are not terribly impressive:
total instructions in shared programs: 9008589 -> 9008613 (0.00%)
instructions in affected programs: 4293 -> 4317 (0.56%)
helped: 0
HURT: 10
total cycles in shared programs: 78550978 -> 78575760 (0.03%)
cycles in affected programs: 655426 -> 680208 (3.78%)
helped: 75
HURT: 88
GAINED: 2
Most of the "regressions" appear to be us successfully copy propagating
uniforms, which i965 is loading as pull constants instead of push, so we
occasionally have two pulls instead of one. That doesn't seem like this
pass's job - it's propagating correctly, and we should be smarter about
pull loads in the backend.
This patch is also useful for a couple of reasons:
1. It can clean up copies created by varying packing (previously, we
couldn't if the uses were inside a loop).
This fixes a bug when interpolateAt*() is used on a packed varying
inside a loop: glsl_to_nir struggles to see through the extra copy
and mistakenly believed the variable was not an input.
2. It will help propagate uniform array access created by
lower_const_array_to_uniforms().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Like floats, we should use the round toward 0 mode instead of the
nearest one (which is the default) for doubles to integers.
This fixes all arb_gpu_shader_fp64 piglits which convert doubles to
integers (16 tests).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
The GL4.5 spec quote seems clear on this:
"The value -1 will be returned by either command if an error occurs,
if name does not identify an active variable on programInterface,
or if name identifies an active variable that does not have a valid
location assigned, as described above."
This fixes:
GL45-CTS.program_interface_query.output-built-in
[airlied: use _mesa_program_resource_location_index as
suggested by Eduardo]
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cast the unsigned semantic index to integer datatype before comparing
to max_generic, otherwise, max_generic which is initialized to -1
will be converted to unsigned int before the comparison, causing a wrong
semantic index to be assigned to a shader output.
Fixes the assert running TurboCAD_gl.trace. (VMware bug 1667265)
Also tested with glretrace, mesa demos pointblast, spriteblast and pointcoord.
v2: use the original max_generic variable but add the (int) cast
to the semantic index, as suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
When TGSI debug flag is enabled, print the shader linkage info as well.
Tested with mesa demos with SVGA_DEBUG=tgsi
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_shader_image_load_store only requires a very fixed list of formats
to be supported, while textures may be in all kinds of formats, like
BGRA which are presently not supported on at least Kepler.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Switches to using truncf in micro_trunc.
Fixes the following piglit tests (for softpipe):
/spec/glsl-1.30/execution/built-in-functions/...
fs-trunc-float
fs-trunc-vec2
fs-trunc-vec3
fs-trunc-vec4
vs-trunc-float
vs-trunc-vec2
vs-trunc-vec3
vs-trunc-vec4
/spec/glsl-1.50/execution/built-in-functions/...
gs-trunc-float
gs-trunc-vec2
gs-trunc-vec3
gs-trunc-vec4
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.
const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Data starts at suboffet 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is is a double input varying, read it as
vector of floats with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."
Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although the glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.
This makes things a bit nicer in gdb, and improves type
safety.
v2: fix a few pieces of interface I missed that caused some
piglit regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.
This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().
Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in
v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };
The initializer triggers the GLSL 4.30/GLES3 tests
for constant sequence subexpressions, so it doesn't
happen unless you are using those, so just return
false as this path is now reachable.
v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
PATH_MAX is apparently not a thing on Windows. Borrow the hack from
pipe_loader.c to try and make this work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This writes linked shader programs to .shader_test files to
$MESA_SHADER_CAPTURE_PATH in the format used by shader-db
(http://cgit.freedesktop.org/mesa/shader-db).
It supports both GLSL shaders and ARB programs. All stages that
are linked together are written in a single .shader_test file.
This eliminates the need for shader-db's split-to-files.py, as Mesa
produces the desired format directly. It's much more reliable than
parsing stdout/stderr, as those may contain extraneous messages, or
simply be closed by the application and unavailable.
We have many similar features already, but this is a bit different:
- MESA_GLSL=dump writes to stdout, not files.
- MESA_GLSL=log writes each stage to separate files (rather than
all linked shaders in one file), at draw time (not link time),
with uniform data and state flag info.
- Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
MESA_SHADER_READ_PATH) also uses separate files per shader stage,
but allows reading in files to replace an app's shader code.
v2: Dump ARB programs too, not just GLSL.
v3: Don't dump bogus 0.shader_test file.
v4: Add "GL_ARB_separate_shader_objects" to the [require] block.
v5: Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
v6: Don't hardcode /tmp/mesa.
v7: Fix memoization of getenv().
v8: Also print "SSO ENABLED" (suggested by Timothy).
v9: Also handle ES shaders (suggested by Ilia).
v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
_mesa_warning calls on error handling (suggested by Ben).
v11: Fix crash when variable is unset introduced in v10.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Images invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a computer shader
to help in fixing 3D<->CP interaction.
However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not really good for performance purposes but this might be
improved later.
This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This should fix spec@arb_shader_image_load_store@level.
Broken by:
Commit: 95c5bbae66
radeonsi: set some image descriptor fields at bind time
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We would revalidate images when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We would revalidate buffers when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
The fix in:
anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards
breaks things if wayland headers aren't installed.
Separate things out properly to avoid that problem.
[airlied: fixed up to put in pre-existing sections].
Reported-by: Arjan van de Ven
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
When upscaling you can end up interpolating between the edge pixel and one
past the edge. Using CLAMP_TO_EDGE seems like the most reasonable thing to
do in this case. This fixes two of the new Vulkan CTS tests in
dEQP-VK.api.copy_and_blit.blit_image.*
Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Getting rid of the default case makes the compiler warn if we are missing
cases. While we're here, we also add the one missing case.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
glslang frequently throw bogus decorations into shaders. While we are free
to assert-fail, it's a bit nicer to the application to just warn.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Previously we supported a subset of capabilities and just left a default
case for the others. It's time to stop being lazy and actually audit the
capabilities. This should bring them up-to-date with reality.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit c1107cec44.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode. Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):
es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_float_frag_xvary
es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Give algebraic-opt pass a chance to catch udiv by const power-of-two,
before running lower-idiv pass.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two. Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.
This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For ES 3.0 NUM_SAMPLE_COUNTS spec points that some formats will be
always zero. But on ES 3.1 can be different to zero.
The current code is correctly checking exactly against version 3.0,
but the comment only mentions 3.0 spec. It is clearer mentioning both.
v2: better wording on the comment (Ian Romanick)
Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When validating attributes during surface creation we should account
for the default values of texture target and format (EGL_NO_TEXTURE)
since the user is not obligated to explicitly set both via the
attribute list passed to eglCreatePbufferSurface.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.
Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}
Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes the following building error:
target C++: libmesa_glsl <= external/mesa/src/compiler/glsl/glsl_to_nir.cpp
In file included from external/mesa/src/compiler/glsl/glsl_to_nir.h:28:0,
from external/mesa/src/compiler/glsl/glsl_to_nir.cpp:28:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
compilation terminated.
build/core/binary.mk:432: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
the actual dependency list less obvious.
v2: Drop unrelated vulkan hunk (Jason).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With earlier commit 3689ef32af ("automake: rework the git_sha1.h rule,
include in tarball") we/I erroneously removed the PHONY rule and the
temporary file.
The former is used to ensure that the header is regenerated when on each
make invocation, while the latter helps us avoid the unneeded rebuild(s)
when the SHA1 hasn't changed.
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead of just allow copy of a rectangle in svga_transfer_dma_band(),
this patch allows it to copy a box, hence allows copy a 3d texture
in one transfer.
Fixes black screen in running Heaven after commit fb9fe35. (Bug 1663282)
Tested with Heaven, glretrace, piglit.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Coverity doesn't realize idx will never be negative. Throw in some
assert()s to help it out.
(Hopefully assert() isn't getting compiled out for coverity build.. but
there seems to be just one way to find out. We might have to change
these to assume())
Fixes CID 1362442, 1362443
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Maybe we should switch to ureg to build the builtin shaders. But at any
rate, if they fail to compile it is because someone messed them up (or
changed TGSI syntax?).
CID 1362444
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.
v2: Don't handle partial source-destination overlap for simplicity
(Jason). No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The specs says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.
This fixes:
GL45-CTS.copy_image.non_existent_mipmap
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.
This fixes:
GL45-CTS.copy_image.invalid_target
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Never can happen, since query would not have been created in the first
place if pidx(query_type) return negative. Lets let coverity realize
this.
CID 1362460
Signed-off-by: Rob Clark <robclark@freedesktop.org>
ptr can actually never be null so just drop the check.
CID 1362464 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking ptr suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
According to the EGL specifications before binding an API
we must check whether it's supported first. If not eglBindAPI
should return EGL_FALSE and generate a EGL_BAD_PARAMETER error.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.
CoverityID: 1296205
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Check for null pointer before accessing arrays in get/put bits
native/YCbCr/Indexed in VdpOutputSurface and VdpVideoSurface.
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.
Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.
Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.
The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.
Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.
To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.
v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.
When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.
The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
v4:
* Support older local ID push constant layout as well. (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.
We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.
We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.
v2:
* Create variable during lowering pass. (Ken)
v3:
* Don't create a variable, but instead just insert an intrisic call
to load a uniform from the allocated location. (Jason)
v4:
* Don't run this pass if thread_local_id_index < 0
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.
It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.
The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)
v2:
* Add variable in intrinsics lowering pass
* Make sure the ID is pushed last in assign_constant_locations, and
that we save a spot for the ID in the push constants
v3:
* Simplify code based with Jason's suggestions.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This avoids allocating giant IBs from the outset, especially for CE and DMA.
Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.
With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.
v2:
- clean up buffer_size calculation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Isolines aren't reversed. commit 5b2d8c2273 fixed this for the vec4
TES backend, but not the scalar one.
Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
v2: require PIPE_CAP_SAMPLER_VIEW_TARGET; technically only needed for some of
the texture targets, but all hardware that has shader images should also
have this cap.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For better bisectability given that the order of some of the fallback tests
in the blit path are rearranged.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used to select a slice of a 3D texture.
v2: fix a comment (Marek)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For downloads, the fragment shader must know the source texture target, hence
we may cache multiple fragment shaders.
v2: break long line (Marek)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
At the same time, rename members that are upload-specific to say so.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The string "[0]\0" is the same as "[0]" as far as the C string datatype
is concerned. That string has length 3. strncmp(s, length_3_string, 4)
is the same as strcmp(s, length_3_string), so make it be strcmp.
v2: Not the same as strncmp(..., 3). Noticed by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This improves throughput by keeping TTM overhead down.
Some piglit tests such as texelFetch and streaming-texture-leak will
use less memory now.
v2: use gart_size / 4 as the threshold
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
mainly the fields that can change by reallocating a texture and changing
the tile mode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just for consistency. This doesn't fix anything, because DCC is not
supported with non-mipmapped textures.
v1.1: fix the comment about DCC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it,
however version 8 of it fixed this, and the GL specs also have
the fixed value in them.
Fixes:
GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.
Fortunately none of tracked benchmarks showed a regression with
this.
v2 (Matt): Add missing space
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants. We at least
need to initialize them to zero. We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).
Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.
(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
The driver was adding the skip components but always for buffer 0.
This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
e2791b38b4
mesa/program_interface_query: fix transform feedback varyings.
caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.
The problem was it was using the skip components varying to set
the stream id, when it should wait until a varying was written,
this just adds the varying checks in the right place.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to GL4.5 spec:
An INVALID_OPERATION error is generated if any part of the speci-
fied buffer range is mapped with MapBufferRange or MapBuffer (see sec-
tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map-
BufferRange access flags.
So we should use the if range is mapped path.
This fixes:
GL45-CTS.buffer_storage.map_persistent_buffer_sub_data
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0, 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This really only hits for bitsets with a size of a multiple of 32. We
can end up with pos = -1 as a result of the ffs, which we in turn decide
is a valid position (since we fall through the loop and i == 1, we end
up adding 32 to it, so end up returning 31 again).
Up until recently this was largely unreachable, as the register file
sizes were all 63 or 255. However with the advent of compute shaders
which can restrict the number of registers, this can now happen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This reverts commit aac90ba292.
The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom
Also the CTS test it was meant to fix seems like it may be bogus.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:
"The supported regioning modes for math instructions are align16,
align1 with the following restrictions:
- Scalar source is supported.
[...]
- Source and destination offset must be the same, except the case of
scalar source."
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already. The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.
Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa5 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction. In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ. Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now there are not files that require python 3, so for now just remove
the python 3 dependency and use python 2. I think the right plan is to
just get all of the python ready for python 3, and then use whatever
python is available.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The generated sources should follow the example set by the vulkan
headers and our non-generated code. Namely: the code for all supported
platforms should be available, each one guarded by its respective
VK_USE_PLATFORM_*_KHR macro.
v2: Reword commit message.
Cc: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96285
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1 over IRC)
This parameter is actually a bitmask of PIPE_TRANSFER_x flags.
Change it back to a simple unsigned type. IIRC, some compilers
complain about masks of enum values. Also, this make the function
signature match u_resource_vtbl::transfer_map() again.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Coverity is getting a false positive that a division by zero can occur
here. This change will silence the Coverity warnings as a division by zero
cannot occur in this case.
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
`unsigned j` would never fail `j >= 0`, leading to an infinite loop as
`j--` wraps around.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The CTS test:
GL45-CTS.multi_bind.dispatch_bind_image_textures
binds 192 image uniforms, we reject this later,
but not until after we trash the contents of the
struct gl_shader.
Error now reads:
Too many compute shader image uniforms (192 > 16)
instead of
Too many compute shader image uniforms (2745344416 > 16)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "http://downloads.sourceforge.net/project/winflexbison/%WINFLEXBISON_ARCHIVE%"
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"
- 7z x -y -owinflexbison\ "%WINFLEXBISON_ARCHIVE%" > nul
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=69622">Bug 69622</a> - eglTerminate then eglMakeCurrent crahes</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89599">Bug 89599</a> - symbol 'x86_64_entry_start' is already defined when building with LLVM/clang</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92306">Bug 92306</a> - GL Excess demo renders incorrectly on nv43</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94148">Bug 94148</a> - Framebuffer considered invalid when a draw call is done before glCheckFramebufferStatus</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96358">Bug 96358</a> - SSO: wrong interface validation between GS and VS (regresion due to latest gles 3.1)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96381">Bug 96381</a> - Texture artifacts with immutable texture storage and mipmaps</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96835">Bug 96835</a> - "gallium: Force blend color to 16-byte alignment" crash with "-march=native -O3" causes some 32bit games to crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96850">Bug 96850</a> - Crucible tests fail for 32bit mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=96971">Bug 96971</a> - invariant qualifier is not valid for shader inputs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97039">Bug 97039</a> - The Talos Principle and Serious Sam 3 GPU faults</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97207">Bug 97207</a> - [IVY BRIDGE] Fragment shader discard writing to depth</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97214">Bug 97214</a> - X not running with error "Failed to make EGL context current"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97225">Bug 97225</a> - [i965 on HD4600 Haswell] xcom switch to ingame cinematics cause segmentation fault</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97231">Bug 97231</a> - GL_DEPTH_CLAMP doesn't clamp to the far plane</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97331">Bug 97331</a> - glDrawElementsBaseVertex doesn't work in display list on i915</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97351">Bug 97351</a> - DrawElementsBaseVertex with VBO ignores base vertex on Intel GMA 9xx in some cases</li>
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice (including the next
# paragraph) shall be included in all copies or substantial portions of the
# Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
RADV_WS_AMDGPU_FILES:=\
winsys/amdgpu/radv_amdgpu_bo.c \
winsys/amdgpu/radv_amdgpu_bo.h \
winsys/amdgpu/radv_amdgpu_cs.c \
winsys/amdgpu/radv_amdgpu_cs.h \
winsys/amdgpu/radv_amdgpu_surface.c \
winsys/amdgpu/radv_amdgpu_surface.h \
winsys/amdgpu/radv_amdgpu_winsys.c \
winsys/amdgpu/radv_amdgpu_winsys.h \
winsys/amdgpu/radv_amdgpu_winsys_public.h
VULKAN_FILES:=\
radv_cmd_buffer.c \
radv_cs.h \
radv_device.c \
radv_descriptor_set.c \
radv_descriptor_set.h \
radv_formats.c \
radv_image.c \
radv_meta.c \
radv_meta.h \
radv_meta_blit.c \
radv_meta_blit2d.c \
radv_meta_buffer.c \
radv_meta_bufimage.c \
radv_meta_clear.c \
radv_meta_copy.c \
radv_meta_decompress.c \
radv_meta_fast_clear.c \
radv_meta_resolve.c \
radv_meta_resolve_cs.c \
radv_pass.c \
radv_pipeline.c \
radv_pipeline_cache.c \
radv_private.h \
radv_radeon_winsys.h \
radv_query.c \
radv_util.c \
radv_util.h \
radv_wsi.c \
si_cmd_buffer.c \
vk_format_table.c \
vk_format.h \
$(RADV_WS_AMDGPU_FILES)
VULKAN_WSI_WAYLAND_FILES:=\
radv_wsi_wayland.c
VULKAN_WSI_X11_FILES:=\
radv_wsi_x11.c
VULKAN_GENERATED_FILES:=\
radv_entrypoints.c \
radv_entrypoints.h \
radv_timestamp.h
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.