Change MESA into Mesa in CL_PLATFORM_VERSION and CL_DEVICE_VERSION. For
both, always append git version suffix from git_sha1.h.
v5: move semicolon to same line as MESA_GIT_SHA1.
v4: drop #ifdef guards.
v3: add missing include.
v2: change CL_DEVICE_VERSION as well.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
ISTR having suggested this during review of the recent FP64 changes to
the SIMD lowering pass, but it doesn't look like it was taken into
account in the end. Using the fs_reg::component_size helper instead
of this open-coded variant makes sure that the stride is taken into
account correctly. Fixes at least the following piglit tests with
spilling forced on (since otherwise regs_written would be calculated
incorrectly and the spilling code would be rather confused about how
much data needs to be spilled):
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-shader
spec.arb_gpu_shader_fp64.shader_storage.layout-std140-fp64-mixed-shader
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I haven't found any mention of this in the hardware docs, but
experimentally what seems to be going on is that when the per-thread
scratch slot size is changed between two pipelined draw calls, shader
invocations using the old and new scratch size setting may end up
being executed in parallel, causing their scratch offset calculations
to be based in a different partitioning of the scratch space, which
can cause their thread-local scratch space to overlap leading to
cross-thread scratch corruption.
I've been experimenting with alternative workarounds, like emitting a
PIPE_CONTROL with DC flush and CS stall between draw (or dispatch
compute) calls using different per-thread scratch allocation settings,
or avoiding reuse of the scratch BO if the per-thread scratch
allocation doesn't exactly match the original. Both seem to be as
effective as this workaround, but they have potential performance
implications, while this should be basically for free.
Fixes over 40 failures in our CI system with spilling forced on
(including CTS, dEQP and Piglit failures) on a number of different
platforms from Gen4 to Gen9. The 'glsl-max-varyings' piglit test
seems to be able to reproduce this bug consistently in the vertex
shader on at least Gen4, Gen8 and Gen9 with spilling forced on.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to find out what per-thread slot size a previously
allocated scratch BO was used with in order to fix a hardware race
condition without introducing additional stalls or memory allocations.
Instead of calling brw_get_scratch_bo() manually from the various
codegen functions, call a new helper function that keeps track of the
per-thread scratch size and conditionally allocates a larger scratch
BO.
v2: Handle BO allocation manually instead of relying on
brw_get_scratch_bo (Ken).
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bitwise arithmetic trick used in brw_get_scratch_size() to clamp
the scratch allocation to 1KB has the unintended side effect that it
will cause us to allocate 2x the required amount of scratch space if
the original per-thread scratch size happened to be already a power of
two. Instead use the obvious MAX2 idiom to clamp the scratch
allocation to the expected range.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Two dEQP tests expect INVALID_VALUE errors for negative width/height
parameters, but get INVALID_OPERATION because they haven't actually
created a destination image. This is arguably not a bug in Mesa, as
there's no specified ordering of error conditions.
However, it's also really easy to make the tests pass, and there's
no real harm in doing these checks earlier.
Fixes:
dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_width_height
dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.texsubimage3d_neg_width_height
v2: Drop redundant check (caught by Anuj Phogat).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
To cope with copies of compressed images which are not multiples of
the block size. Suggested by Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@sroland@vmware.com>
In the Vulkan driver, we have the generation number (a compile time
constant) but not necessarily the brw_device_info struct. I meant
to rework the function to take a generation number instead of a
brw_device_info pointer to accomodate this. But I forgot, and left
it taking a brw_device_info pointer, while making Vulkan pass the
generation number (8, 9, ...) directly. This led to crashes.
Brown paper bag fix for commit 87d062a940.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96504
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add guards to prevent dereferencing NULL dynamic pipeline state. Asserts
of pCreateInfo members are moved to the earliest points at which they
should not be NULL.
This fixes a segfault seen in the McNopper demo, VKTS_Example09.
v3 (Jason Ekstrand):
- Fix disabled rasterization check
- Revert opaque detection of color attachment usage
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
To reduce confusion, clarify that the state being copied is not dynamic.
This agrees with the Vulkan spec's usage of the term. Various sections
specify that the various pipeline state which have VkDynamicState enums
(e.g. viewport, scissor, etc.) may or may not be dynamic.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We already check that the address is not "too far", but we should also
clamp the UBO index in order to avoid looking at the wrong place in the
driver cb. This is a pretty rare situation though.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Removes the need to set LIBVA_DRIVER_NAME=gallium for supported targets and is
consistent with vdpau and general gallium drivers.
Note: some versions of libva can detect the gallium name and use the
backend. Although that behaviour seems inconsistent since it only works
for some platforms/backends.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fix warnings like these due to HAVE_LIBDRM being inconsistently defined:
external/libdrm/include/drm/drm.h:839:30: warning: redefinition of typedef 'drm_clip_rect_t' is a C11 feature [-Wtypedef-redefinition]
typedef struct drm_clip_rect drm_clip_rect_t;
HAVE_LIBDRM needs to be set project wide to fix this. This change also
harmlessly links libdrm with everything, but simplifies the makefiles a
bit.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Turn off warnings for -Wpointer-arith, -Wno-missing-field-initializers,
-Wno-initializer-overrides, and -Wno-mismatched-tags. These are all deemed
pointless, on purpose or no plans to fix.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Inline the function into it's only caller. This way it's more obvious
how the classic and gallium drivers (st/mesa) use _mesa_initialize_context.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It has been unused for a long time, plus makes the gallium dri modules
require an extra glapi symbol relative to their classic counterparts.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The actual code of the function print_table_stats() is guarded
by a ifdef GET_DEBUG, which was not been defined in years.
The last fix in 2013 (7db6b5aa91) indicates that it's rarely
used/tested. Since the issue has gone unnoticed for a whole year
(broken with 2ad4a47547).
Let's remove it for now. We can always revive it at a later stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All of the functions and related data is internal, so there's no point
if using the GL types.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... and associated file(s).
No longer needed since commit 057259655e ("i965: Don't link libmesa or
libdri_test_stubs into tests")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When building from a release tarball (where the generated/built files
are in srcdir) in an OOT fashion we need to have both builddir and
srcdir in the includes list.
Otherwise we'll error out, as the file (header gen_knobs.h in this case)
won't be in the location where we are looking.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With earlier commit we've handled the `make distclean' out of tree
build, yet we failed to attribute that for in-tree builds the test
condition will return 1. Thus effectively the target will be considered
as "failed".
Fixes: b7f7ec7843 ("mesa: automake: distclean git_sha1.h when building
OOT")
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Reported-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead of changing the format on the existing template
which makes error handling not nice and confuses coverity.
CoverityID: 1337953
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
pipe_resource_reference(&res, NULL) will decrement reference counting,
i.e. p_atomic_dec(res->count). But the va surface still has the initial
reference since it has created the resource. So calling vaDestroyImage
on a derived image calls VaDestroyBuffer but the decrementation won't
reach 0. It is just wrong for vlVaDestroyBuffer to rely on the
export_refcount flag. Finally the vaapi intel driver has the same logic.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This change makes sure to remove arrays when checking if type
is a double.
The check for the end of the first slot of a multi-slot double
is also fixed by bumping the check to 4 rather than 3.
Previously we were we not reserving the last component.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since this extension allows more than one varying to share a single
location we can't just count the number of slots a varying takes and
add it to the total.
Instead we now reuse the reserved varyings bitfield to determine how
many slots are reserved for explicit locations instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were programming the number of threads per subslice, when we should
have been programming the total number of threads on the GPU as a whole.
Thanks to Curro and Jordan for helping track this down!
On Skylake GT3e:
- Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
- Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
- Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.
On Broadwell GT2:
- Improves performance in Unreal's Elemental Demo by roughly 1.2-1.5x.
- Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
- Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
0.255654% (n=25).
On Haswell GT3e:
- Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
by roughly 1.10x.
- Improves performance in Synmark's Gl43CSDof by roughly 1.18x.
- Decreases performance in Synmark's Gl43CSCloth by -1.99484% +/-
0.432771% (n=64).
On Ivybridge GT2:
- Improves performance in Unreal's Elemental Demo (in GL 4.2 mode)
by roughly 1.03x.
- Improves performance in Synmark's G/43CSDof by roughly 1.25x.
- No change in Synmark's Gl43CSCloth (n=28).
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I don't know that anything actually guarantees this, but if we exceed
the limits, we may end up overflowing and trashing random buffers that
happen to be nearby in the VMA space, leading to rendering corruption,
hangs, or worse.
We should really fix this properly. However, the pitfall has existed
for ages, so for now we should at least detect it.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Most scratch stages use power of two sizes, in kilobytes, where
0 means 1kB. But compute shaders on Haswell have a minimum of 2kB,
and use a representation where 0 = 2kB.
This meant that we were effectively telling the hardware to allocate
each thread twice as much space as we meant to, while simultaneously
not allocating that much space in the buffer, leading to overflows.
Note that the existing code is completely wrong for Ivybridge,
but that will take additional work to sort out, so I've left it
as is for now. A subsequent commit will take care of that.
Together with the previous patches, this fixes rendering corruption
on Synmark's Gl43CSDof on Haswell.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Curro figured this out by investigating the simulator. Apparently
there's also a workaround in the Windows driver. I'm not sure it's
actually documented anywhere.
We were underallocating the scratch buffer by a factor of 128/70.
v2: Rename threads_per_subslice to scratch_ids_per_subslice
(suggested by Jordan Justen).
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We were allocating enough space for the number of threads per subslice,
when we should have been allocating space for the number of threads in
the entire GPU.
Even though we currently run with a reduced thread count (due to a bug),
we might still overflow the scratch buffer because the address
calculation is based on the FFTID, which can depend on exactly which
threads, EUs, and threads are executing. We need to allocate enough
for every possible thread that could run.
Fixes rendering corruption in Synmark's Gl43CSDof on Gen8+.
Earlier platforms need additional bug fixes.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was disabled due to occasionally incorrect behavior when trying to
upload data. It later became apparent that nvc0 also had a similar but
slightly different issue, which was resolved in commit e50c01d5. This
takes the same logic as nvc0 and applies it to nv50 (which has somewhat
different interfaces).
Unfortunately I did not note down precisely what was broken with UBOs
when removing the support from nv50, but I've tested a bunch of local
traces, and none of them appear to regress. This should hopefully
improve performance when UBOs are used, but this was not directly
verified.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was fixed in revision 47 of the ARB_dsa spec in Oct 22, 2015. Since
it's horrible to have differing APIs across library versions, we should
attempt to minimize the impact by backporting it as far as possible and
hope no one notices.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
This brings in the fixed glClearNamedFramebufferfi definition, as well
as a lot of GLsizei -> GLsizeiptr changes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
This basically disallows all 8-bit x 3 and 16-bit x 3 formats for
textures and render targets. Some 3-component formats were already
disallowed before. This avoids problems with GL_ARB_copy_image.
v2: the previous version of this patch disallowed all 3-component formats
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Mesa and gallium don't have a complete set of matching 3-component
texture formats. For example, 8-bit sRGB unorm. To fully support
the GL_ARB_copy_image extension we need to have support for all of
these formats: RGB8_UNORM, RGB8_SNORM, RGB8_SRGB, RGB8_UINT, and
RGB8_SINT using the same component order. Since we don't have that,
disable the 3-component formats for now.
v2: Simplify 3-component format check, per Marek.
Also check that target != PIPE_BUFFER.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
1. Try to choose R8G8B8A8 unorm/srgb formats before others in an
effort to try to match component ordering for UINT/SINT/etc.
2. If we can't get a format such as PIPE_FORMAT_A16_UNORM, try
PIPE_FORMAT_R16G16B16A16_UNORM before shallower formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>From OpenGL 4.0 spec, section 4.3.2 "Copying Pixels":
"The pixels corresponding to these buffers are copied from the source
rectangle bounded by the locations (srcX0, srcY 0) and (srcX1, srcY 1)
to the destination rectangle bounded by the locations (dstX0, dstY 0)
and (dstX1, dstY 1). The lower bounds of the rectangle are inclusive,
while the upper bounds are exclusive."
So, the rectangles sharing just an edge shouldn't overlap.
-----------
| |
------- ---
| | |
| | |
------- ---
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This converts to testing for 64-bit types and renames some things
in anticipation of 64-bit integer support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just makes some generic code that currently emits double
suitable for emitting 64-bit values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently this just doubles, but we'll convert users to this
so making adding 64-bit integers easier.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reworks the #if guards a bit. When Emil originally wrote them, he
just guarded everything. However, part of what anv_entrypoints_gen.py
generates is a hash table for looking up entrypoints based on their name.
This table *cannot* get out of sync between C and python regardless of
preprocessor flags. In order to prevent this, this commit makes us use
void pointers in the dispatch table for those entrypoints which aren't
available. This means that the dispatch table size and entry order is
constant and it should never get out-of-sync with the python.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
When an applications specifies mip levels _before_ setting a mipmap texture
filter, we will initially guess a single texture level. When the second level
image is created, we try to allocate the full texture -- however, we get the
base level size guess wrong if that size is odd. This leads to yet another
re-allocation of the texture later during st_finalize_texture.
Even worse, this re-allocation breaks a (reasonable) assumption made by
st_generate_mipmaps, because the re-allocation in the finalization call will
again allocate a single-level pipe texture (based on the non-mipmap texture
filter!). As a result, mipmap generation fails in interesting ways.
All of this can be avoided by just using the fact that we already know the
size of the base level.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95529
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
At this point, the limits are probably more-or-less correct. If there is
an invalid limit, that's a bug not a FINSHME.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This way the the bind map (which we're caching) is mostly independent of
the pipeline layout. The only coupling remaining is that we pull the array
size of a binding out of the layout. However, that size is also specified
in the shader and should always match so it's not really coupled. This
rendering issues in Dota 2.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Since applications are allowed to specify some set of bindings which need
not be dense they also need not be in order. For most things, this doesn't
matter, but it could result getting the wrong dynamic offsets. This adds a
quick-and-dirty sort to ensure that everything is always in increasing
order of binding index.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
With glx of gstreamer-vaapi, the temporary pixmap for front buffer gets
renewed in each frame, so when we receive a new pixmap, should get a new
front buffer for it.
This also fixes Totem player playback corruption.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
From original commit, the macro "if HAVE_DRI3" was in Makefile.sources,
this file is shared with SCons, SCons is not able to parse this marco,
the SCons build failed. Jose quickly gave two approaches and quick fix
with his second approach, thanks Jose for the solutions and fixes.
This patch is Jose's first approach, and it's more proper, because the
dri3 c file should not be included to build when DRI3 is not enabled.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Besides the old JIT bug, it seems the X86 backend on LLVM 3.3 doesn't
handle llvm.fmuladd and instead it fall backs to a C function. Which in
turn causes a segfault on Windows.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As suggested by Roland Scheidegger.
Use the same logic as f16c, since fma requires VEX encoding.
But disable FMA on LLVM 3.3 without MCJIT.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes a problem with the CE preamble and restoring only stuff in the
preamble when needed.
To illustrate suppose we have two graphics IB's 1 and 2, which are submitted in
that order. Furthermore suppose IB 1 does not use CE ram, but IB 2 does, and we
have a context switch at the start of IB 1, but not between IB 1 and IB 2.
The old code put the CE RAM loads in the preamble of IB 2. As the preamble of
IB 1 does not have the loads and the preamble of IB 2 does not get executed, the
old values are not load into CE RAM.
Fix this by always restoring the entire CE RAM.
v2: - Just load all descriptor set buffers instead of load and store the entire
CE RAM.
- Leave the ce_ram_dirty tracking in place for the non-preamble case.
v3: - Fixed parameter alignment.
- Rebased to master (Nicolai's descriptor series).
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The reality is that this doesn't matter, because we manually emit the
ARL to the sampler reladdr, and those arguments don't get an extra load
later, so it's effectively just a boolean. However having the types be
wrong is confusing and could trigger very odd bugs should usage change
down the line.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Adding 64-bit integers support was going to make this file worse,
just remove the tabs from it now.
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
An update in graphics specs has deleted the halign and valign fields
from XY_FAST_COPY_BLT command. See mesa commit 97f0f91.
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The extension is already advertised in compatibility profile, but
the _mesa_has_compute_shaders only returns true in core profile.
If we advertise it, we should allow it to work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
v2: only load the clip vertex once
v3: fix clip enable logic, add cullDistance
v4: remove duplicate fields in vs jit key, fix test of clip fixup needed
v5: fix clipdistance linkage for slot!=0,4
v6: support clip+cull; passes most piglit clip (failures understood)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
GLX documentation states:
glXCreateNewContext can generate the following errors: (...)
GLXBadFBConfig if config is not a valid GLXFBConfig
Function checks if the given config is a valid config and sets proper
error code.
Fixes currently crashing glx-fbconfig-bad Piglit test.
v2: coding style cleanups (Emil, Topi)
use DefaultScreen macro (Emil)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Apply the luma key filter to the YCbCr values during the CSC conversion
in video buffer shader. The initial values of max and min luma are set
to opposite values to disable the filter initially and will be set when
enabling it.
Add extra parmeters min and max luma for the luma key filter in
vl_compositor_set_csc_matrix in va, xvmc. Setting them
to opposite value 1.f and 0.f respectively won't effect the CSC
conversion
v2: -Squash 1,2 and 3 into one patch to avoid breaking build of
other components. (Christian)
-use ureg_swizzle. (Christian)
-change name of the variables. (Christian)
v3: -Squash all patches in one to avoid breaking of build. (Emil)
-wrap functions properly. (Emil)
-use 0.0f and 1.0f instead of 0.f and 1.f respectively. (Emil)
v4: -Divide it in two patches one which introduces the functionality
and assigs dummy values to the changed functions and second which
implements the lumakey filter. (Christian)
-use ureg_scalar instead ureg_swizzle. (Christian)
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When Kristian implemented GL_TEXTURE_EXTERNAL_OES, he hooked it up for gen8
but not for gen7 or earlier. It all works, we just need to emit the states
for the extra planes.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
When calling virgl_fence_wait() with timeout=0,
virgl_{drm,vtest}_resource_is_busy() is called. However, it returns TRUE
for a busy resource, whereace virgl_fence_wait() should return TRUE for
a completed (non-busy) resource.
This fixes running supertuxkart in a VM (I could not reproduce locally
with vtest though there is a similar fix)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "11.1 11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the future int64 support will have the same requirements.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This uses the new types interfaces to check for 64-bit types,
as futureproofing against int64 support.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just some better type safety that I noticed while working
on 64-bit integer support.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves to the new interfaces in advance of int64.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for int64 support, changing
places where 64-bit matters no doubles.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves code to the new check in advance of int64 support.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds an inline and type query for if a type is 64-bit.
Fow now this is equivalent to double, but int64 will change
this.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Coverity complains that the computed sizes can lead to negative lengths
passed to memcpy. If that happens we've been handed invalid arguments
anyway, so just bomb out.
The funky "0%s" is because the size string for the variable-length part
of the request is of the form "+ safe_pad() ...", and a unary + would
coerce the result to always be positive, defeating the overflow check.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was for some old unsupported LLVM version.
Only si_create_context creates the target machine now.
r600g doesn't use this function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The width0/height0/depth0 on stObj may not have been set at this point.
Observed in a trace that set up levels 2..9 of a 2d texture, and set the base
level to 2, with height 1. This made the guess logic always bail.
Originally investigated by Ilia Mirkin, this patch gets rid of the somewhat
redundant storage of width0/height0/depth0 and makes sure we always compute
pipe texture sizes that are compatible with the base level image of the
GL texture.
Fixes the gl-1.2-texture-base-level piglit test provided by Brian Paul.
v2:
- try to re-use an existing pipe texture when possible
- handle a corner case where the base level is not level 0 and it is of
size 1x1x1
v3:
- ptHeight = ptWidth in cube map 1x1 case (suggested by Brian)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This just stops counting and assigning a storage location for
these uniforms, the count is only used to create the uniform storage.
These uniform types don't use this storage.
Reviewed-by: Dave Airlie <airlied@redhat.com>
We were previously unconditionally doing this for arrays and ubo's, and
ignoring texture/storage/atomic buffers. Instead use the usage history
to determine which atoms need to be revalidated.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put
<option name="precise_trig" value="true"/>
in the proper drirc file.
V2: Make option name more generic
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Since DCC is enabled almost everywhere now, it's important not to disable
this fast path.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If first_level > 0 and DCC is disabled for that level, let's skip DCC
reads entirely.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Also add dcc_fast_clear_size for clearing only the necessary subset
of DCC. For no AA, it's equal to the size of the whole DCC level.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We want to keep DCC enabled to save bandwidth. It was a bad idea to disable
it here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will get more complicated with mipmapped DCC or when DCC is enabled
after allocation.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
R9G9B9E5 is the only uncompressed one hopefully.
This fixes incorrect rendering not discovered (due to a lack of tests)
until DCC mipmapping was enabled.
Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use rasterizer provoking vertex API.
Fix rasterizer provoking vertex for tristrips and quad list/strips.
v2: make provoking vertex tables static const
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
A texture may be redefined with _NEW_TEXTURE, which might have been
bound to a shader image slot. We have to revalidate the image atoms to
pick up on the new resource.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Sometimes a register source can actually be double- or even quad-wide.
We must make sure that the inserted texbars take that width into
account.
Based on an earlier patch by Samuel Pitoiset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Reduces CPU load for draw calls that change none or few of the descriptors.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So that callers outside of si_descriptors.c need to worry less about the
details of descriptor handling.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This mask is irrelevant for the generic descriptor set handling, and having it
outside simplifies subsequent changes slightly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that we emit guards for everything, we can just generate the files and
trust build flags to keep us safe. This should also fix the tarball
problems.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
To avoid blocking other EGL calls, release the display mutex before
we enqueue buffer to android frameworks and re-acquire the mutex
upon return.
v2: moved lock/unlock inside droid_window_enqueue_buffer().
TEST=verify pinch zoom in Photos app no longer causes hangs
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In the case of out-of-tree (OOT) builds, in particular when building
from tarball, we'll end up with the file in both srcdir and builddir.
We want the former to remain intact (since we need it on rebuild) while
the latter should be removed otherwise `make distclean' gets angry at
us.
Ideally there'll be a solution that feels a bit less of a hack. Until
then this does the job exactly as expected.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
... when copied from git_sha1.h.
As the latter file can we lacking the write attribute, one should set it
explicitly. Otherwise we'll get a warning/failure at cleanup stage.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise the build will assume that we've talking about builddir, which
is not the case in the else statement.
Here the file is already generated and is part of the tarball.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With earlier commit we introduced support for render_node devices, which
was couples with the use of the image loader extension.
As the work was inspired by egl/wayland we (erroneously) added the
extension for the !render_node path as well.
That works for wayland, as the implementations of the DRI2 and IMAGE
loader extensions converge behind the scenes. As that is not yet
the case for Android we shouldn't expose the extension.
Fixes: 34ddef39ce ("egl: android: add dma-buf fd support")
Cc: <mesa-stable@lists.freedesktop.org>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We don't import textures with DCC now, but soon we will.
v2: if we can't disable DCC for image writes, at least decompress DCC
at bind time
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Could cause issues if you tried to read from an uninitialised pointer.
This just initalises the pointer to null to avoid that being a problem.
Discovered by Coverity.
CID: 1343616
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We've had a FINISHME here since Eric originally wrote the code in 2011.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are basically a wash:
No change in instruction counts.
total cycles in shared programs: 78685980 -> 78680730 (-0.01%)
cycles in affected programs: 2102646 -> 2097396 (-0.25%)
helped: 48
HURT: 83
I figured if we're going to do this for one copy propagation pass,
we may as well do it in both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've had a FINISHME here since Eric originally wrote the code in 2010.
This patch implements his suggested approach, which makes us actually
able to copy propagate into the loops, at the unfortunate cost of making
this pass even more expensive.
The shader-db statistics are not terribly impressive:
total instructions in shared programs: 9008589 -> 9008613 (0.00%)
instructions in affected programs: 4293 -> 4317 (0.56%)
helped: 0
HURT: 10
total cycles in shared programs: 78550978 -> 78575760 (0.03%)
cycles in affected programs: 655426 -> 680208 (3.78%)
helped: 75
HURT: 88
GAINED: 2
Most of the "regressions" appear to be us successfully copy propagating
uniforms, which i965 is loading as pull constants instead of push, so we
occasionally have two pulls instead of one. That doesn't seem like this
pass's job - it's propagating correctly, and we should be smarter about
pull loads in the backend.
This patch is also useful for a couple of reasons:
1. It can clean up copies created by varying packing (previously, we
couldn't if the uses were inside a loop).
This fixes a bug when interpolateAt*() is used on a packed varying
inside a loop: glsl_to_nir struggles to see through the extra copy
and mistakenly believed the variable was not an input.
2. It will help propagate uniform array access created by
lower_const_array_to_uniforms().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Like floats, we should use the round toward 0 mode instead of the
nearest one (which is the default) for doubles to integers.
This fixes all arb_gpu_shader_fp64 piglits which convert doubles to
integers (16 tests).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
The GL4.5 spec quote seems clear on this:
"The value -1 will be returned by either command if an error occurs,
if name does not identify an active variable on programInterface,
or if name identifies an active variable that does not have a valid
location assigned, as described above."
This fixes:
GL45-CTS.program_interface_query.output-built-in
[airlied: use _mesa_program_resource_location_index as
suggested by Eduardo]
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cast the unsigned semantic index to integer datatype before comparing
to max_generic, otherwise, max_generic which is initialized to -1
will be converted to unsigned int before the comparison, causing a wrong
semantic index to be assigned to a shader output.
Fixes the assert running TurboCAD_gl.trace. (VMware bug 1667265)
Also tested with glretrace, mesa demos pointblast, spriteblast and pointcoord.
v2: use the original max_generic variable but add the (int) cast
to the semantic index, as suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
When TGSI debug flag is enabled, print the shader linkage info as well.
Tested with mesa demos with SVGA_DEBUG=tgsi
Reviewed-by: Brian Paul <brianp@vmware.com>
ARB_shader_image_load_store only requires a very fixed list of formats
to be supported, while textures may be in all kinds of formats, like
BGRA which are presently not supported on at least Kepler.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Switches to using truncf in micro_trunc.
Fixes the following piglit tests (for softpipe):
/spec/glsl-1.30/execution/built-in-functions/...
fs-trunc-float
fs-trunc-vec2
fs-trunc-vec3
fs-trunc-vec4
vs-trunc-float
vs-trunc-vec2
vs-trunc-vec3
vs-trunc-vec4
/spec/glsl-1.50/execution/built-in-functions/...
gs-trunc-float
gs-trunc-vec2
gs-trunc-vec3
gs-trunc-vec4
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When we are not packing a double input varying, we might need to
read its data in a non-aligned to 64-bit offset, so we read
the wrong data. This is happening when using explicit locations
in varyings because Mesa disables packing varying for that case.
const_index is in 32-bit size units but offset() is multiplying
it by destination type size units. When operating with double
input varyings, const_index value could be not aligned to 64 bits.
To fix it, we load the double vector as if it was a float based vector
with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Data starts at suboffet 3 in 32-bit units (12 bytes), so it is not
64-bit aligned and the current implementation fails to read the data
properly. Instead, when there is is a double input varying, read it as
vector of floats with twice the number of components.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
From GLSL 4.5 spec, "4.4.2.3 Geometry Outputs".
"all geometry shader output vertex count declarations in a
program must declare the same count."
Fixes:
GL45-CTS.geometry_shader.output.conflicted_output_vertices_max
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although the glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.
This makes things a bit nicer in gdb, and improves type
safety.
v2: fix a few pieces of interface I missed that caused some
piglit regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For 3D textures we shouldn't be using NumLayers, we need
to get it from the depth.
This fixes:
GL45-CTS.geometry_shader.layered_framebuffer.clear_call_support
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With tessellation shaders we can have cases where we have
arrays of anon structs, so make sure we match using without_array().
Fixes:
GL45-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in
v2:
test lengths match as well (Ilia)
v3:
descend array lengths to check for matches as well (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a crash in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
If we can't find the func_name in one of these paths,
we have emitted an earlier error so just return here.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL43-CTS.compute_shader.work-group-size does
uniform uint g_uniform[gl_WorkGroupSize.z + 20] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24 };
The initializer triggers the GLSL 4.30/GLES3 tests
for constant sequence subexpressions, so it doesn't
happen unless you are using those, so just return
false as this path is now reachable.
v2: update commit msg with diagnosis
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
PATH_MAX is apparently not a thing on Windows. Borrow the hack from
pipe_loader.c to try and make this work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This writes linked shader programs to .shader_test files to
$MESA_SHADER_CAPTURE_PATH in the format used by shader-db
(http://cgit.freedesktop.org/mesa/shader-db).
It supports both GLSL shaders and ARB programs. All stages that
are linked together are written in a single .shader_test file.
This eliminates the need for shader-db's split-to-files.py, as Mesa
produces the desired format directly. It's much more reliable than
parsing stdout/stderr, as those may contain extraneous messages, or
simply be closed by the application and unavailable.
We have many similar features already, but this is a bit different:
- MESA_GLSL=dump writes to stdout, not files.
- MESA_GLSL=log writes each stage to separate files (rather than
all linked shaders in one file), at draw time (not link time),
with uniform data and state flag info.
- Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
MESA_SHADER_READ_PATH) also uses separate files per shader stage,
but allows reading in files to replace an app's shader code.
v2: Dump ARB programs too, not just GLSL.
v3: Don't dump bogus 0.shader_test file.
v4: Add "GL_ARB_separate_shader_objects" to the [require] block.
v5: Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
v6: Don't hardcode /tmp/mesa.
v7: Fix memoization of getenv().
v8: Also print "SSO ENABLED" (suggested by Timothy).
v9: Also handle ES shaders (suggested by Ilia).
v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
_mesa_warning calls on error handling (suggested by Ben).
v11: Fix crash when variable is unset introduced in v10.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Images invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a computer shader
to help in fixing 3D<->CP interaction.
However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not really good for performance purposes but this might be
improved later.
This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This should fix spec@arb_shader_image_load_store@level.
Broken by:
Commit: 95c5bbae66
radeonsi: set some image descriptor fields at bind time
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We would revalidate images when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We would revalidate buffers when anything was touched at all. Which is
unfortunate, since the state tracker does not use CSO's to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
The fix in:
anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards
breaks things if wayland headers aren't installed.
Separate things out properly to avoid that problem.
[airlied: fixed up to put in pre-existing sections].
Reported-by: Arjan van de Ven
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
When upscaling you can end up interpolating between the edge pixel and one
past the edge. Using CLAMP_TO_EDGE seems like the most reasonable thing to
do in this case. This fixes two of the new Vulkan CTS tests in
dEQP-VK.api.copy_and_blit.blit_image.*
Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Getting rid of the default case makes the compiler warn if we are missing
cases. While we're here, we also add the one missing case.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
glslang frequently throw bogus decorations into shaders. While we are free
to assert-fail, it's a bit nicer to the application to just warn.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Previously we supported a subset of capabilities and just left a default
case for the others. It's time to stop being lazy and actually audit the
capabilities. This should bring them up-to-date with reality.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit c1107cec44.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB, the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode. Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):
es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_float_frag_xvary
es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Give algebraic-opt pass a chance to catch udiv by const power-of-two,
before running lower-idiv pass.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two. Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.
This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For ES 3.0 NUM_SAMPLE_COUNTS spec points that some formats will be
always zero. But on ES 3.1 can be different to zero.
The current code is correctly checking exactly against version 3.0,
but the comment only mentions 3.0 spec. It is clearer mentioning both.
v2: better wording on the comment (Ian Romanick)
Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When validating attributes during surface creation we should account
for the default values of texture target and format (EGL_NO_TEXTURE)
since the user is not obligated to explicitly set both via the
attribute list passed to eglCreatePbufferSurface.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
isl library is needed to build i965, libmesa_isl static library is added
to fix related Android building errors.
Any attempt to build libmesa_genxml as phony package module failed to deliver
gen{7,75,8,9}_pack.h generated headers, needed for libmesa_isl_gen{7,75,8,9}
Due to constraints in Android Build System, libmesa_genxml is built as static,
at least one source is needed, so dummy.c is autogenerated for this scope,
libmesa_genxml dependency is declared using LOCAL_WHOLE_STATIC_LIBRARIES,
to avoid building errors due to missing genxml/gen{7,75,8,9}_pack.h headers.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes the following building error:
target C++: libmesa_glsl <= external/mesa/src/compiler/glsl/glsl_to_nir.cpp
In file included from external/mesa/src/compiler/glsl/glsl_to_nir.h:28:0,
from external/mesa/src/compiler/glsl/glsl_to_nir.cpp:28:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
compilation terminated.
build/core/binary.mk:432: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
the actual dependency list less obvious.
v2: Drop unrelated vulkan hunk (Jason).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With earlier commit 3689ef32af ("automake: rework the git_sha1.h rule,
include in tarball") we/I erroneously removed the PHONY rule and the
temporary file.
The former is used to ensure that the header is regenerated when on each
make invocation, while the latter helps us avoid the unneeded rebuild(s)
when the SHA1 hasn't changed.
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead of just allow copy of a rectangle in svga_transfer_dma_band(),
this patch allows it to copy a box, hence allows copy a 3d texture
in one transfer.
Fixes black screen in running Heaven after commit fb9fe35. (Bug 1663282)
Tested with Heaven, glretrace, piglit.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Coverity doesn't realize idx will never be negative. Throw in some
assert()s to help it out.
(Hopefully assert() isn't getting compiled out for coverity build.. but
there seems to be just one way to find out. We might have to change
these to assume())
Fixes CID 1362442, 1362443
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Maybe we should switch to ureg to build the builtin shaders. But at any
rate, if they fail to compile it is because someone messed them up (or
changed TGSI syntax?).
CID 1362444
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.
v2: Don't handle partial source-destination overlap for simplicity
(Jason). No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The specs says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.
This fixes:
GL45-CTS.copy_image.non_existent_mipmap
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.
This fixes:
GL45-CTS.copy_image.invalid_target
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Never can happen, since query would not have been created in the first
place if pidx(query_type) return negative. Lets let coverity realize
this.
CID 1362460
Signed-off-by: Rob Clark <robclark@freedesktop.org>
ptr can actually never be null so just drop the check.
CID 1362464 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking ptr suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
According to the EGL specifications before binding an API
we must check whether it's supported first. If not eglBindAPI
should return EGL_FALSE and generate a EGL_BAD_PARAMETER error.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.
CoverityID: 1296205
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
Check for null pointer before accessing arrays in get/put bits
native/YCbCr/Indexed in VdpOutputSurface and VdpVideoSurface.
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.
Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.
Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Among other thigs, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The old method pushed data for each channels uvec3 data of
gl_LocalInvocationID.
The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.
Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.
To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.
v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
v4:
* Support the old local ID push constant layout as well (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.
When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.
The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.
For now we add the code that would calculate cross-thread constatn
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
v4:
* Support older local ID push constant layout as well. (Jason)
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.
We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.
We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.
v2:
* Create variable during lowering pass. (Ken)
v3:
* Don't create a variable, but instead just insert an intrisic call
to load a uniform from the allocated location. (Jason)
v4:
* Don't run this pass if thread_local_id_index < 0
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.
It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.
The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)
v2:
* Add variable in intrinsics lowering pass
* Make sure the ID is pushed last in assign_constant_locations, and
that we save a spot for the ID in the push constants
v3:
* Simplify code based with Jason's suggestions.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v4:
* Force thread_local_id_index to -1 for now, and have
fs_visitor::setup_cs_payload look at thread_local_id_index. This
enables us to more easily cut over from the old local ID layout to
the new layout, as suggested by Jason.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We will chain multiple chunks together and will keep pointers to the older
chunks to support IB dumping.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This avoids allocating giant IBs from the outset, especially for CE and DMA.
Since we now limit max_dw only by the size that the buffer happens to be
(which, due to the buffer cache, can be even larger than the rounded-up size
we request), the new function amdgpu_ib_max_submit_dwords controls when we
submit an IB.
With this change, we effectively never flush prematurely due to the CE IB,
after an initial warm-up phase.
v2:
- clean up buffer_size calculation
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Isolines aren't reversed. commit 5b2d8c2273 fixed this for the vec4
TES backend, but not the scalar one.
Found while debugging GL45-CTS.tessellation_shader.
tessellation_control_to_tessellation_evaluation.gl_tessLevel.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
v2: require PIPE_CAP_SAMPLER_VIEW_TARGET; technically only needed for some of
the texture targets, but all hardware that has shader images should also
have this cap.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For better bisectability given that the order of some of the fallback tests
in the blit path are rearranged.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used to select a slice of a 3D texture.
v2: fix a comment (Marek)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For downloads, the fragment shader must know the source texture target, hence
we may cache multiple fragment shaders.
v2: break long line (Marek)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
At the same time, rename members that are upload-specific to say so.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The string "[0]\0" is the same as "[0]" as far as the C string datatype
is concerned. That string has length 3. strncmp(s, length_3_string, 4)
is the same as strcmp(s, length_3_string), so make it be strcmp.
v2: Not the same as strncmp(..., 3). Noticed by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This improves throughput by keeping TTM overhead down.
Some piglit tests such as texelFetch and streaming-texture-leak will
use less memory now.
v2: use gart_size / 4 as the threshold
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
mainly the fields that can change by reallocating a texture and changing
the tile mode
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just for consistency. This doesn't fix anything, because DCC is not
supported with non-mipmapped textures.
v1.1: fix the comment about DCC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With the introduction of fp64 and fp16 to nir, there are now a bunch of
float types running around. A F1 2015 shader ends up with an i2f.sat
operation, which has a nir_type_float32 destination. Allow sat on all
the float destination types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The initial ARB_sampler_objects spec had GL_INVALID_VALUE in it,
however version 8 of it fixed this, and the GL specs also have
the fixed value in them.
Fixes:
GL45-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.
Fortunately none of tracked benchmarks showed a regression with
this.
v2 (Matt): Add missing space
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We weren't setting up several of the uniform values for the patch
header, so we'd crash when uploading push constants. We at least
need to initialize them to zero. We also had the isoline parameters
reversed, so it would also render incorrectly (if it didn't crash).
Fixes a new Piglit test(*) (isoline-no-tcs), as well as crashes in
GL44-CTS.tessellation_shader.single.max_patch_vertices.
(*) https://lists.freedesktop.org/archives/piglit/2016-May/019866.html
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
The driver was adding the skip components but always for buffer 0.
This fixes:
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_multiple_buffers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
e2791b38b4
mesa/program_interface_query: fix transform feedback varyings.
caused a regression in
GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_multiple_streams
on radeonsi.
The problem was it was using the skip components varying to set
the stream id, when it should wait until a varying was written,
this just adds the varying checks in the right place.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to GL4.5 spec:
An INVALID_OPERATION error is generated if any part of the speci-
fied buffer range is mapped with MapBufferRange or MapBuffer (see sec-
tion 6.3), unless it was mapped with MAP_PERSISTENT_BIT set in the Map-
BufferRange access flags.
So we should use the if range is mapped path.
This fixes:
GL45-CTS.buffer_storage.map_persistent_buffer_sub_data
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "12.0, 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This really only hits for bitsets with a size of a multiple of 32. We
can end up with pos = -1 as a result of the ffs, which we in turn decide
is a valid position (since we fall through the loop and i == 1, we end
up adding 32 to it, so end up returning 31 again).
Up until recently this was largely unreachable, as the register file
sizes were all 63 or 255. However with the advent of compute shaders
which can restrict the number of registers, this can now happen.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
This reverts commit aac90ba292.
The commit caused a regression in:
piglit.spec.glsl-1_50.compiler.gs-input-nonarray-named-block.geom
Also the CTS test it was meant to fix seems like it may be bogus.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
I haven't found any evidence that this isn't supported by the
hardware, in fact according to the SNB hardware spec:
"The supported regioning modes for math instructions are align16,
align1 with the following restrictions:
- Scalar source is supported.
[...]
- Source and destination offset must be the same, except the case of
scalar source."
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the case for SNB math instructions so we need to be careful
and insert the literal value of the immediate into the table (rather
than its absolute value) if the instruction is unable to invert the
sign of the constant on the fly.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Which requires using a bitset instead of a boolean flag to keep track
of the GRFs we've seen a generating instruction for already. The
search loop continues until all instructions initializing the value of
the source VGRF have been found, or it is determined that coalescing
is not possible.
Fixes a few piglit test cases on Gen4-6 which were regressed by
6956015aa5 due to the different (yet
perfectly valid) ordering in which copy instructions are emitted now
by the simd lowering pass, which had the side effect of causing this
optimization pass to start corrupting the program in cases where a
VGRF-to-MRF copy instruction would be eliminated but only the last
instruction writing to the source VGRF region would be rewritten to
point to the target MRF.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be required to correctly transform the destination of 8-wide
instructions that write a single GRF of a VGRF to MRF copy marked
COMPR4.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will allow compute_to_mrf to handle cases where the source of the
VGRF-to-MRF copy is initialized by more than one instruction. In such
cases we cannot rewrite the destination of any of the generating
instructions until it's known whether the whole VGRF source region can
be coalesced into the destination MRF, which will imply continuing the
search until all generating instructions have been found or it has
been determined that the VGRF and MRF registers cannot be coalesced.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compute-to-mrf was checking whether the destination of scan_inst is
more than one component (making assumptions about the instruction data
type) in order to find out whether the result is being fully copied
into the MRF destination, which is rather inaccurate in cases where a
single-component instruction is only partially contained in the source
region, or when the execution size of the copy and scan_inst
instructions differ. Instead check whether the destination region of
the instruction is really contained within the bounds of the source
region of the copy.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Compute-to-mrf was being rather heavy-handed about checking whether
instruction source or destination regions interfere with the copy
instruction, which could conceivably lead to program miscompilation.
Fix it by using regions_overlap() instead of the open-coded and
dubiously correct overlap checks.
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now there are not files that require python 3, so for now just remove
the python 3 dependency and use python 2. I think the right plan is to
just get all of the python ready for python 3, and then use whatever
python is available.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
cc: 12.0 <mesa-stable@lists.freedesktop.org>
By using a counter to quickly reject textures that are not
bound to a framebuffer, the performance impact when binding
sampler_views/images is not too large.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The generated sources should follow the example set by the vulkan
headers and our non-generated code. Namely: the code for all supported
platforms should be available, each one guarded by its respective
VK_USE_PLATFORM_*_KHR macro.
v2: Reword commit message.
Cc: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96285
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1 over IRC)
This parameter is actually a bitmask of PIPE_TRANSFER_x flags.
Change it back to a simple unsigned type. IIRC, some compilers
complain about masks of enum values. Also, this make the function
signature match u_resource_vtbl::transfer_map() again.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The GL spec has been clarified and the new rule says we should just
copy 1 sample. u_blitter does the right thing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Coverity is getting a false positive that a division by zero can occur
here. This change will silence the Coverity warnings as a division by zero
cannot occur in this case.
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
`unsigned j` would never fail `j >= 0`, leading to an infinite loop as
`j--` wraps around.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The CTS test:
GL45-CTS.multi_bind.dispatch_bind_image_textures
binds 192 image uniforms, we reject this later,
but not until after we trash the contents of the
struct gl_shader.
Error now reads:
Too many compute shader image uniforms (192 > 16)
instead of
Too many compute shader image uniforms (2745344416 > 16)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This effectively limits registers to 32 and 64 for fermi and kepler when
1024 threads are used, but allows the full amount to be used with
smaller thread sizes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Now that vc4 automated code documentation can be generated with
doxygen, fix the warnings issued by Doxygen 1.8.11.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Add Gallium and the Gallium-based drivers to doxygen's automated
code documentation infrastructure.
Can be individually created with:
cd $MESA_TOP_LEVEL/
make -C doxygen/ gallium.tag
Benefits from the existing doxygen Makefile runners to clean up
afterwards with 'make clean'.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch makes it possible to build classic osmesa/swrast on windows
again. It was removed in commit 69db422218.
Although there is a gallium version of osmesa now, the swrast version
still has more features lacking in llvmpipe, e.g. anisotropic filtering.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
[Emil Velikov: remove trailing whitespace]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As we'll need the file in the release tarball, rework the rule so that
the file is regenerated _only_ if we're in a git repository.
With this in place we can build vulkan (anv) from a release tarball.
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This way we can reuse the header from other places like -
src/intel/vulkan and src/gallium. Only the former is hooked up atm.
Make sure .gitignore is updated, as well as all the users (the mesa
code does not need any changes).
Also ensure that the file is always created by adding it to the
BUILT_SOURCES target.
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As is there are two places that do the typedefs - dri_interface.h and
this header. As we cannot include the former in here, just drop the
typedefs and use the struct directly (as needed).
This is required because typedef redefinition is C11 feature which is
not supported on all the versions of GCC used to build mesa.
v2: Kill the typedef alltogether, as per Marek.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96236
Cc: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Rather than checking if the function name maps to a valid entry in the
respective table, just create a dummy entry at the end of each table.
This allows us to remove some unnessesary "index >= 0" checks, which get
executed quite often.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
It will allows us to find the function within 6 attempts, out of the ~80
entry long table.
v2: calculate middle on each iteration, correctly set the lower limit.
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Things have changed since commit a92910a ("glx: Refactor the configure
options for glx implementation choice (v3)") where only a single
configure option is used to control the GLX provider.
[Emil Velikov: Ensure that the check is moved after the detection code.]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With reference to the libglvnd branch:
https://cgit.freedesktop.org/mesa/mesa/log/?h=libglvnd
This is a squashed commit containing all of Kyle's commits, all but two
of Emil's commits (to follow), and a small fixup from myself to mark the
rest of the glX* functions as _GLX_PUBLIC so they are not exported when
building for libglvnd. I (ajax) squashed them together both for ease of
review, and because most of the changes are un-useful intermediate
states representing the evolution of glvnd's internal API.
Co-author: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
(a) Make sure to update the TIC in case of an updated buffer address
(b) Mark newly-inactive textures dirty so that we update the handle in
set_tex_handles.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This fixes:
GL45-CTS.direct_state_access.xfb_buffers
This test looks correct to me, we should work out the
size value and report it rather than using only the size
from the Range interface.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The dependencies should not mention any files external to the project.
If we want to do sanity checks for the LLVM installed on the system we
should do that in configure, yet again where is the merit which header
gets checked and which doesn't ?
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise `make distcheck' will barf at us as the file is dangling.
Ideally this should be part of the clean-local hook, although we include
install-lib-links.mk which already has one.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We should not have removed them in the first place. There's a subtle
difference between generating the complete sources and using them which
was not obvious as we nuked them.
Without this, the release tarball ends up without various hunks of the
generated sources, thus things fail at a later stage as we attempt to
build them.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
As seen elsewhere - we want to include the freshly built sources as
opposed the the (likely) stale ones in the srcdir.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fold the unneeded extra variable tests_ldadd, the explicit sources
section (single file with the default extension) and flip the
check_PROGRAMS <> TESTS order (TESTS includes scripts, while
check_PROGRAMS is binaries only).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
One uses the makefile to create compatibility symlinks (to
$top_builddir/libs) for shared libraries/modules. As we don't create any
here, there's no need to include the file.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise we'll end up setting up a device with no winsys integration.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
---
Hard-coding the rendernode name in anv_physical_device_init() is a bad
idea really. We could/should be using drmGetDevices() to get info on all
the devices (master/render/etc. node names, pci location etc.) and apply
our heuristics on top of that.
That can come up as a follow up change.
Ensure that the final X11/XCB hunk is guarded by the correct macro.
Otherwise we'll require the symbol even when building without said
platform.
Cc: Cedric Sodhi <manday@openmail.cc>
Reported-by: Cedric Sodhi <manday@openmail.cc>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The return variable was not set for failure paths.
It has now been changed to VK_ERROR_INITIALIZATION_FAILED
for failure paths.
Coverity: 1358944
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: rebase against master, s/vulkan/anv/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Push offset down to drivers when importing dmabuf. This is needed
to more fully support EGL_EXT_image_dma_buf_import when a non-zero
offset is specified.
Tesing has been done for freedreno, and compile tested following
gallium drivers:
nouveau,svga,virgl,r600,r300,radeonsi,swrast,i915,ilo
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This helps to import dmabuf buffers from DRM_FORMAT_R8 and
DRM_FORMAT_GR88 used for example by GStreamer for YUV to RGB
conversion using shaders.
Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
We know that there cannot be any destination dependency race if we
reach the beginning or end of the program without having found any
other instruction the send could possibly race with. This avoids
emitting a pile of useless moves at the beginning or end of the
program in the most common case in which the program has a single
basic block only.
On the original i965 I get the following shader-db results:
total instructions in shared programs: 3354165 -> 3215637 (-4.13%)
instructions in affected programs: 3183065 -> 3044537 (-4.35%)
helped: 13498
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Just to make sure we keep the SIMD lowering pass tidy when we
introduce additional logic to try to optimize out the copy
instructions used to zip and unzip the destination and source regions
into multiple packed regions of the lowered instruction width.
Shouldn't cause any functional changes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be useful in several places. The only externally visible
difference (other than non-VGRF files being supported now) is that the
region sizes are now passed in byte units instead of in GRF units
because the loss of precision would have become a problem in the SIMD
lowering pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be useful in the SIMD lowering pass to avoid having to
construct a builder object of the known region width just to pass it
as argument to offset(), which doesn't do anything with it other than
taking the builder dispatch_width as region width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes the whole LOAD_PAYLOAD munging unnecessary which simplifies
the code and will allow the optimization to succeed in more cases
independent of whether the LOAD_PAYLOAD instruction can be found or
not.
The following patch is squashed in:
SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().
The sampler EOT optimization pass naively assumes that the texturing
instruction provides all the data used by the FB write just because
they're standing next to each other. The least we should be checking
is whether the source and destination regions of the FB write and
texturing instructions match. Without this the previous seemingly
harmless patch would have caused opt_sampler_eot() to misoptimize a
shader from dota-2 causing DCE to eliminate all of its 78 instructions
except for the final sampler EOT message (!).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are two reasons why this is useful:
- It avoids the introduction of an amount of partial writes emitted
by the SIMD lowering pass to zip and unzip register regions early
during optimization, which can make subsequent optimization less
effective.
- It substantially reduces the burden on the compiler when a large
fraction of the instructions in the program need to be split (e.g.
during SIMD32 builds). Individual halves of split instructions
will be optimized identically (if they can still be optimized at
all), so doing it up front can duplicate the amount of instructions
the optimizer has to deal with which causes the compilation time to
explode in some cases due to the worse-than-linear runtime
behaviour of the back-end.
It seems helpful to re-run a few optimization passes in cases where
any of the lowering passes was able to make progress.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Logical sends are eventually lowered into a series of copies so they
can take almost anything as source.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will prevent some shader-db regressions when we start plumbing
logical sends through the optimizer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will let the optimizer know that the sample mask value is unused
so its definition can be DCE'ed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This partially fixes CTS test:
GL44-CTS.enhanced_layouts.xfb_get_program_resource_api
The test now fails at a tes evaluation shader with unsized output arrays.
The ARB_enhanced_layouts spec says:
"It is a compile-time error to apply xfb_offset to the declaration of an
unsized array."
So this seems like a bug in the CTS.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The spec says gl_NextBuffer and gl_SkipComponents need to be
returned to userspace in the program interface queries.
We currently throw those away, this requires a complete piglit
run to make sure no drivers fallover due to the extra varyings.
This fixes:
GL45-CTS.program_interface_query.transform-feedback-built-in
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These types can't be returned.
This fixes:
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
for the return type case.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This stops the offset being bumped again when and an explicit
alignment has already been applied.
Fixes alignment issues in:
GL44-CTS.enhanced_layouts.uniform_block_alignment
Note the test still fails due to unrelated issues with doubles.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
It appears we were over-allocating these arrays.
Previously we would use nir->num_uniforms directly for scalar
programs, and multiply it by 4 for vec4 programs.
Instead we should have been dividing by 4 in both cases to convert
from bytes to a gl_constant_value count. The size of gl_constant_value
is 4 bytes.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
GL ES 2.0+ does not have a GL_PROGRAM_POINT_SIZE enable, unlike desktop
GL. So we have to go and check the last pre-rasterizer stage to see
whether it outputs a point size or not.
This fixes a number of dEQP tests that use a geometry or tessellation
shader to emit points primitives.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
From the OpenGL 4.5 core spec:
"An INVALID_VALUE error is generated if texture is not zero and level is
not a supported texture level for textarget, as described above."
Other FramebufferTexture functions already do the right thing.
This fixes the main menu in F1 2015.
Cc: 11.1 11.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fix build error with icc.
CXX libswrAVX_la-swr_clear.lo
icpc: command line warning #10006: ignoring unknown option '-Wdelete-non-virtual-dtor'
In file included from ./rasterizer/jitter/jit_api.h(31),
from swr_context.h(30),
from swr_clear.cpp(24):
./rasterizer/common/os.h(135): error: expected an identifier
void _mm256_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i a)
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
This code was used for validating surfaces with compute but now we use
pipe_image_view instead. Anyway, surfaces support should be
re-introduced properly once OpenCL happens.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Constant buffers are aliased between 3D and CP on Fermi, but we should
only invalidate them when a compute shader actually uses CBs and not
all the time after a lauching grid.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This should have the side effect of enabling the ARB_compute_shader
extension on Gen8+ hardware and all Gen7 platforms that didn't
previously expose it (VLV and IVB GT1) due to the number of hardware
threads per subslice being insufficient in SIMD16 mode.
v2: Bump workgroup size limit for GLES too (Jordan).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The do32 INTEL_DEBUG option causes the back-end to try to generate a
SIMD32 program when compiling a compute shader regardless of the
specified compute shader workgroup size, which will be useful for
testing SIMD32 code generation in the most common case in which the
workgroup size doesn't exceed the SIMD16 limit so SIMD32 codegen
wouldn't be automatically enabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This replaces the current fs_visitor::no16() interface with
fs_visitor::limit_dispatch_width(), which takes an additional
parameter allowing the caller to specify the maximum dispatch width a
shader can be compiled with.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was trying to save some one-time init on pre-Gen7 hardware under
the assumption that one would only ever need 1, 2, 4 and 8-wide
registers on those platforms. However nothing guarantees that those
will be the only VGRF sizes used after lowering and optimization. In
some cases we may end up with a temporary of different size being
allocated (e.g. by SIMD lowering to zip or unzip a multi-component
register region of a logical send instruction), and there is no
guarantee that they will be optimized away before register allocation
(especially since the compute_to_mrf coalescing pass is
rather... lacking...). Instead just allocate classes for all possible
VGRF sizes up to MAX_VGRF_SIZE to avoid a crash in pq_test() when we
encounter a variable of any other size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Gen5+ sampler message payload construction code steps through the
coordinate and derivative components by induction like 'coordinate =
offset(coordinate, bld, 1)', the problem is that while doing that it
may step one past the end of the coordinate vector causing an
assertion failure in offset() if it happens to be a (single component)
immediate. Right now coordinates and derivatives are typically passed
as actual registers but that will no longer be the case when we start
propagating constants into logical messages.
Instead express coordinate components in closed form like
'offset(coordinate, bld, i)' -- The end result seems slightly more
readable that way and it allows passing the coordinate and derivative
registers by const reference instead of by value, so it seems like a
clean-up in its own right.
v2: Fold a few post-increment operators into the last MOV
statement. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is more fallout from cf375a3333.
It's possible for multiple ACP entries to interfere with a given VGRF
write, so we need to continue iterating even if an overlapping entry
has already been found.
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction. This prevents
cmod propagation from (mis)optimizing:
cmp.z.f0 tmp, ...
mov.z.f0 null, tmp
into:
cmp.z.f0 tmp, ...
which gives the negation of the flag result of the original sequence.
I could reproduce this easily with SIMD32 but I don't see any reason
why the problem would be SIMD32-specific, it was most likely working
by luck.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The pass is disabled in SIMD16 dispatch mode for the same reason, it
cannot handle instructions that write multiple MRF registers at once.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't actually handle the FS case, just add an assertion for
the moment so I don't forget to update it later on for SIMD32 fragment
shader dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
brw_null_vec() cannot handle widths over 16 but it doesn't really
matter what width we specify for null registers because destination
regions have no width field at the hardware level.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
horiz_offset() is able to deal with a superset of the register files
currently special-cased in half(). Just call horiz_offset() in all
cases.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This prevents false dependencies from being created between
instructions that write disjoint 8-bit portions of the flag register
and OTOH should make sure that the scheduler considers dependencies
between instructions that write or read multiple flag subregisters
at once (e.g. 32-wide predication or conditional mods).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is required for correctness in presence of multiple 8-wide flag
writes (e.g. 8-wide instructions with a conditional mod set) which
update a different portion of the same 16-bit flag subregister. Right
now we keep track of flag dataflow with 16-bit granularity and
consider flag writes to have killed any previous definition of the
same subregister even if the write was less than 16 channels wide,
which can cause live flag register updates to be dead code-eliminated
incorrectly.
Additionally this makes sure that we handle 32-wide flag writes and
reads which may span multiple flag subregisters so the current
approach of just setting/testing a single bit from the live set
wouldn't have worked.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This generalizes the current fs_inst::force_sechalf flag to allow
specifying channel enable groups other than 0 or 8. At some point it
will likely make sense to fix the vec4 generator to support arbitrary
execution groups and then move the definition of fs_inst::group into
backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Alternatively we could have extended the current semantics to 32-wide
mode by changing brw_broadcast() to emit multiple indexed MOV
instructions in the generator copying the selected value to all
destination registers, but it seemed rather silly to waste EU cycles
unnecessarily copying the exact same value 32 times in the GRF.
The vstride change in the Align16 path is required to avoid assertions
in validate_reg() since the change causes the execution size of the
MOV and SEL instructions to be equal to the source region width.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes FIND_LIVE_CHANNEL behave like a normal instruction for
non-zero quarter control. On Gen8+ we just leave the quarter control
field of the emitted FBL instruction set to the default value so the
hardware applies the expected shift to the execution mask signals. On
Gen7 we apply the offset manually by specifying a non-zero subregister
offset in the source region of the FBL instruction.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Due to a Gen7-specific hardware bug native 32-wide instructions get
the lower 16 bits of the execution mask applied incorrectly to both
halves of the instruction, so the MOV trick we currently use wouldn't
work. Instead emit multiple 16-wide MOV instructions in 32-wide mode
in order to cover the whole execution mask.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The hardware has messages that can write 32 32bit components at once
but the channel enable mask gets messed up. We need to split them
into several 16-wide scratch writes for the channel enables to be
applied correctly. The SIMD lowering pass cannot be used for this
because scratch writes are emitted rather late during register
allocation long after SIMD lowering has been done.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gen7 hardware expects the block size field in the message descriptor
to be the number of registers minus one instead of the log2 of the
number of registers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We don't want to emit a 32-wide send message in 32-wide programs. The
memory fence message should have the same effect regardless of the
execution size (as long as it's valid) so just set it to one.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In SIMD32 programs the compiler is responsible for providing the
appropriate half of the sample mask in the message header, so the
first and third quarters both map to the first slot group of the
provided 16-bit half, while the second and fourth quarters map to the
second slot group -- IOW they should be equivalent to 1Q and 2Q modulo
two.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of these are bugs because the intended execution size of an
instruction and the dispatch width of the shader aren't necessarily
the same (especially in SIMD32 programs).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was kind of an abuse of p->compressed, dataport send message
instructions are always uncompressed. Use the current execution size
instead since p->compressed is on its way out.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This gets IF and DO instructions working in SIMD32 programs. brw_IF()
and brw_DO() should probably behave in the same way as other generator
functions that emit control flow instructions and just figure out the
right execution size by themselves from the current execution controls
specified through the brw_codegen argument. Changing that will
require updating lots of Gen4-5 clipper code though, so for the moment
just pass the current value redundantly from the FS generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
p->compressed won't work for SIMD32, we should just be using the
execution size value specified via p->current instead.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of just halving the execution size when the instruction is
compressed hoping that it will give a legal source region width, we
can calculate the maximum legal width value in closed form from the
component size and stride. This makes sure that brw_reg_from_fs_reg()
always returns a valid hardware region even for virtual 32-wide
instructions (e.g. send-like instructions) that would seem to exceed
the hardware region width limit after halving.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Curro is planning to eliminate p->compressed, so let's avoid using it
here and just pass in the value directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
By using the new compression/group control interface. This will allow
easier extension to support arbitrary channel enable groups at the IR
level.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The right value is dependent on the specific IR instruction being
generated so it has to be reset in every iteration of the loop anyway.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of these were resetting quarter control to zero incorrectly even
though everything they needed to do was disable instruction
compression -- The brw_SAMPLE() case was doing the right thing but it
can be simplified slightly by using the new compression control
interface.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This implements some simple helper functions that can be used to
specify the group of channel enable signals and compression enable
that apply to a brw_inst instruction.
It's intended to replace brw_set_default_compression_control
eventually because the current interface has a number of shortcomings
inherited from the Gen-4-5-centric representation of compression and
group controls as a single non-orthogonal enum: On the one hand it
doesn't work for specifying arbitrary group controls other than 1Q and
2Q, which are frequently useful in SIMD32 and FP64 programs. On the
other hand the current interface forces you to update the compression
*and* group controls simultaneously, which has been the source of a
number of generator bugs (a bunch of them fixed in this series),
because in many cases we would end up resetting the group controls to
zero inadvertently even though everything we wanted to do was disable
instruction compression -- The latter seems especially unfortunate on
Gen6+ hardware which have no explicit compression control, so we would
end up bashing the quarter control field of the instruction for no
benefit.
Instead of a single function that updates both at the same time
introduce separate interfaces to update one or the other independently
preserving the current value of the other (which typically comes from
the back-end IR so it has to be respected).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These can be easily represented in the IR as a MOV instruction with
strided source so they seem rather redundant.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Intended as a (partial) inverse of type_sz(). Will be useful in the
next commit and some other SIMD32 generator changes I have queued up.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the generator code for most opcodes honours the default
access mode (which should typically be Align1 in the scalar back-end),
but generate_code() doesn't set it explicitly which means that the
access mode from a previous instruction could leak into the following
ones if you did something special and weren't careful enough to save
and restore the previous access mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of this wouldn't have worked for SIMD32 and had various
dispatch_width and compression control bugs. It's mostly dead now
with SIMD lowering of math instructions turned on in the compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Which is 16 or 8 in most cases. This will make sure that 32-wide
virtual instructions get chopped up into chunks of their maximum
execution size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only per-channel LOAD_PAYLOAD instructions can be lowered, which
should cover everything that comes in from the front-end.
LOAD_PAYLOAD instructions used to construct actual message payloads
cannot be easily lowered because they contain headers and vectors of
variable type that aren't necessarily channel-aligned -- We shouldn't
find any of them in the program at SIMD lowering time though because
they're introduced during logical send lowering.
An alternative that may be worth considering would be to re-run the
SIMD lowering pass after LOAD_PAYLOAD lowering instead of this patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
...on hardware lacking compressed Align16 support. Will allow
simplifying the generator code and fixing it for SIMD32 codegen.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We shouldn't encounter these right now but if we did it wouldn't be
possible for the SIMD lowering pass to split it into multiple
instructions because of its side effects on control flow, so just
assert in order to kill the program.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This change addresses a number of hardware restrictions on the source
and destination regions and other execution controls of regular
FPU-like instructions that in some cases can be avoided by reducing
the execution size of the instruction. Some of these restrictions
(e.g. the one about 3src instructions not supporting compression on
some hardware) are currently being worked around case by case in the
generator with ad-hoc splitting code that is buggy in several ways
(e.g. doesn't handle non-trivial execution controls which would break
SIMD32 code), but it seems cleaner to implement as many restrictions
as we can in a single lowering pass since that will allow us to
simplify some of the surrounding code considerably and also make sure
that we don't forget applying them in the future.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This teaches the SIMD lowering pass about the hardware limits on the
execution size of math instructions, which will allow simplifying the
generator code and at the same time get rid of a number of bugs in the
manual SIMD unrolling done currently that prevent SIMD32 codegen from
working.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Seems like this texturing opcode was missing its logical counterpart
which would prevent it from taking advantage of the SIMD lowering
infrastructure, define it and plumb it through the back-end. At some
point we'll likely want to emit a single SAMPLEINFO message shared
among all channels irrespective of this change, but for the moment
this should be enough to get the intrinsic working in SIMD32 mode.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The benefit is we will be able to use the SIMD lowering pass to unroll
math instructions of unsupported width and then remove some cruft from
the generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was causing the scheduler to be rather optimistic about the
latency of pull constant opcodes on Gen7+. This might seem to
increase the cycle count estimate calculated by the scheduler itself
for some shaders, even though the actual cycle count should actually
be decreased.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For consistency with the Gen7 variant. I'm not doing the same to the
uniform pull constant message at this point because the non-GEN7 one
is still overloaded to be either an expression-like logical
instruction or a Gen4-specific physical send message.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Varying pull constant loads inherit the same limitation of pre-ILK
hardware that requires expanding SIMD8 texel fetch instructions to
SIMD16, we can deal with pull constant loads in the same way it's done
for texturing during SIMD lowering.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will allow the SIMD lowering pass to split 32-wide varying pull
constant loads (not natively supported by the hardware) into 16-wide
instructions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The case where the source type of the instruction is smaller than the
immediate type could be handled by calculating the portion of the
immediate read by the instruction (assuming that the source channels
are aligned with the destination channels of the copy) and then
representing the same value as an immediate of the source type
(assuming such an immediate type exists), but the code below doesn't
do that, so just bail for the moment.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the LOAD_PAYLOAD instruction only has header sources it's possible
for the number of registers written to be less than or equal to the
SIMD component size, in which case it would take the single-MOV path
at the bottom which would cause the channel enable masks to be applied
incorrectly to the header contents and/or cause it to write past the
end of the allocated temporary. If the instruction is either
LOAD_PAYLOAD or doesn't write exactly one component the MOV path is
going to mess up the program so just don't use it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the source value is going to the same for all SIMD-lowered chunks
of the instruction there should be no need to unzip the value into
multiple temporary registers one for each lowered chunk. As a side
effect this fixes SIMD lowering of instructions with a vector
immediate source. In the long term it *might* still be worth fixing
offset() to handle vector immediates correctly though, this should be
good enough for the moment.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was introduced in cf375a3333 but
the blame is mine because the pseudocode I sent in my review comment
for the original patch suggesting to do things this way already had
the off-by-one error. This may have caused copy propagation to be
unnecessarily strict while checking whether VGRF writes interfere with
any ACP entries and possibly miss valid optimization opportunities in
cases where multiple copy instructions write sequential locations of
the same VGRF.
Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This way, if you have other cards installed, the Vulkan driver will still
work. No guarantees about WSI working correctly but offscreen should at
least work.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95537
At one point in time, we may have used the mapping to ISL_FORMAT_RAW for
certain buffer surfaces but that time has long since passed. This fixes a
bug where doing format queries on VK_FORMAT_UNDEFINED would assert-fail.
In d8347f12ea, we added support for
skipping SIMD8 generation when the program local size is too large for
SIMD8 to be usable. This change was missed in that commit.
This bug would impact gen7 platforms when the compute shader local
size is greater than 512, and gen8 platforms when the local size is
greater than 448.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also, we don't actually need it for clipping because meta always colors
inside the lines and, for all other operations, the user is required to set
a scissor. Since DRAWING_RECTANGLE stalls the GPU, we want to emit it as
little as possible.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is in contrast to emitting it directly in vkCmdPipelineBarrier. This
has a couple of advantages. First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs. Second, it allow us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Instead of blasting it out as part of the pipeline, we put it in the
command buffer and only blast it out when it's really needed. Since the
PUSH_CONSTANT_ALLOC commands aren't pipelined, they immediately cause a
stall which we would like to avoid.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The old code called this on the prelinked shader list,
but at this point we have the linked shader, so we should
call the interface on that alone.
This fixes a regression in:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.13
introduced in
5b2675093e
glsl: handle implicit sized arrays in ssbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96228
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reported-by: Mark James
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that we have the better nir_foreach_block macro, there's no reason to
use the archaic block version for everything.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead of doing a add and then mask out the upper bits, we can
simply do a add with a half wide type (this, of course, assumes
the hw can actually do it...), so we'll get the required zero
in the upper bits automatically.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There were complaints from a mingw build:
u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’
to ‘pipe_prim_type’ [-fpermissive]
Reviewed-by: Brian Paul <brianp@vmware.com>
For a load locked, we might not use the first result but the second
result is the predicate result of the locking. In that case the load
splitting logic doesn't apply (which is designed for splitting 128-bit
loads). Instead we take the predicate and move it into the first
position (as having a dead result in first def's position upsets all
sorts of things including RA). Update the emitters to deal with this as
well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For user-supplied constbufs, fileIndex is 0. In that case, when we
subtract 1, we'll end up loading from constbuf offset -16. This is
illegal, and there are asserts to avoid it. Normally we'd just DCE it,
but no point in generating the instructions if they're not going to be
used.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
SPIR-V specifies that a bunch of stuff gets applied to types. This means
taht a local variable could get, for instance, an array stride. Just
because it's pointless doesn't mean you'll never see it.
Tested with new piglit gl-3.2-adj-prims test.
v2: re-order trisadj and tristripadj code, per Roland.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The unfilled index translator/generator functions should only be
called when the primitive mode is one of the triangle types.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The original mode test was valid before we had GS support.
Regression tested with full piglit run. Though, I don't think we have
any piglit tests that exercise drawing unfilled adjacency primitives.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Only one dEQP io_blocks test fails. This test fails for the same reason
as the match_different_member_struct_names test in a previous commit.
dEQP-GLES31.functional.separate_shader.validation.io_blocks.match_different_member_struct_names
v2: Add to release notes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes the following dEQP tests on SKL:
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_smooth_fragment_flat
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_implicit_explicit_location_1
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_array_element_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_none
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_order
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_centroid_fragment_flat
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_array_length
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_precision
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_explicit_location_type
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_centroid
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_explicit_location
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_qualifier_vertex_flat_fragment_smooth
dEQP-GLES31.functional.separate_shader.validation.varying.mismatch_struct_member_name
It regresses one test:
dEQP-GLES31.functional.separate_shader.validation.varying.match_different_struct_names
Hoever, this test is based on language in the OpenGL ES 3.1 spec that I
believe is incorrect. I have already submitted a spec bug:
https://www.khronos.org/bugzilla/show_bug.cgi?id=1500
v2: Move spec quote about built-in variables to the first place where
it's relevant. Suggested by Alejandro.
v3: Move patch earlier in series, fix rebase issues.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v2]
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> [v2]
The interface type, interpolation mode, precision, the type of the
outermost structure, and whether or not the variable has an explicit
location will be used for SSO validation on OpenGL ES.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
commit 7c8dfa78b9 (i965/draw: Use the real size for vertex
buffers) changed how we programmed the VERTEX_BUFFER_STATE size field.
Previously, we programmed it to the size of the actual underlying BO,
which is page-aligned, and potentially much larger than the GL buffer
object. This violated the ARB_robust_buffer_access spec.
With that change, we started programming it based on the range of data
we expect the draw call to actually access - which is based on the
min_index and max_index information provided to glDrawRangeElements().
Unfortunately, applications often provide inaccurate range information
to glDrawRangeElements(). For example, all the Unreal demos appear to
draw using a range of [0, 3] when the index buffer's actual index range
is [0, 5]. Such results are undefined, and we are absolutely allowed
to restrict access to the range they specified. However, the failure
mode is usually that nothing draws, or misrendering with wild geometry,
which is kind of bad for a common mistake. And people tend to assume
the range information isn't that important when data is in VBOs.
There's no real advantage, either. ARB_robust_buffer_access only
requires us to restrict access to the GL buffer object size, not
the range of data we think they should access. Doing that allows
buggy applications to still function. (Note that we still use this
information for busy-tracking, so if they try to overwrite the data
with glBufferSubData, they'll still hit a bug.) This seems to be
safer.
We may want to provide the more strict range as a debug option,
or scan the VBO and warn against bogus glDrawRangeElements in
debug contexts. That can be done as a later patch, though.
Makes Unreal demos draw again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Like constant buffers, samplers and textures are aliased on Fermi and
we need to invalidate the state when switching from 3D to CP and vice
versa.
This fixes rendering issues in the UE4 demos.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This brings the final size of an optimized non-debug build of the Vulkan
driver down to 2.9 MB as opposed to 8.7 MB for the dri driver.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Now that the compiler has been completely separated from libmesa, we no
longer need these. We can make the tests much smaller by not linking them
in. This also ensures that anyone who runs make check won't accidentally
put in any dependencies from the compiler to the rest of mesa core.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They reference the compiler so they shouldn't go in libi965_compiler.la.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
None of them are actually using it. It's a relic of an older compiler
interface that required a gl_program.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
That's where brw_link_shader lives and they seem to go together. Also,
this gets it out of libi965_compiler.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This way it's no longer part of libi965_compiler.la since it depends on
GLSL and ARB program stuff.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Right now libglsl.la depends on libnir.la so putting it in libnir.la
adds a dependency on libglsl.la that goes the wrong direction.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The stated bug describes a scenario in which a post sync write operation for
depth or timestamp can be ignored. There are two workarounds suggested, the
first and easier is to simply do a cs stall when we do these type of writes.
The second option is to do a PIPE_CONTROL flush after the post sync but before
the data is required.
Generally, I believe the data written out is consumed by the application on the
CPU side and so doing the easier of the two is ideal. Furthermore, these queries
aren't tremendously common in the perf sensitive apps I have looked at. However,
there could be cases where a shader stage might directly consume the data, and
as a result option 2 may be desirable.
This patch goes with the easier solution for now.
gen9lp bug_de_id=2137196
By itself, this does *not* fix any of the GT4 hangs we're currently
experiencing.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage from this patch, higher
values don't seem to matter.
v2: Add distribution mode field enums.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Using more than 1 wave per threadgroup does increase performance
generally. Not using too many patches per threadgroup also
increases performance. Both catalyst and amdgpu-pro seem to
use 40 patches as their maximum, but I haven't really seen
any performance increase from limiting the number of patches
to 40 instead of 64.
Note that the trick where we overlap the input and output LDS
does not work anymore as the insertion of the tess factors
changes the patch stride.
v2: - Add comment about LDS assumptions.
- Add constant for buffer size.
- Fix code style.
v3: - Correct limits for not splitting patches between waves.
- Set max num_patches to 40 as in the proprietary driver.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The factors may be stored to LDs by another invocation than
the invocation for vertex 0.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows running the TES on different CU's than the
TCS which results in performance improvements.
v2: Only write the control word from one invocation.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We always try to use 4-component loads, as LLVM does not combine loads
and they bypass the L1 cache.
We can't use a similar strategy for stores and this is especially
notable with the tess factors, as they are often set with separate
MOV's per component in the TGSI.
We keep storing to LDS and the LDS space, so we can load the outputs
later, either due to the shader, of for wrting the tess factors.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.
I also switch the fixed function TCS over to monolithic, as
otherwisze many of the user SGPR's need to be passed to the
epilog, which increases register pressure, or complexity to
avoid that. The main body of the fixed function TCS is not
that interesting to precompile anyway, since we do it on
demand and it is very small.
v2: Use u_bit_scan64.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of creating a memory area per patch and per vertex, we put
the same attribute of every vertex & patch together. Most loads
and stores access the same attribute across all lanes, only for
different patches and vertices.
For the TCS this results in tightly packed data for 4-component
stores.
For the TES this is not the case as within a patch the loads
often also access the same vertex. However if there are < 4
vertices/patch, this still results in a reduction of the number
of cache lines. In the LDS situation we only do better than worst
case if the data per patch < 64 bytes, which due to the
tessellation factors is pretty much never.
We do not use hardware swizzling for this. It would slightly reduce
the number of executed VALU instructions, but I had issues with
increased wait times that I haven't been able to solve yet.
Furthermore, the tbuffer_store intrinsic does not support both
VGPR offset and an index, so we have a problem storing
indirectly indexed outputs. This can be solved by temporarily
storing arrays in LDS and then copying them, but I don't think
that is worth the effort. The difference in VALU cycles
hardware swizzling gives is about 0.2% of total busy cycles.
That is without handling the array case.
I chose for attributes instead of components as they are often
accessed together, and the software swizzling takes VALU cycles
for calculating offsets.
v2: - Rename functions to get_tcs_tes_buffer_address.
- multiply by 16 as late as possible.
- Use tgsi_full_src_register_from_dst.
- Remove some bad comments.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: - Use llvm.admgcn.buffer.load instrinsics for new LLVM.
- Code style fixes.
v3: - Code style fix.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The buffer is quite large, but should only be allocated if the
application uses tessellation. Most non-games don't.
v2: - Use the correct register for SI.
- Add define for block size.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
CID 1271532 (#1 of 1): Out-of-bounds read (OVERRUN)34. overrun-local:
Overrunning array of 2 16-byte elements at element index 2 (byte offset
32) by dereferencing pointer &inst.Dst[i].
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Not sure why coverity calls this an out-of-bounds read vs out-of-bounds
write.
CID 1358920 (#1 of 1): Out-of-bounds read (OVERRUN)9. overrun-local:
Overrunning array r of 3 16-byte elements at element index 3 (byte
offset 48) using index chan (which evaluates to 3).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
XY_FAST_COPY_BLT command doesn't have a field for raster operation. So, fall
back to using XY_SRC_COPY_BLT to handle those cases.
Fixes piglit test gl-1.1-xor-copypixels when fast copy blit is enabled
for all tiling formats.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Experimentation with different values of src/dst horizontal/vertical
alignment showed that these fileds are not used on gen9 hardware.
A recent update in graphics specs has removed these fields from
XY_FAST_COPY_BLT command.
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
To read out MP perf counters we use a compute shader and need to upload
input data like a 64-bits addr used to store the values and a sequence
ID for synchronization. Currently, this input data is uploaded as user
uniforms which means that it's sticked to c0[], but if a compute shader
from a real application is used, monitoring those performance counters
will just overwrite some data and miserably crash.
Instead, sticking the 64-bits addr and the sequence into the driver
constant buffer seems like much better and will allow to monitor
counters with GL 4.3 apps.
Tested on GF119 and GK110, but should not hurt anything on GK104.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
eglCreatePbufferSurface should generate an EGL_BAD_MATCH error if:
1: The EGL_TEXTURE_FORMAT attribute is EGL_NO_TEXTURE and EGL_TEXTURE_TARGET
is something other than EGL_NO_TEXTURE
2: EGL_TEXTURE_FORMAT is something other than EGL_NO_TEXTURE and
EGL_TEXTURE_TARGET is EGL_NO_TEXTURE.
This fixes the dEQP-EGL.functional.negative_api.create_pbuffer_surface test.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Example:
Gallium 0.4 on AMD TONGA (DRM 3.2.0 / 4.5.0, LLVM 3.9.0)
My kernel version is pretty long already (4.5.0-amd-01025-g32791c1)
and adding "kernel" into the string would make too it long for glxinfo
to display.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Ported from the initial amdgpu winsys from the private AMD branch.
The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.
3-5% reduction in CPU overhead is expected for apps submitting a lot
of IBs per frame. This is most visible with DMA IBs.
v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
add another amdgpu_cs_sync_flush call into amdgpu_bo_map
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Not piping this all the way through yet, but no better place to note
this down. This will can be used with NV_viewport_array2.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
For fermi, this likely will require use of linked tsc mode. However on
bindless architectures, we can have as many as we want. As it stands,
the AUX_TEX_INFO has 32 teture handles reserved.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It executes compiler-glsl on all the available shaders, and it checks
that the outcome is the expected.
Bash code based on the already existing optimization-test
v2: rebasing: use --version option
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add an option in order to ask to just print the InfoLog, without any
header or separator. Useful if we want to use the standalone compiler
to track only the warning/error messages.
v2: all printfs goes on its own line (Ian Romanick)
v3: rebasing: move just_log to standalone.h/cpp
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It silence by default warnings with function parameters, as the
parameters need to be processed in order to have the actual and the
formal parameter, and the function signature. Then it raises the
warning if needed at verify_parameter_modes where other in/out/inout modes
checks are done.
v2: fix comment style, multi-line condition style, simplify check,
remove extra blank (Ian Romanick)
v3: inout function parameters can raise the warning too (Ian
Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Just to allow to call set_is_lhs on any ast_node without a casting. Useful
when processing a ast_node list that we know it contain ast_expression.
v2: comment out new_value to avoid unused parameter warning (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The current code disallows unsized arrays except at the end of
an SSBO but it is a bit overzealous in doing so.
struct a {
int b[];
int f[4];
};
is valid as long as b is implicitly sized within the shader,
i.e. it is accessed only by integer indices.
I've submitted some piglit tests to test for this.
This also has no regressions on piglit on my Haswell.
This fixes:
GL45-CTS.shader_storage_buffer_object.basic-syntax
GL45-CTS.shader_storage_buffer_object.basic-syntaxSSO
This patch moves a chunk of the linker code down, so
that we don't link the uniform blocks until after we've
merged all the variables. The logic went something like:
Removing the checks for last ssbo member unsized from
the compiler and into the linker, meant doing the check
in the link_uniform_blocks code. However to do that the
array sizing had to happen first, so we knew that the
only unsized arrays were in the last block. But array
sizing required the variable to be merged, otherwise
you'd get two different array sizes in different
version of two variables, and one would get lost
when merged. So the solution was to move array sizing
up, after variable merging, but before uniform block
visiting.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
GL44-CTS.tessellation_shader.tessellation_control_to_tessellation_evaluation.data_pass_through
As the OUT_TC interface structures weren't matching because
one of them had explicit_xfb_buffer set when it shouldn't.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Indexed primitives were always using cut-aware primitive assembly,
whether primitive_restart was enabled or not. Correctly pass down
primitive_restart and select optimized PA when possible.
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
For now, only enable it on platforms that actually support ETC2.
At this point, Broadwell is only failing 5 (out of 8358) dEQP tests:
dEQP-GLES31.functional.copy_image.non_compressed.viewclass_32_bits.
srgb8_alpha8_r11f_g11f_b10f.renderbuffer_to_texture3d
srgb8_alpha8_rgb10_a2ui.renderbuffer_to_cubemap
srgb8_alpha8_rgb10_a2ui.renderbuffer_to_renderbuffer
srgb8_alpha8_rgb10_a2.renderbuffer_to_texture2d
srgb8_alpha8_rgb9_e5.renderbuffer_to_texture3d
These fail with all methods (meta, blorp, blitter, memcpy).
All are blacklisted from the Android mustpass list, which makes me
wonder whether there's an issue with the tests. The formats in
question work with other targets, and the targets in question work
with other formats...
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
The BLT can't handle S8 because it's W-tiled (at least without
additional funny business, and I'm not sure we care). Disallow
it so it falls back to the CPU path, which works.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Currently, it only contains the BLT/CPU fallbacks, so the name is a bit
too generic. But eventually this will use BLORP as well, at which point
the name will make more sense.
The next patch will introduce a second call.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
This simplifies things a little - now we only have one (tex or rb?)
if-ladder for src, and a second for dst, rather than four.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Fixes Piglit's arb_copy_image-texview test with the Meta path disabled
(so we hit the blitter/CPU fallback paths).
v2: Add MinLayer even for cube maps (suggested by Ilia).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Use glsl/libstandalone.la to add support for taking glsl src files (in
addition to .tgsi) as input. Then glsl->nir and feed the result into
the ir3 backend as normal.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split standalone glsl_compiler into a libstandalone.la and a thin
main.cpp. This way drivers can re-use the glsl standalone frontend in
their own standalone compilers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The GALLIUM_HUD does not yet expose a description for each events, but
this might be useful for developers who want to have a long description
of hw perf counters directly in the source code.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To fix MSVC build. Any function which goes into the dispatch table
needs to have the GLAPIENTRY (__stdcall) tag.
Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This text transformation was done automatically via the following shell
command:
$ find -name SCons\* -exec sed -i s/\\s\\+$// '{}' \;
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The nested declaration of 'height' shadows a parameter and uses
uninitialized memory. Fix by renaming to 'plane_height' which also makes
the code clearer.
This would typically break the bo size computation, but we don't use
that except when mmaping, and we don't mmap YUV buffers much.
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reported-by: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_KHR_robustness adds the GL_CONTEXT_LOST error and five new entry
points that we already implement. This patch adds a new dispatch table
that returns GL_CONTEXT_LOST from all entry points and implements the
GL_LOSE_CONTEXT_ON_RESET strategy by setting that table when we learn
that we've lost the context.
With the GL_CONTEXT_LOST reporting in place and dispatch for the new
entry points we can turn on GL_KHR_robustness.
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Print "GEOM" instead of "2", for example.
v2: also update the text parsing code, per Ilia.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
From time to time we have had cases where glslang has added a decoration we
don't handle and it has caused problems. This audit ensures that, for
every decoration, we either handle it or hit an unreachable() with an
accurate description of why we don't have to.
It appears that UV immediates aren't working on Ivy Bridge. In this
case, a signed version will work, and this fixes the piglit
tests/spec/glsl-4.50/execution/helper-invocation.shader_test test.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows clear and easy communication between the two.
Caller: Requesting information (struct vN)
Callee: I know how to deal with older version (vN-1) only. Here is your
data and the version I support.
Caller: Older version ? Sure I'll cap all access to the fields provided
by the older version (vN-1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
One cannot use a single version to control both export_in and export_out
versions. Using this forces us to always extend/bump both structs at the
same time.
An alternative scheme is coming with next patch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Since we only need partial information about the GLX symbols we can
forward declare them and drop the include. Obviously each user of the
said API will needs more than what's provides, so they'll include the
GLX header.
If they don't, the compiler will give us a nice warning ;-)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These come from windows.h, gl.h, glcorearb.h and/or glext.h.
The interop interface is aimed at non-Windows platforms while the macros
are used/derived due to Windows specifics. Thus we can safely remove
them.
Strictly speaking there should be GLXAPIENTRY/EGLAPIENTRY and alike
macros, although a) there is no GLX ones and b) this brings us even
further from decoupling the file from the GLX/EGL header dependency.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Thus we can preserve the ABI, while avoiding the inclusion of some/all
of the following:
EGL/egl.h
GL/gl.h
GL/glcorearb.h
GLES/gl.h
GLES2/gl2.h
GLES3/gl3.h
GLES3/gl31.h
This will allow us to build/use it alongside any combination of APIs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
The GL_OES_geometry_shader work is on the oes_shader_io_blocks branch
of idr's fd.o repository.
The GL_OES_tessellation_shader work is on the tess-gles branch
of kwg's fd.o repository.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Add weak symbol notation for the pthread_mutexattr* symbols, thus making
the linker happy. When building with -O1 or greater the optimiser will
kick in and remove the said functions as they are dead/unreachable code.
Ideally we'll enable the optimisations locally, yet that does not seem
to work atm.
v2: Add the AX_GCC_FUNC_ATTRIBUTE([weak]) hunk in configure.
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
1. Don't clear bucket descriptions to fix issues with sim level
buckets getting out of sync.
2. Close out threadviz file descriptors in ClearThreads().
3. Skip buckets for jitter based buckets when multithreaded. We need
thread local storage through llvm jit functions to be fixed before
we can enable this.
4. Fix buckets StopCapture to correctly detect capture complete.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Lift the resctriction we had before and allow creation of images with
multiple planes. We still require all the planes to be within the same
bo.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This function now only creates the mt and we then call
intel_set_texture_image_mt() in intel_image_target_texture_2d() to set
it for the texture image.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Create the mt for the drawable bo directly and call our new
intel_miptree_create_for_bo() helper instead.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This factors out the work of setting up a miptree as the backing for a
texture image into a new helper.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This lowers sampling from YUV textures to 1) one or more texture
instructions to sample each plane and 2) color space conversion to RGB.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise, if the call executes normally we'll hit an assertion later
in the VBO code when we draw something. Note that these cases were
already handled correctly for the glIsEnabled() function (and the API
checks were copied from there).
Tested with new piglit gl-3.1-enable-vertex-array test.
v2: fix compat/es mix-up, per Ilia.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We added support for Android build using autotools (configure),
update the documentation to reflect that.
Signed-off-by: Nicolas Boichat <drinkcat@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For double-precision vertex inputs we need to measure them in dvec4
terms, and for single-precision vertex inputs we need to measure them in
vec4 terms.
For the later case, we use type_size_vec4() function. For the former
case, we had a wrong implementation based on type_size_vec4().
This commit introduces a proper type_size_dvec4() function, that we use
to measure vertex inputs.
Measuring double-precision vertex inputs as dvec4 is required because
ARB_vertex_attrib_64bit states that these uses the same number of
locations than the single-precision version. That is, two consecutives
dvec4 would be located in location "x" and location "x+1", not "x+2".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Images aren't supported on maxwell, but neither is tessellation. Don't
overly confuse matters by trying to expose those subtleties in the
GL3.txt file/relnotes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Dave Airlie <airlied@redhat.com>
This extension appears to be a strict subset of the ARB version. Also
remove it from GL3.txt since it doesn't seem relevant.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
V2: fix error checking for arrays and components. V1 was
only taking into account all the array elements and all the
components of one of the varyings during the comparision
and treating the other as a single slot/component.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We apparently pass all the relevant CTS tests. There are probably some
shortcomings, but they can be addressed down the line.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is just a copy-and-paste from brw_surface_formats.c. For the
supports_vertex_fetch function, we do a bit more work so that it properly
handles Bay Trail.
This prevents array overflow when the block is actually an array of UBOs or
SSBOs. On some hardware such as i965, such overflows can cause GPU hangs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were using the size of the whole BO which may be
substantially larger than the actual index buffer size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were using the size of the BO which may be substantially
larger than the actual vertex buffer size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For a long time, several of the 3-channel vertex formats didn't exist so we
faked them with 4-channel versions. Starting with Sandy Bridge, we can use
R16G16B16_FLOAT and 8 and 16-bit integer formats become available on
Haswell and Bay Trail.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bay Trail and Haswell added a bunch of new vertex formats. There was also
the addition of 64-bit passthrough formats for BDW+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The old code always divided rounded down and then subtracted 1. What we
wanted was to divide rounded up and then subtract 1 which is equivalent to
subtracting 1 and then dividing rounded down.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we only handled the "I don't know what's going on" case for
things with InstanceDivisor == 0. However, in the DrawIndirect case we can
get num_instances == 0 and we don't know what's going on with the instanced
ones either. This commit makes the worst-case bound the default and then
conservatively tightens the bound.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous code got the BO the first time we encountered it. However,
this can potentially lead to problems if the BO is used for multiple arrays
with the same buffer object because the range we declare as busy may not be
quite right. By delaying the call to intel_bufferobj_buffer, we can ensure
that we have the full range for the given buffer.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The vbo layer passes an index_bounds_valid flag that we should be using
instead. This also fixes a bug when min_index == -1 and basevertex != 0
where we were actually comparing min_index + basevertex == -1 which was
false and we were getting the wrong buffer-sizing path.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now, we're just setting the range to [0, MAX_UINT32] which, while
correct isn't helpful. With DrawIndirect, you can't really know what the
actual range is so we may as well flag it as being an invalid range. This
is what we do for draws with index buffer which is similar (the indices
aren't statically known) if a bit simpler.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we would fail to find a match for the second half of a
dvec4 as 'i' would get incremented to 1 before we added the var to
the array at component 0.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The last version of this broke clipping, and I had to spend
sometime getting this working properly.
I had to introduce a third pass to count the clip/cull totals,
all due to one messy corner case. We have a piglit test
tes-input-gl_ClipDistance.shader_test
that doesn't actually output the clip distances, it just passes
them like a varying from TCS->TES, the older lowering pass worked
but to lower clip/cull we need to know the total number of clip+culls
used to defined the new variable correctly, and to offset culls
properly.
This adds an extra pass that works out the sizes for clip/cull,
then lowers gl_ClipDistance then gl_CullDistance into the new
gl_ClipDistanceMESA.
The pass checks using the fixed array sizes code if they array
has been referenced, or is actually never used, and ignores
it in the latter case.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a bug that breaks cull distances. The problem
is the max array accessors can't tell the difference between
an never accessed unsized array and an accessed at location 0
unsized array. This leads to converting an undeclared unused
gl_ClipDistance inside or outside gl_PerVertex to a size 1
array. However we need to the number of active clip distances
to work out the starting point for the cull distances, and
this offset by one when it's not being used isn't possible
to distinguish from the case were only the first element is
accessed. I tried to use ->used for this, but that doesn't
work when gl_ClipDistance is part of an interface block.
So this changes things so that max_array_access is an int
and initialised to -1. This also allows unsized arrays to
proceed further than that could before, but we really shouldn't
mind as they will get eliminated if nothing uses them later.
For initialised uniforms we no longer change their array
size at runtime, if these are unused they will get eliminated
eventually.
v2: use ralloc_array (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In agreement with the SNB PRM, alpha blending is a property that render
targets may or may not support.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This format does not support alpha blending, according to the SNB PRM.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we have the geometry extensions, enable querying of the new param.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
OES_geometry_shader has wording to allow xfb when using Draw*Indirect
and DrawElements.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When passing in GL_RGBA or other base formats, we will try to upgrade
the format to whatever the passed in type was. However not all drivers
(notably nv30) support 32F textures, and so this would lead to crashes
down the line. Only upgrade when the relevant extensions are available.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Otherwise we still have TGSI_OPCODE_CMP's info, which causes a number of
later logic to go wrong. This fixes
dEQP-GLES2.functional.shaders.functions.control_flow.return_in_if_vertex
on nv30.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The mode should stay the same as the original struct. In
particular, shared should not be changed to temporary.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Technically, this was introduced with GL 4.4. However, I believe it
was intended to be retroactive. As far as I know, AMD has never
supported primitive restart with patches, while NVidia and Intel do.
This necessitated the need for a query which would allow applications
to figure out whether this was usable or not.
I decided to expose it everywhere ARB_tessellation_shader is exposed.
(It's also in both OES and EXT_tessellation_shader.)
Enable this for i965 and Gallium drivers which expose the capability.
v2: Fix a bug in the state_tracker code (caught by Ilia Mirkin).
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10364
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some hardware supports primitive restart on patch primitives, and other
hardware does not. Modern GL and ES include a query for this feature;
adding a capability bit will allow us to answer it.
As far as I know, AMD hardware does not support this feature, while
NVIDIA and Intel hardware does. However, most Gallium drivers do not
appear to support tessellation shaders yet. So, I've enabled it for
nvc0 and disabled it everywhere else.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This lets the rest of the backend know that the uniform pull constant
load opcodes don't respect channel enables -- Without this the
register allocator has no way to know that the return payload of a
pull constant load is not per-channel and spills of the destination
will be broken under non-uniform control flow.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the spilling code attempts to guess the scratch message
block size from the dispatch width of the shader, which is plain wrong
for SIMD-lowered instructions (frequently but not exclusively
encountered in SIMD32 shaders) or for instructions with register
region data types of size other than 32 bit.
Instead try to use the SIMD component size of the instruction which in
some cases will allow the dataport to apply the correct channel mask
to the scratch data read or written. In the spill case the block size
needs to be clamped to the number of MRF registers reserved for
spilling. In the unspill case I didn't even bother because we
currently have no 100% accurate way to determine whether a source
region is per-channel or whether it contains things like headers that
don't respect channel boundaries -- That's fine, because the unspill
is marked force_writemask_all we can just use the largest allowable
scratch message size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This prevents the application of an incorrect channel mask by the
scratch write instruction for spilled variables that don't have an
exact one-to-one correspondence between channels of the variable and
32-bit components of the scratch write instruction.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes sure that unspills restore the exact contents of the
variable in scratch space into the GRF without applying channel
masking, which is incorrect under control flow for things like message
headers or vectors of heterogeneous types that don't properly respect
channel boundaries.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes emit_(un)spill even more stupid by removing the logic that
decides what execution size each scratch read or write send message
should have and instead relying on the caller to specify an
appropriate execution size via the builder argument. This makes sense
because the caller will need to act differently based on the scratch
message width (e.g. emit an additional unspill before the instruction
if the execution width and channel layout of the spill doesn't match
the instruction's).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This seems cleaner than exposing an implementation detail of
brw_fs_reg_allocate.cpp to the world, and will give the caller control
over the instruction execution flags (e.g. force_writemask_all) that
are applied to the scratch read and write instructions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Until now the execution controls (e.g. channel group,
force_writemask_all, exec_size) of the instruction had been completely
ignored by spilling, even though that can lead to a mismatch between
the channel mask applied to the contents of the (un)spilled memory and
the GRF source or destination of the instruction. In some cases we'll
actually want the (un)spill messages to be marked force_writemask_all
regardless of whether the instruction has it set, but that will have
to be handled specially by the caller.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
And as we're at it fix the calculation to allocate a larger block of
registers for 32-wide dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to the EGL specifications eglQueryString(EGL_CLIENT_APIS)
should return a string containing a combination of "OpenGL", "OpenGL_ES"
and "OpenVG", any other values would be considered invalid. Due to this
when the API string is constructed, the version of GLES should be
disregarded and "OpenGL_ES" should be attached once instead of
"OpenGL_ES2" and "OpenGL_ES3".
Fixes:
dEQP-EGL.functional.negative_api* and
dEQP-EGL.functional.query_context.simple.query_api
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The variable-indexing tests always had a few random fails, which I
usually couldn't reproduce when running tests manually. Somehow
recently this got a lot worse. I ported a couple of the shaders to
GLES to see what blob does, and it also seems to be avoiding to cp
indirect srcs. So I guess indirect w/ instructions other than cat1
(mov) are not totally reliable. Let's just switch that off until
this is better understood.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Constbufs are only aliased on Fermi and this will reduce the number of
flushes when we switch between 3d and compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"
On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);
However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.
Signed-off-by: Nicolas Boichat <drinkcat@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: s/explicitely/explicitly/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Add support for EGL android platform.
Also, detect when --host finishes with -android. In that case, we
do not set _GNU_SOURCE, and define autoconf symbol HAVE_ANDROID, so
that Android-specific workarounds can be applied.
Signed-off-by: Nicolas Boichat <drinkcat@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil Velikov: Rebase on top of HAVE_EGL_PLATFORM_NULL removal]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
The build systems already add this as applicable. There's no need to
have this in the source file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build systems already add this as applicable. There's no need to
have this in the source file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build systems already add this as applicable. There's no need to
have this in the source file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Bounds of screen are 0 (inclusive) and ScreenCount(dpy) (exclusive).
The upper bound was too ScreenCount(dpy) (inclusive).
This causes a crash invoked by java3d which passes down an invalid
screen:
6 0x00007f0e5198ba70 in <signal handler called> () at /lib64/libc.so.6
7 0x00007f0e14531e14 in glXGetFBConfigs (dpy=<optimized out>, screen=1, nelements=nelements@entry=0x7f0dab3c522c) at glxcmds.c:1660
8 0x00007f0e14532f7f in glXChooseFBConfig (dpy=<optimized out>, screen=<optimized out>, attribList=0x7f0dab3c54e0, nitems=0x7f0dab3c535c) at glxcmds.c:1611
9 0x00007f0e1478d29b in find_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
10 0x00007f0e1478d3dc in find_S_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
11 0x00007f0e1478d567 in find_AA_S_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
12 0x00007f0e1478d728 in find_DB_AA_S_S_FBConfigs () at /usr/lib64/libj3dcore-ogl.so
13 0x00007f0e1478d97c in Java_javax_media_j3d_X11NativeConfigTemplate3D_chooseOglVisual () at /usr/lib64/libj3dcore-ogl.so
While ScreenCount(dpy) is actually 1:
(gdb) p dpy->nscreens
$2 = 1
screen=1 is passed to glXGetFBConfigs.
Fix this typo in glXGetFBConfigs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95456
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Make the function non static so that we can use it directly from the
android platform code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
This adds map and unmap functions to GBM utilizing the DRIimage extension
mapImage/unmapImage functions or existing internal mapping for dumb
buffers. Unlike prior attempts, this version provides a region to map and
usage flags for the mapping. The operation follows the same semantics as
the gallium transfer_map() function.
This was tested with GBM based gralloc on Android.
Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: drop no longer relevant hunk from commit message.]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add pthreadstubs to avoid pulling in full pthreads library. GBM will be the
first user.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In preparation to add public map/unmap functions, rename the existing
gbm_dri_bo_{map,unmap} functions to indicate that they are only for dumb
buffers.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Implement support for mapImage/unmapImage functions in version 12 of the
DRIimage extension.
Signed-off-by: Rob Herring <robh@kernel.org>
[Emil Velikov: align/indent the map/unmap vfuncs]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add mapImage and unmapImage functions to DRIimage extension for mapping
and unmapping DRIimages for CPU access. The caller provides the region of
the image to map and is returned a pointer to the beginning of the region
and the stride (which could be different from the original).
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GBM needs the same special gallium_dri.so loading as EGL for Android, so
copy over the same hunk from the EGL code.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In preparation to add Android build support, split out the source file
lists to Makefile.sources
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
[Emil Velikov: Whitespace cleanup.]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Move the defining of DEFAULT_DRIVER_DIR path to a common location so both
EGL and GBM can use it.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If the mutexattrs are the default one can just pass NULL to
pthread_mutex_init. As the compiler does not know this detail it
unnecessarily creates/destroys the attrs.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
From the GL 4.5 core spec, section 11.1.1 (Vertex Attributes):
"A program with more than the value of MAX_VERTEX_ATTRIBS
active attribute variables may fail to link, unless
device-dependent optimizations are able to make the program
fit within available hardware resources. For the purposes
of this test, attribute variables of the type dvec3, dvec4,
dmat2x3, dmat2x4, dmat3, dmat3x4, dmat4x3, and dmat4 may
count as consuming twice as many attributes as equivalent
single-precision types. While these types use the same number
of generic attributes as their single-precision equivalents,
implementations are permitted to consume two single-precision
vectors of internal storage for each three- or four-component
double-precision vector."
This commits makes dvec3, dvec4, dmat2x3, dmat2x4, dmat3, dmat3x4,
dmat4x3 and dmat4 consume twice as many attributes as equivalent
single-precision types.
v3: count doubles as consuming two attributes (Dave Airlie)
v4: make reference to spec (Michael Schellenberger Costa)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
The previous implementation relied on the std140 alignment rules to
avoid handling misalignment in the case where we are loading more than
2 double components from a vector, which requires to emit a second load
message.
This alternative implementation deals with misalignment and is more
flexible going forward.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I think these are not strictly necessary since the floats in them
should be automatically promoted to doubles when operated with
double sources, but it makes things more explicit at least.
Reviewed-by: Matt Turner <mattst88@gmail.com>
For geometry/compute inputs and tess control outputs, we create
an AST node to keep track of some things. However if we have
multiple layout sections, we don't ever link the node into the AST.
This is because we create the node on the rightmost layout declaration
and don't pass it back in so it gets linked at the end of the parsing
of the rightmost.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The code didn't deal with explicit function indexes properly.
It also handed out the indexes at link time, when we really
need them in the lowering pass to create the correct if ladder.
So this patch moves assigning the non-explicit indexes earlier,
fixes the lowering pass and the lookups to get the correct values.
This fixes a few of:
GL45-CTS.explicit_uniform_location.subroutine-index-*
Signed-off-by: Dave Airlie <airlied@redhat.com>
The code was implementing the ACTIVE_SUBROUTINE_UNIFORMS
incorrectly, using the number of types not the number of
uniforms. This is different than the locations as the
locations may be sparsly allocated.
This fixes:
GL43-CTS.shader_subroutine.four_subroutines_with_two_uniforms
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GLSL spec says this doesn't generate an error.
Fixes:
GL45-CTS.explicit_uniform_location.subroutine-loc
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
GL45-CTS.shader_subroutine.subroutines_with_separate_shader_objects
Since we set the stream flags earlier on all geom shaders, we
shouldn't fall over later if we find one.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a crash in:
GL45-CTS.explicit_uniform_location.subroutine-loc-negative-link-max-num-of-locations
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes .length() on subroutine uniform arrays, if
we don't find the identifier normally, we look up the corresponding
subroutine identifier instead.
Fixes:
GL45-CTS.shader_subroutine.arrays_of_arrays_of_uniforms
GL45-CTS.shader_subroutine.arrayed_subroutine_uniforms
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
GL45-CTS.explicit_uniform_location.subroutine-index-negative-link-max-num-of-indices
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a subroutine uniform is declared with no functions backing it,
that isn't legal, so we should fail to link.
Fixes:
GL43-CTS.shader_subroutine.subroutine_uniform_wo_matching_subroutines
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
GL43-CTS.shader_subroutine.subroutines_incompatible_with_subroutine_type
It just makes sure the signatures match as well as the return
types.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
_mesa_GetActiveSubroutineUniformiv needs to check
against the number of types here.
Noticed while playing with ogl conform.
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This happens with dEQP tests. The code doesn't at all protect against
this condition, so while unhandled, this is an expected situation.
Also avoid using more than the first 16 registers for nv3x vertex
programs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
On nv30, for example, there is no hardware index buffer support. So all
of those will be created entirely in user memory.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This isn't used anymore in the tree, culldist's
are part of the clipdist semantic, we could in theory
rename it, but I'm not sure there is much point, and
I'd have to be careful with virgl.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The way the HW works doesn't really fit with having
two semantics for this.
The GLSL compiler emits 2 vec4s and two properties,
this makes draw use those instead of CULLDIST semantics.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
GL45-CTS.direct_state_access.queries_errors
The ARB_direct_state_access spec agrees.
v2: move check down further (Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the array doesn't start at 0 we need to account for su->tex.r.
While we are at it, make sure to avoid out of bounds access by masking
the index.
This fixes GL45-CTS.shading_language_420pack.binding_image_array.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Apparently the stencil mask applies to clears on nv30/nv40. Reset it to
0xff before doing a stencil clear. This fixes gl-1.0-readpixsanity and
a number of other piglit tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
nir_instr_rewrite_src() expects a nir_src and it is currently being fed a
nir_tex_src. This will crash something.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Initially to make sure the format doesn't mismatch and won't produce
out-of-bounds access, we checked that both formats have exactly the same
number of bytes, but this should not be checked for type stores.
This fixes serious rendering issues in the UE4 demos (tested with
realistic and reflections).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To prevent out-of-bounds access and format mismatch we add a predicate
on sustp, but we have to account for it when the sources are condensed
because a predicate is a source. Using the range 3:6 will only condense
the input data and it's always the case. This also fixes constraints
when an indirect access is used.
This ensures that sources are correctly aligned.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the transform feedback object is paused when ending, then there are
no new snapshots to add to the tally. In fact, we haven't written a
starting snapshot, so we'd best not try and compute (end - start).
Just load the existing tally so we can convert it to the number of
vertices written and store it to the final result location.
This is the Haswell+ equivalent of the previous commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If the transform feedback object is paused, then we've already written
an ending counter snapshot. We don't want to write another one.
This fixes assertions in GL33-CTS.transform_feedback.api_errors_test,
which calls EndTransformfeedback after PauseTransformFeedback. On the
next BeginTransformFeedback, we tried to tally up the results, and saw
an odd number of snapshots (due to the double-end), and tripped an
assertion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This way, the driver's EndTransformFeedback() hook can tell whether the
transform feedback operation was paused. It's also convenient to have
Paused remain false until the driver's PauseTransformFeedback hook
finishes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Otherwise we rewrote the fadd to use itself, causing crashes in
validation. Instead, start after the last use like we should.
A brown paper bag fix. Fixes crashes in several Vulkan tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
For cull distance GLSL will let unsized unused arrays get
into the backend, we should nuke those straight away, to
save caring about them later.
This fixes:
arb_separate_shader_objects/linker/large-number-of-unused-varyings
as a side effect (even without culling changes).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This isn't allowed by Vulkan, but might be useful someday for
SPIR-V in OpenGL (if that ever becomes a thing). It's easy enough
to hook up, and as precedent, we already do so for OriginLowerLeft.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case
of what I expected, so we always take the negate_value case. It doesn't
really matter.
v2: Write src0 before src1 in ADD instructions (requested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we handle flipping and other gl_FragCoord transformations
via a uniform, these key fields have no users.
This patch actually eliminates the associated recompiles. The Tomb
Raider benchmark's minimum FPS increases from ~1 FPS to a reasonable
number.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This handles gl_FragCoord transformations and other window system vs.
user FBO coordinate system flipping by multiplying/adding uniform
values, rather than recompiles.
This is much better because we have no decent way to guess whether
the application is going to use a shader with the window system FBO
or a user FBO, much less the drawable height. This led to a lot of
recompiles in many applications.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
nir_lower_wpos_ytransform() is great for OpenGL, which allows
applications to choose whether their coordinate system's origin is
upper left/lower left, and whether the pixel center should be on
integer/half-integer boundaries.
Vulkan, however, has much simpler requirements: the pixel center
is always half-integer, and the origin is always upper left. No
coordinate transform is needed - we just need to add <0.5, 0.5>.
This means that we can avoid using (and setting up) a uniform.
I thought about adding more options to nir_lower_wpos_ytransform(),
but making a new pass that never even touched uniforms seemed simpler.
v2: Use normal iterator rather than _safe variant (noticed by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rob Clark <robdclark@gmail.com>
ffma is an explicitly fused multiply add with higher precision.
The optimizer will take care of promoting mul/add to fma when
it's beneficial to do so.
This fixes failures on Gen4-5 when using this pass, as those platforms
don't actually implement fma().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The return value was used for the old nir_foreach_block callback system,
but at this point it no longer means anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
gl_FragCoord is a shader input with location == VARYING_SLOT_POS.
ARB_fragment_programs have an equivalent input at VARYING_SLOT_POS,
but it isn't called gl_FragCoord. We do want to transform it.
Matching by location guarantees we catch both.
Fixes several fp tests on a branch which uses this pass on i965.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The original value might have been swizzled. That's taken care of in
the fmul source - we don't want to reswizzle it again.
Fixes validation failures in glsl-derivs-varyings on a branch of mine
which uses this pass in i965.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We'd like the comparisons to mean "the exact same bits". Comparing
doubles won't do that for NaN values or positive vs. negative zero.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
transfer_inline_write cannot be NULL and the virgl renderer doesn't support
inline writes for textures, so add the default version.
This fixes a crash in st_TexSubImage since commit fb9fe352ea ("st/mesa:
use transfer_inline_write for memcpy TexSubImage path").
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were locking the Shared->Mutex and then using calling functions like
_mesa_HashInsert that do additional per-hash-table locking internally.
Instead just lock each hash-table's mutex and use functions like
_mesa_HashInsertLocked and the new _mesa_HashRemoveLocked.
In order to do this, we need to remove the locking from
_mesa_HashFindFreeKeyBlock since it will always be called with the
per-hash-table lock taken.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cuts 6K of .text.
text data bss dec hex filename
5772372 264648 29320 6066340 5c90a4 lib/i965_dri.so before
5766074 264648 29320 6060042 5c780a lib/i965_dri.so after
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the expanded field, only ASTC format enums have the MSB set to 1.
Expanding the field width makes the process of handling these formats
identical to the way other formats are handled.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also, make changes needed for successful compilation and registration
as a texture compression mode.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At this point, it would require a logic error in nir_validate to not
have already populated this hashtable entry, but coverity doesn't
realize that:
CID 1265547 (#1 of 1): Dereference null return value (NULL_RETURNS)3.
dereference: Dereferencing a null pointer entry.
CID 1271039 (#1 of 1): Dereference null return value (NULL_RETURNS)3.
dereference: Dereferencing a null pointer entry.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Not sure how coverity arrives at the conclusion that we can read comp[j]
unitialized (around line 204), other than not being aware that ncomp is
greater than 1 so it won't underflow in the 'if (tex->is_array)' case.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Not 100% sure, but I think being an unsigned literal will help:
CID 1358505 (#1 of 1): Unintended sign extension
(SIGN_EXTENSION)sign_extension: Suspicious implicit sign extension:
load1->def.num_components with type unsigned char (8 bits, unsigned) is
promoted in load1->def.num_components * (load1->def.bit_size / 8) to
type int (32 bits, signed), then sign-extended to type unsigned long (64
bits, unsigned). If load1->def.num_components * (load1->def.bit_size /
8) is greater than 0x7FFFFFFF, the upper bits of the result will all be
1.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Originally we removed the instruction, changed the source, and then
re-inserted it. This works, but nir_instr_rewrite_src is a bit more
obviously correct.
This works around a bug in older version of UE4, where a shader
defines the same structure twice. Although we aren't sure this is correct
GLSL (it most likely isn't) there are enough UE4 based things out there
we should deal with this.
This drops the error to a warning if the struct names and contents match.
v1.1: do better C++ on record_compare declaration (Rob)
v2: restrict this to desktop GL only (Ian)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95005
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ken suggested instead of a big and complicated optimization pass, to
just recognize the operations here. It's certainly less code and a lot
prettier, but it seems to actually perform worse for currently unknown
reasons.
total instructions in shared programs: 8923452 -> 8904108 (-0.22%)
instructions in affected programs: 814563 -> 795219 (-2.37%)
helped: 3336
HURT: 10
total cycles in shared programs: 66970734 -> 66651476 (-0.48%)
cycles in affected programs: 10582686 -> 10263428 (-3.02%)
helped: 2438
HURT: 691
total spills in shared programs: 1811 -> 1789 (-1.21%)
spills in affected programs: 85 -> 63 (-25.88%)
helped: 4
total fills in shared programs: 3143 -> 3109 (-1.08%)
fills in affected programs: 167 -> 133 (-20.36%)
helped: 4
LOST: 2
GAINED: 36
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch wants to inspect the LOD argument and do something
different if it's 0.0f. But at that point we've emitted a MOV for it and
we just have a register to look at.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Address registers are always loaded right before use. Don't treat them
as "global", which will cause them to be put into the function's
linkage, and will make the register allocator hold onto that
register until the end of the function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Move a MultisampleTrait static from header to cpp as clang seemed to get
confused with some specializations in the header vs some in cpp.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
With xwayland, vainfo use VA_DISPLAY_WAYLAND as default and it fails
and fails when specify display with `vainfo --display wayland`.
In fact wayland support for libva uses drm path to connect device,
and should use drm pipe loader to create screen.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
brw_fs.cpp: In function ‘const unsigned int* brw_compile_fs(const [...]
brw_fs.cpp:6093:64: warning: ‘simd16_grf_start’ may be used uninitialized [...]
prog_data->base.dispatch_grf_start_reg = simd16_grf_start;
brw_fs.cpp:5996:29: note: ‘simd16_grf_start’ was declared here
uint8_t simd8_grf_start, simd16_grf_start;
brw_fs.cpp:6094:52: warning: ‘simd16_grf_used’ may be used uninitialized [...]
prog_data->reg_blocks_0 = brw_register_blocks(simd16_grf_used);
brw_fs.cpp:5997:29: note: ‘simd16_grf_used’ was declared here
unsigned simd8_grf_used, simd16_grf_used;
(and more)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We don't really support reading/writing of 3D textures since the hardware
doesn't do 3D, but we do need to make sure that a pipe_transfer for them
has enough space to store the image. This was previously not a problem
because the state tracker only mapped a slice at a time until
fb9fe352ea. Fixes glean glsl1 tests, which
all have setup of a 3D texture at the start.
Trivial fix to improperly handled cleanup during
VK_ERROR_OUT_OF_HOST_MEMORY.
Identified by Coverity: CID 1358908 and 1358909
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A few changes to support musl libc as well.
In particular fpu_control.h is glibc specific.
fenv.h doesn't enable to do exactly what we want either,
so instead use assembly directly.
Signed-off-by: Wang He <xw897002528@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
The last remaining issues with thread_submit have been resolved,
thus turn it when on a different device (the case where is is
beneficial).
Signed-off-by: Axel Davy <axel.davy@ens.fr>
pipe_rasterizer multisample bit should be enabled only when really
wanting to do multisampling, thus we should disable when not having
msaa render target.
This fixes some depth calculation precision issues on radeon.
Also disable it when depth and stencil tests are disabled, since in that
case multisampling is same as not multisampled.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
ATOC extension does something only when alpha test is enabled.
Use a second bit to encode the difference with ATIATOC.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Nine doesn't support vs output/ps input packing.
We haven't found any application requiring that,
and implementing it properly is complex.
Add asserts for now.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When taking screenshots we do a copy from the frontbuffer
to an allocated buffer (which we then copy to a ram buffer).
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Our behaviour was not entirely similar to what
the docs and our tests describe.
Drop d3dlock_buffer_to_pipe_transfer_usage.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Use present interface 1.2 function ID3DPresent_CreateThread
to create the thread for threadpool.
Creating the thread with WINE prevents some rarely occuring crashes.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
The problem is that if one d3d present call fails,
because of our occlusion check in present method,
the next presentation call will send the same pixmap to the Xserver again,
without waiting it is released, which is wrong.
Move the present call after occlusion check to return and prevent
Xpixmaps errors.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Any third party app might change the current screen resolution.
Poll for resolution mismatch to force a device reset.
Required for non ex devices only.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Implement presentation interface version 1.2:
* ID3DPresent_ResolutionMismatch
Poll for resolution mismatch.
A third party app might have changed resolution,
which requires a device reset.
* ID3DPresent_CreateThread
Create a thread in WINE to allow nine to use Windows API
functions. Required for multi-threaded presentation.
In single-threaded presentation mode the calling thread is
already known to WINE.
* ID3DPresent_WaitForThread
Wait for a wine thread to terminate.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
We were doing the conversion for surfaces, but not yet
volumes. Now that volumes can do conversion, use it.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
X8L8V8U8 support should be common. Some more recent cards
do support this format, but not L6V5U5.
Add fallback for this format to have it alwaus supported.
L6V5U5 conversion rule apparently differs a bit from the normal
spec, and thus the gallium equivalent format leads to slightly
wrong colors. Since some recent cards do not support it, do not
support it either.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Version 0.1 allows to assume that the second
element of the IDirect3D* structures will
be a pointer to the internal nine vtable.
This is useful if the gallium nine user wants
to wrap some interfaces.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Our code did match the user documentation of the function
quite well (except for format check).
However the DDI documentation and wine tests show that
documentation was not correct. Thus adapt our code to
fit the best possible to the -real- spec.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This sets the AA location to the d3d11
spec.
EG/NI 8X MSAA is left as is. Not sure
why it was set different to Cayman, so
lets it as is.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Besides depth/stencil, the hardware doesn't support
mixed formats.
The GL state tracker doesn't make use of them.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
d3d 9 needs COLOR0 to be 1.0 on all channels when
undefined. 0.0 for the others is fine.
GL behaviour is undefined.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As Emil pointed out, only gcc, clang and MSVC compatibility is required.
Hence the check for GNUC can be skipped, as __i386__ and __x86_64__ are
only defined for gcc/clang, not for MSVC.
Remove the #undef which has been there for historic reasons, when wine
dlls for nine have been built inside mesa. Instead use #ifndef in order
to avoid redefining WINAPI from MSVC's headers.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Axel Davy <axel.davy@ens.fr>
Mark it as unreachable. Silences a compiler warning:
spirv/spirv_to_nir.c:1397:4: warning: enumeration value
'nir_texop_txf_ms_mcs' not handled in switch [-Wswitch]
switch (instr->op) {
^
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
layout should only be null for structs, but it's checked everywhere else
and confuses Coverity (CID 1358495).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Coverity (CID 1358496) warns that the cleanup code doesn't unlock the
mutex (which is arguably kind of stupid, since the only case that can
happen is when mtx_unlock() failed!). But, mtx_unlock() isn't going to
fail -- the mutex was locked by this thread just a few lines above it.
Operations like nir_op_bitfield_insert have four arguments, and Coverity
isn't privy to the fact that 4-argument operations aren't possible here,
so it thinks this can lead to memory corruption. Just increase the size
of the array to quell any fears.
Previously an SSO pipeline containing only a tessellation control shader
and a tessellation evaluation shader would not get locations assigned
for the TCS inputs. This would lead to assertion failures in some
piglit tests, such as arb_program_interface_query-resource-query.
That piglit test still fails on some tessellation related subtests.
Specifically, these subtests fail:
'GL_PROGRAM_INPUT(tcs) active resources' expected 2 but got 3
'GL_PROGRAM_INPUT(tcs) max length name' expected 12 but got 16
'GL_PROGRAM_INPUT(tcs,tes) active resources' expected 2 but got 3
'GL_PROGRAM_INPUT(tcs,tes) max length name' expected 12 but got 16
'GL_PROGRAM_OUTPUT(tcs) active resources' expected 15 but got 3
'GL_PROGRAM_OUTPUT(tcs) max length name' expected 23 but got 12
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
Commit 11096ec introduced a regression in some piglit tests (e.g.,
arb_program_interface_query-resource-query). I did not notice this
regression because other (unrelated) problems caused failed assertions
in those same tests on my system... so they crashed before getting to
the new failure.
v2: Use is_gl_identifier. Suggested by Tim.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
The use of the parameter was removed in d6b92028.
glsl/link_varyings.cpp:1390:39: warning: unused parameter ‘separate_shader’ [-Wunused-parameter]
bool separate_shader)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The parameter appears to have been unused since the function was added
in commit 12ba6cfb. Remove it.
glsl/linker.cpp:2886:60: warning: unused parameter ‘prog’ [-Wunused-parameter]
match_explicit_outputs_to_inputs(struct gl_shader_program *prog,
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The only place that actually used the type parameter was the GS visitor,
and it was always passed glsl_type::int. Just remove the parameter.
brw_vec4_vs_visitor.cpp:38:61: warning: unused parameter ‘type’ [-Wunused-parameter]
const glsl_type *type)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The MaxComputeWorkGroupInvocations constant is used in
compute_version_es2() instead of extensions->ARB_compute_shader
as ES has lower requirements than desktop GL.
Both i965 and gallium set this constant before enabling compute support.
Signed-off-by: Daniel Scharrer <daniel@constexpr.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Because the CSO module handles sampler views for fragment shaders
differently than vertex/geom shaders, VS/GS shader sampler views
aren't explicitly unbound like for FS sampler vers. This code
checks for the case of start=num=0 and nulls out the sampler views.
Fixes a assert regression in piglit's arb_texture_multisample-
sample-position test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We always clamp fragment colors, since they're always 8-bit unorm, so
there's no need to have us compile separate shaders based on
GL_ARB_color_buffer_float. This gives us precompilation of fragment
programs to the vc4_shader_state_create() level.
This allows the same pipe_shader_state to be referenced from multiple
contexts. Since our pipe_shader_state is treated as immutable (other than
the variant number) within the driver, this is no problem.
This will be generated by glsl_to_nir, and it turns out that this is a
more code-efficient path than the floating point math, anyway.
No change on shader-db, but drops an instruction in piglit's
glsl-fs-frontfacing.
In a5d7e144ea, Connor generalized the
exec_size halving code to handle more cases. As part of this, he made
it not halve anything if the region accessed falls completely in a
single register.
Unfortunately, it started producing some invalid regions:
-add(16) g6<1>F g10<8,8,1>UW -g1<0,1,0>F { align1 compr };
-add(16) g8<1>F g12<8,8,1>UW -g1.1<0,1,0>F { align1 compr };
+add(16) g6<1>F g10<16,16,1>UW -g1<0,1,0>F { align1 compr };
+add(16) g8<1>F g12<16,16,1>UW -g1.1<0,1,0>F { align1 compr };
Here, the UW source region completely fits within a register. However,
we have to use instruction compression because the destination region
spans two registers. <16,16,1> is invalid because it's compressed.
To handle this, skip the "everything fits in one register" case and
fall through to the exec_size halving case when compressed.
Fixes hundreds of Piglit regressions on GM965.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This enables:
- GL_OES_sample_shading
- GL_OES_sample_variables
- GL_OES_shader_multisample_interpolation
On Gen8, we pass all the CTS tests, and all but 4 of the dEQP-GLES31
tests (dealing with 1x/2x MSAA at half rate sampling). We believe
those 4 dEQP-GLES31 tests are incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following building error due to libmesa_nir dependency:
In file included from external/mesa/src/mesa/state_tracker/st_glsl_to_nir.cpp:44:0:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
#include "nir_opcodes.h"
^
compilation terminated.
build/core/binary.mk:706: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/state_tracker/st_glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Section 8.9 (Texture Functions) of the OpenGL Shading Language 4.5
specification:
However, automatic level of detail is computed only for fragment shaders.
Other shaders operate as though the base level of detail were computed as
zero.
and Section 8.9.3 (Texture Gather Functions):
When performing a texture gather operation, the minification and
magnification filters are ignored, and the rules for LINEAR filtering in
the OpenGL Specification are applied to the base level of the texture
image to identify the four texels i_0 j_1, i_1 j_1, i_1 j_0, and i_0 j_0.
Of course, explicit LOD or derivative variants work in all shader types.
This fixes several GL4x-CTS.texture_gather.* tests.
v2: TG4 is always level zero (thanks, Ilia)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
TXQ is sufficiently different that having in it in the same code path as
texture sampling/fetching opcodes doesn't make much sense.
v2: guard against NULL pointer dereferences
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The format_desc swizzle describes where in the array each color channel
comes from - but the existing code was written as if each entry in the
swizzle described the meaning of an array element.
Fixes piglit's arb_copy_image-format-swizzle.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We also have this barrier call for gen8 vkCmdWaitEvents.
We don't implement waiting on events for gen7 yet, but this barrier at
least helps to not regress CTS cases when data caching is enabled.
Without this, the tests would intermittently report a failure when the
data cache was enabled.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If images or shader buffers are used, we will enable the data cache in
the the L3 config.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This way we get correct sampling from RGB formats that are faked as RGBA.
This should also cause it to disable rendering and blending on those
formats. We should be able to render to them and, on Broadwell and above,
we can blend on them with work-arounds. However, we'll add support for
that more properly later when it's deemed useful. For now, disabling
rendering and blending should be safe.
The new code removes the switch statement and instead handles depth/stencil
as up-front special cases. This allows for potentially more complicated
color format handling in the future.
This commit removes anv_format_for_vk_format and adds an anv_get_format
helper. The anv_get_format helper returns the anv_format by-value. Unlike
anv_format_for_vk_format the format returned by anv_get_format is 100%
accurate and includes any tweaks needed for tiled vs. linear.
anv_get_isl_format is now just a wrapper around anv_get_format that picks
off just the isl_format.
Now that we have VkFormat introspection and we've removed everything that
tried to use anv_format for introspection, we no longer need most of what
was in anv_format.
Because the buffer is exposed to the user, the block size is defined to
always exactly be the size of the actual vulkan format. This is the same
size (it had better be) as the linaer image format.
As much as I hate adding yet more format introspection, there are times
when the VkFormat is sufficient and we don't want to round-trip through
isl_format. For these times, the new vk_format_info.c/h files provide some
simple driver-agnostic VkFormat introspection. This intended to be
specific to Vulkan but not to any driver whatsoever.
Otherwise the instances in the extension XML override the core
definitions, and we stop knowing their sizes in indirect_size_get.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Squashes the one remaining warning in the xserver build.
v2: Also clean up some non-standard whitespace (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
We're about to update the generator scripts to use these, easier not to
vary between client and server.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
... otherwise we'll produce uncomplete binaries with introduction of NIR
as alternative IR with next commits.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
This allows us to disable spilling for blorp shaders since blorp state
setup doesn't handle spilling. Without this, blorp fails hard if you run
with INTEL_DEBUG=spill.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Francisco Jerez <currojerez@riseup.net>
We were always doing atomics on shared memory location 0 instead of the
originally supplied location. Make sure to pass through the original
symbol and any indirection.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org # note: expect minor conflict
Log all the errors, and at the end dump the shader w/ error annotations
to make it easier to see where the problems are.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Caller can pass a hashtable mapping NIR object (currently instr or var,
but I guess others could be added as needed) to annotation msg to print
inline with the shader dump. As the annotation msg is printed, it is
removed from the hashtable to give the caller a way to know about any
unassociated msgs.
This is used in the next patch, for nir_validate to try to associate
error msgs to nir_print dump.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
ARB_vertex_attrib_64bit was the only feature missing.
v2: we can expose 4.2 instead of 4.1 (Ian Romanick)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Input attributes can require 2 vec4 or 1 vec4 depending on whether they
are double-precision or not.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When computing where the first non-payload GRF starts, we can't rely on
the number of attributes, as each attribute can be using 1 or 2 slots
depending on whether they are a dvec3/4 or other.
Instead, we need to use the number of slots used by the attributes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Do not use total attributes because a dvec3/dvec4 attribute requires two
slots. So rather use total attribute slots.
v2: do not use loop to calculate required attribute slots (Kenneth
Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
VS Thread Payload handles attributes in URB as vec4, no matter if they
are actually single or double precision.
So with double-precision types, value ends up in the registers split in
32bits chunks, in different positions.
We need to shuffle the chunks to get the doubles correctly.
v2:
* Extra blank line. Add { } on if body (Ian Romanick)
* Use dest directly (Kenneth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The HW has a restriction that only vertical stride may cross register
boundaries. Until now this was only handled on VGRFs at
rw_reg_from_fs_reg, but it is also needed for attributes.
v2:
* Remove reference to commit id on commit message (Juan Suarez)
* Simplify code that compute final exec_size (Ian Romanick)
* Use REG_SIZE on that same code (Kenneth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the Broadwell specification, structure VERTEX_ELEMENT_STATE
description:
"When SourceElementFormat is set to one of the *64*_PASSTHRU
formats, 64-bit components are stored in the URB without any
conversion. In this case, vertex elements must be written as 128
or 256 bits, with VFCOMP_STORE_0 being used to pad the output
as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red component into
the URB, Component 1 must be specified as VFCOMP_STORE_0 (with
Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit
vertex element, or Components 1-3 must be specified as VFCOMP_STORE_0
in order to output a 256-bit vertex element. Likewise, use of
R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0
in order to output a 256-bit vertex element."
Uses 128-bits to write double and dvec2 vertex elements, and 256-bits for
dvec3 and dvec4 vertex elements.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds support for PASSTHRU format when pushing
double-precision attributes.
Check glarray->Doubles in order to know if we should choose a format
that does a conversion to float, or just passthru the 64-bit double.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These varying have a separate location domain from per-vertex varyings
and need to be handled separately.
Reviewed-by: Dave Airlie <airlied@redhat.com>
These varyings have a separate location domain from per-vertex varyings
and need to be handled separately.
Reviewed-by: Dave Airlie <airlied@redhat.com>
I recently fixed a bug in the Piglit tests:
https://lists.freedesktop.org/archives/piglit/2016-May/019802.html
With that patch in place, we pass all the tests. So, turn it on.
We could probably expose this earlier than Gen8, but the extension
says that OpenGL 4.0 is required, and all of our tests are written
against GLSL 4.00 (which is only supported on Gen8+).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We also need render to the front buffer of temporary X pixmap,
this is the case of when we using opengl as video out for vaapi.
the basic implementation is to pass pixmap ID to X server, and
then X will return dma-buf fd, we will get the buffer object
through this dma-buf fd.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
When drawable size changed, PresentConfigureNotify event will be
emitted, by handling the event to re-allocate resized buffer.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This implements DRI3 PixmapFromBuffer. Create buffer objects, and
associate it to a dma-buf fd, and then pass this fd with a pixmap
ID to X server for creating pixmap object; also add a function
for wait events.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Required functions into place for implementation, create screen
with device fd returned from X server, also bail out to DRI2
with certain conditions.
v2: -organize the error out path (Axel)
-squash previous patch 1 and 2 into one (Emil)
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Coverity issue 1361544 found an instance where the tcs variable is
checked for NULL, but unconditionally dereferenced later in the same
function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: do not write to the original indirect_offset since that is
an expression that could be used somewhere else (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar backend uses this to check URB input sizes.
v2: Removed redundant break after return (Curro)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is pretty much the same we do with SSBOs.
v2: do not shuffle in-place, it is not safe since the original 64-bit data
could be used after the write, instead use a temporary like we do
for SSBO stores (Iago)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This does the inverse operation of shuffle_32bit_load_result_to_64bit_data
and we will use it when we need to write 64-bit data in the layout expected
by untyped write messages.
v2 (curro):
- Use subscript() instead of stride()
- Assert on the input types rather than silently retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Drop the temporary vgrf and force_writemask_all.
- Make component_i const.
- Move to brw_fs_nir.cpp
v3 (curro):
- Pass dst and src by reference.
- Simplify allocation of tmp register.
- Move to brw_fs_nir.cpp.
- Get rid of the temporary.
v3 (Iago):
- Check that the src and dst regions do not overlap, since that would
typically be a bug in the caller.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We are going to need the same logic for anything that reads
doubles via untyped messages (CS shared variables and SSBOs). Add a
helper function with that logic so that we can reuse it.
v2:
- Make this a static function instead of a method of fs_visitor (Iago)
- We only support types with a size of 4 or 8 (Curro)
- Avoid retypes by using a separate vgrf for the packed result (Curro)
- Put dst parameter before source parameters (Curro)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
UBO loads with constant offset use the UNIFORM_PULL_CONSTANT_LOAD
instruction, which reads 16 bytes (a vec4) of data from memory. For dvec
types this only provides components x and y. Thus, if we are reading
more than 2 components we need to issue a second load at offset+16 to
read the next 16-byte chunk with components w and z.
UBO loads with non-constant offset emit a load for each component
in the vector (and rely in CSE to fix redundant loads), so we only
need to consider the size of the data type when computing the offset
of each element in a vector.
v2 (Sam):
- Adapt the code to use component() (Curro).
v3 (Sam):
- Use type_sz(dest.type) in VARYING_PULL_CONSTANT_LOAD() call (Curro).
- Add asserts to ensure std140 vector alignment rules are followed
(Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
constant offset that is 16-byte aligned. If we need to access an unaligned
offset we emit a load with an aligned offset and use the remaining constant
offset to select the component into the vec4 result that we are interested
in. This component must be computed in units of the type size, since that
is what fs_reg::set_smear expects.
This patch does this change in the two places where we use this message:
In demote_pull_constants when we lower uniform access with constant offset
into the pull constant buffer and in UBO loads with constant offset.
v2 (Sam):
- Fix set_smear() in fs_visitor::lower_constant_loads(), take into account
source type instead and remove MAX2 (Curro).
- Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic()
(Curro).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This fixes a number of bugs of component() by reimplementing it in
terms of horiz_offset(): Handling of base registers starting at a
non-zero subreg_offset, handling of strided registers and overflow of
subreg_offset into reg_offset.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There will be a few places where we need to shuffle the result of a 32-bit
load into valid 64-bit data, so extract this logic into a separate helper
that we can reuse.
v2 (Curro):
- Use subscript() instead of stride()
- Assert on the input types rather than retyping.
- Use offset() instead of horiz_offset(), drop the multiplier definition.
- Don't use force_writemask_all.
- Mark component_i as const.
- Make the function name lower case.
v3 (Curro):
- Pass src and dst by reference.
- Move to brw_fs_nir.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Instead of using the LOAD_PAYLOAD instruction (emitted through the
emit_transpose() helper that is no longer useful and this commit
removes) which had to be marked force_writemask_all in some cases,
emit a series of moves to apply proper channel enable signals to the
destination. Until now lower_simd_width() had mainly been used to
lower things that invariably had a basic block-local temporary as
destination so it didn't seem like a big deal, but I found it to be
the reason for several Piglit regressions in my SIMD32 branch and
Igalia discovered the same issue independently while working on FP64
support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were not accounting for subreg_offset in the check for the start
of the region.
Also, fs_reg::regs_read() already takes the stride into account, so we
should not multiply its result by the stride again. This was making
copy-propagation fail to copy-propagate cases that would otherwise be
safe to copy-propagate. Again, this was observed in fp64 code, since
there we use stride > 1 often.
v2 (Sam):
- Rename function and add comment (Jason, Curro).
- Assert that register files and number are the same (Jason).
- Fix code to take into account the assumption that src.subreg_offset
is strictly less than the reg_offset unit (Curro).
- Don't pass the registers by value to the function, use
'const fs_reg &' instead (Curro).
- Remove obsolete comment in the commit log (Curro).
v3 (Sam):
- Remove the assert and put the condition in the return (Curro).
- Fix function name (Curro).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We were not considering the case where the load payload is writing to
a destination with a reg_offset > 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We were not invalidating entries with a src that reads more than one register
when we find writes that overwrite any register read by entry->src after
the first. This leads to incorrect copy propagation because we re-use
entries from the ACP that have been partially invalidated. Same thing for
entries with a dst that writes to more than one register.
v2 (Sam):
- Improve code by defining regions_overlap() and using it instead of a
loop (Curro).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
try_copy_propagate() was special-casing UNIFORM registers (the
BAD_FILE, ARF and FIXED_GRF cases are dead, see the assertion at the
top of the function) and then failing to take into account the
possibility of the instruction reading from a non-zero offset of the
destination of the copy. The VGRF/ATTR handling takes it into account
correctly, and there is no reason we couldn't use the exact same logic
for the UNIFORM file aside from the fact that uniforms represent
reg_offset in different units. We can work around that easily by
defining an additional constant with the right unit reg_offset is
expressed in.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Because the semantics of source modifiers are type-dependent, the type of the
original source of the copy must be kept unmodified while propagating it into
some instruction, which implies that we need to have the guarantee that the
meaning of the instruction is going to remain the same after we have changed
the types. Whenthe size of the new type is different from the size of the old
type the new and old instructions cannot possibly be equivalent because the new
instruction will be reading more data than the old one was.
Prevents that we turn this:
load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
mov(8) vgrf18:DF, vgrf17:DF 1sthalf
load_payload(8) vgrf5:DF, vgrf18:DF, vgrf20:DF NoMask 1sthalf WE_all
load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
mov(8) vgrf22:UD, vgrf21:UD 1sthalf
into:
load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
mov(8) vgrf18:DF, |vgrf4+0.0|:DF 1sthalf
load_payload(8) vgrf5:DF, |vgrf4+0.0|:DF, |vgrf4+2.0|:DF NoMask 1sthalf WE_all
load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
mov(8) vgrf22:DF, |vgrf4+0.4|<2>:DF 1sthalf
where the semantics of the last instruccion have changed.
v2 (Curro):
- Update commit log and add comment to explain the problem better.
- Simplify the condition.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Specifically, consider the size of the data type of the operand to compute
the number of registers written.
v2 (Sam):
- Fix line width (Jordan).
- Add an assert (Jordan).
- Use REG_SIZE in the calculation of regs_written (Curro)
v3 (Sam):
- Fix assert and calculation of regs_written (Curro).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This has likely been broken since we started propagating copies not
matching the offset of the instruction exactly
(1728e74957). The copy source stride
needs to be taken into account to find out the offset at the origin
that corresponds to the offset at the destination of the copy which is
being read by the instruction. This has led to program miscompilation
on both my SIMD32 branch and Igalia's FP64 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can happen if the register already has a non-zero subreg_offset
when byte_offset() is called.
v2 (Sam):
- Refactor byte_offset() (Jordan).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've apparently always been botching JIP for sequences such as:
do
cmp.f0.0 ...
(+f0.0) break
...
do
...
while
...
while
Because the "do" instruction doesn't actually exist, the inner "while"
is at the same depth as the "break". brw_find_next_block_end() thus
mistook the inner "while" as the end of the loop containing the "break",
and set the "break" to point to the wrong place.
Only "while" instructions that jump before our instruction are relevant.
We need to ignore the rest, as they're sibling control flow nodes (or
children, but this was already handled by the depth == 0 check).
See also commit 1ac1581f38.
This prevents channel masks from being screwed up, and fixes GPU
hangs(*) in dEQP-GLES31.functional.shaders.multisample_interpolation.
interpolate_at_sample.centroid_qualified.multisample_texture_16.
The test ended up executing code with no channels enabled, and that
code contained FIND_LIVE_CHANNEL, which returned 8 (out of range for
a SIMD8 program), which then was used in indirect GRF addressing,
which randomly got a boolean value (0xFFFFFFFF), interpreted it as
a sample ID, OR'd it into an indirect send message descriptor,
which corrupted the message length, sending a pixel interpolator
message with mlen 15, which is illegal. Whew :)
(*) Technically, the test doesn't GPU hang currently, but only
because another bug prevents it from issuing pixel interpolator
messages entirely...with that fixed, it hangs.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
STATE_BASE_ADDRESS stalls the whole pipeline, and the documentation
cautions us to emit it as little as possible for better performance.
We recently put some hacks in BLORP to try and avoid emitting it
if it was already set correctly. However, this wasn't quite minimal:
if BLORP is the first operation (i.e. glClear()), then it would emit
it, and subsequent draw calls would emit it again.
This caused a small drop in performance in GPUTest Triangle when
switching from Meta to BLORP.
Unlike most packets, STATE_BASE_ADDRESS isn't influenced by GL state:
it needs to be emitted once per batch, before most other commands, or
whenever we change the program cache BO. It's also valid in both the
3D and compute pipelines, which makes it even more unique.
This patch removes it from the atom mechanism and instead directly
calls it as part of every draw, compute dispatch, or BLORP operation.
We introduce a new flag indicating that STATE_BASE_ADDRESS has already
been emitted this batch, and if so, skip doing it again. When we make
a new program cache BO, we simply reset the flag, so the next operation
will emit it again. When we flush/reset the batch, we reset the flag.
This guarantees that we'll emit STATE_BASE_ADDRESS only when we have to.
It's also less code than the old atom mechanism.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're about to start calling it directly, and this means the callers
won't have to think about generations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This way all the programs are in one place again, and it also should
make some future STATE_BASE_ADDRESS related changes possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
opt_constant_folding is supposed to fold trees of constants into a
single constant. Surprisingly, it was also propagating constant values
from variables into expression trees - even when the result couldn't be
folded together. This is opt_constant_propagation's job.
The ir_dereference_variable::constant_expression_value() method returns
a clone of var->constant_value. So we would replace the dereference
with a constant, propagating it into the tree.
Skip over ir_dereference_variable to avoid this surprising behavior.
However, add code to explicitly continue doing it in the constant
propagation pass, as it's useful to do so.
shader-db statistics on Broadwell:
total instructions in shared programs: 8905349 -> 8905126 (-0.00%)
instructions in affected programs: 30100 -> 29877 (-0.74%)
helped: 93
HURT: 20
total cycles in shared programs: 71017030 -> 71015944 (-0.00%)
cycles in affected programs: 132456 -> 131370 (-0.82%)
helped: 54
HURT: 45
The only hurt programs are by a single instruction, while the helped
ones are helped by 1-4 instructions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If an ir_dereference_array has non-constant components, there's no
point in trying to evaluate its value (which involves walking down
the tree and possibly allocating memory for portions of the subtree
which are constant).
This also removes convoluted tree walking in opt_constant_folding(),
which tries to fold constants while walking up the tree. No need to
walk down, then up, then down again.
We did this for swizzles and expressions already, but I was lazy
back in the day and didn't do this for ir_dereference_array.
No change in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We could probably clean this up more (maybe make it a method), but at
least there's only one copy of this code now, and that's a start.
No change in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It looks like this was missed when converting opt_constant_folding()
from a hierarchical visitor to an rvalue visitor in 6606fde3.
ir_rvalue_visitor already processes values on the way back up the tree,
so we will have already visited every child node. There's no point in
doing it again.
No change in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The pass ultimately skips over any entries with assignment_count != 1,
so there's no need to do further work once we've determined that there
are multiple assignments.
The constant value could be a large array (i.e. uvec4[327]), at which
point skipping the constant_expression_value() call (and the clone()
call within) can save us piles of memory.
No change in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I've added this to nir_gather_info(), but also to glsl_to_nir() as a
temporary measure, since the i965 GL driver today doesn't use
nir_gather_info() yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I don't know what the intention was here, but this function returns
void. We can't assert anything about its return value.
Fixes "make check" failures.
v2: Also fix prototype for the function (caught by Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Drop extra #include which is otherwise unneeded (and makes this header
difficult to include from outside of src/mesa).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With algebraic-opt support for lowering div to shift, the driver would
like to be able to run this pass *after* the main opt-loop, and then
conditionally re-run the opt-loop if this pass actually lowered some-
thing.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not sure how we didn't hit this already, but since we want fdiv
converted into mul + rcp, we should set this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In the glsl->tgsi path, this already gets translated to VAR8, which
matches up with rasterizer->sprite_coord_enable.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When we got NIR directly from state tracker (vs using tgsi_to_nir) we
need to realize this and skip some TGSI specific hacks.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
INTERP is defined (by me) to have to have a INPUT source. However the
state tracker does not always obey this. This happens due to varying
packing logic introducing additional mov's which can't always be undone.
Instead of just giving up, we instead try harder to find the original
input. This won't always be possible, for example with indirect
accesses. There's not much we can (easily) do about that though.
This fixes the remaining interpolateAt* failures in dEQP:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at*
some of which were asserting due to INTERP_* being passed a non-input.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This fixes
dEQP-GLES31.functional.draw_indirect.draw_elements_indirect.*.default_attribute
These tests were causing a const vbo to be set up, and were small enough
draws that the logic was trying to go via the push path (which emits
data directly into the cmd stream rather than uploading a user vbo).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cull distances are just a special case of clip distances as far as the
hardware is concerned. Make sure that the relevant "planes" are enabled,
and flip the clip mode to cull for those.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
[imirkin: add enables on nvc0, add nv50 support]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
The pass that st/mesa relies on to combine clip and cull distances has
been reverted, so we can't expose ARB_cull_distance until that is
resolved.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We used to use a meta path on gen8 but we haven't since c7cf17ae75. We
might as well delete the meta path since blorp works on all gens.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We used to use a meta path because blorp didn't support 16x MSAA. Now it
does, so we don't need the meta paths anymore.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We used to use a meta path because blorp didn't support 16x MSAA. Now it
does, so we don't need the meta paths anymore.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The helper was initially created to allow us to set reasonable defaults as
we mutated the brw_blorp_prog_data structure in preparation for NIR. Now
that everything is going through brw_blorp_compile_nir_shader() which fully
fills out the brw_blorp_prog_data structure, we don't need the helper.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
NIR gets kind of awkward when you have a 3-component vector with two floats
and one int. This led to us accidentally going through float for the
sample index. It doesn't hurt anything but it also isn't needed.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The original code-flow tried to map original blorp. This puts things more
where they belong and simplifies some of the logic.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There's no reason to be passing a whole struct around just for a single
boolean. We can create it later when we actually need to use it as a key.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This array allows the push constants to be re-arranged on upload. The
actual arrangement will, eventually, come from the back-end compiler.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Intel hardware does a form of multisample compression that involves an
auxilary surface called the MCS. When an MCS is in use, you have to first
sample from the MCS with a special opcode and then pass the result of that
operation into the next sample instrucion. Normally, we just do this
ourselves in the back-end, but we want to expose that functionality to NIR
so that we can use MCS values directly in NIR-based blorp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to nir_channel except that it lets you grab more than one
channel by providing a mask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no reason for having a macro *and* a python generator. We can
easily just do the whole thing in python. This has the advantage that we
are no longer definining ALU# macros which conflict with the ones in
brw_fs_builder.h.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware packets organize kernel pointers and GRF start by slots that
don't map directly to dispatch width. This means that all of the state
setup code has to re-arrange the data from prog_data into these slots.
This logic has been duplicated 4 times in the GL driver and one more time
in the Vulkan driver. Let's just put it all in brw_fs.cpp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have a persample_shading bit in prog_data we can reduce the
amount the state setup code needs to be looking at the GL state. In
particular, it no longer pulls anything directly out of the
gl_fragment_program and no longer depends on NEW_FRAGMENT_PROGRAM.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit reworks and simplifies the way we handle persample shading in
the shader key and prog_data. The previous approach had three different
key bits that had slightly different and hard-to-decern meanings while the
new bits are far more clear. This commit changes it to two easily
understood bits that communicate everything we need:
1) key->persample_interp: means that the user has requested persample
interpolation through the API. This is equivalent to having
SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high
enough that you actually get multiple per-sample invocations.
2) key->multisample_fbo: means that the shader will be running on an
actual multi-sampled framebuffer.
This commit also adds a new "persample_dispatch" bit to prog_data which
indicates that the shader should be run in persample mode. This way the
state setup code doesn't have to look at the fragment program or GL state
and can just pull that data out of the prog_data.
In theory, this shuffle could mean more recompiles. However, in practice,
we were shoving enough state into the key before that we were probably
hitting a recompile on every per-sample shader anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 5310bca024 added a new "double df"
field to the brw_reg struct, adding an extra 4 bytes of data that isn't
usually initialized (or may contain irrelevant garbage if the struct is
mutated). This means that it's no longer safe to memcmp().
Instead, add a brw_regs_equal() function which ignores the extra df bits
unless they matter. To keep the implementation cheap, we wrap the first
set of fields in a union/struct so that we can use a single DWord
comparison.
v2: Drop unnecessary casts (caught by Francisco Jerez).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This fixes a few dEQP tests like
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.no_qualifiers.default_framebuffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
interpolateAt* can only take input variables or an element of an input
variable array. No structs.
Further, GLSL 4.40 relaxes the requirement to allow swizzles, so enable
that as well.
This fixes the following dEQP tests:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_struct_member
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_struct_member
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_struct_member
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the case of a constant, it might have been propagated through and
variable_referenced() returns NULL. Error out in that case.
Fixes 3 dEQP tests:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.negative.interpolate_constant
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.interpolate_constant
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.negative.interpolate_constant
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
This will come in handy when we want to lower gl_CullDistance into
gl_CullDistanceMESA.
[airlied: drop separate APIs for clip/cull - just use single API
to call both passes.]
v3: reexamine my sanity, this was pretty broken, the new code
creates one copy of gl_ClipDistanceMESA, as the clip distance
varying and lowers everything into that in two passes, one for clips
one for culls.
v4: rework using the passes in clip/cull sizes, instead of the
array sizes.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This function previously assumed that the Buffer and Image had matching
dimensions. However, it is possible to copy from a Buffer with larger
dimensions than the Image. Modify the copy function to enable this.
v2: Use ternary instead of MAX for setting bufferExtent (Jason Ekstrand)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95292
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Matthew Waters <matthew@centricular.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is purely cosmetic, making it easier to assign blame for space used
in the binary in case somebody else makes a similar cleanup effort in the
future.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
So that it gets compiled and emitted only once, saving space is the final
binary.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
So that it gets compiled and emitted only once, saving space is the final
binary.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
So that it gets compiled and emitted only once, saving space is the final
binary.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: Reuse the macro for bind & delete.
Note that may not be able to share the delete long-term as
pipe_compute_state contains members not in pipe_shader_state,
and we need to distinguish the pointer location if we add that
struct to the union.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
These cases had the parameter removed:
nir/nir_lower_vec_to_movs.c: In function ‘try_coalesce’:
nir/nir_lower_vec_to_movs.c:124:66: warning: unused parameter ‘shader’ [-Wunused-parameter]
try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
^
nir/nir_lower_io.c: In function ‘load_op’:
nir/nir_lower_io.c:147:32: warning: unused parameter ‘state’ [-Wunused-parameter]
load_op(struct lower_io_state *state,
^
These cases had the parameter (void) silenced because the parameter was
necessary for an interface:
nir/glsl_to_nir.cpp:1900:32: warning: unused parameter 'ir' [-Wunused-parameter]
nir_visitor::visit(ir_barrier *ir)
^
nir/nir.c: In function ‘remove_use_cb’:
nir/nir.c:802:35: warning: unused parameter ‘state’ [-Wunused-parameter]
remove_use_cb(nir_src *src, void *state)
^
nir/nir.c: In function ‘remove_def_cb’:
nir/nir.c:811:37: warning: unused parameter ‘state’ [-Wunused-parameter]
remove_def_cb(nir_dest *dest, void *state)
^
Number of total warnings in my build reduced from 2543 to 2538
(reduction of 5).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces the number of loop iterations for invalidating buffers
and images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
It's what all the call-sites once, so gets rid of a bunch of inlined
glsl_get_base_type() at the call-sites.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I tried first creating the auxiliary buffer the same time with the
color buffer. That, however, led me into a situation where we would
later create the rest of the mip-levels and the compression would
need to be disabled (it is only supported for single level buffers).
Here we try to create it on demand just before the hardware starts
to render. This is similar what we do with fast clear buffers,
their creation is deferred until the first clear.
This setup also gives the opportunity to detect if the miptree
represents the temporaty texture used internally in the mesa core.
This texture is mostly written by cpu and therefore enabling
compression for it doesn't make much sense.
Note that a heuristic is included. Floating point formats are not
enabled yet as they are only seen to hurt performance.
Some highlights with window system driver kept fixed to default
and only the application driver changing:
Manhattan: 8.32152% +/- 0.355881%
Offscreen: 9.09713% +/- 0.340763%
Glb trex: 8.46231% +/- 0.460624%
Offscreen: 9.31872% +/- 0.463743%
v2 (Ben): Re-use msaa layout type for single sampled case.
v3: Moved the deferred allocation of mcs to brw_try_draw_prims() and
brw_blorp_blit_miptrees() instead.
v4: (Ken): Drop MIPTREE_LAYOUT_ACCELERATED_UPLOAD when allocating mcs.
Do not enable for scanout buffers
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
v2: Add support for blorp and removed the support for meta
v3 (Ben): Add assertion on compressed non-fast clear - must
be partial clear.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Blorp blits use sampling engine which is capable of resolving
on the fly. Buffers are still resolved for blitter engine. Current
understanding is that blitter doesn't understand lossless compression.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Until now mcs was associated to single sampled buffers only for
fast clear purposes and it was therefore the responsibility of the
clear logic to allocate the aux buffer when needed. Now that normal
3D render or blorp blit may render with mcs enabled also, they need
to prepare the mcs just as well.
v2: Do not enable for scanout buffers
v3 (Ben):
- Fix typo in commit message.
- Check for gen < 9 and return early in brw_predraw_set_aux_buffers()
- Check for gen < 9 and return early in intel_miptree_prepare_mcs()
v4: Check for msaa_layput and number of samples to determine if
lossless compression is to used. Otherwise one cannot distuingish
between fast clear with and without compression.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Consider later on adding specific disable flags such as
MIPTREE_LAYOUT_DISABLE_AUX_MCS = 1 << 3, /* CCS_D */
MIPTREE_LAYOUT_DISABLE_AUX_CCS_E = 1 << 4,
MIPTREE_LAYOUT_DISABLE_AUX = MIPTREE_LAYOUT_DISABLE_AUX_MCS |
MIPTREE_LAYOUT_DISABLE_AUX_CCS_E,
and equivalent boolean/enums into miptree.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
v2: Check explicitly against base type of GL_FLOAT instead of
using _mesa_is_format_integer_color(). Otherwise we miss
GL_UNSIGNED_NORMALIZED.
v3 (Ben): Also call intel_miptree_supports_non_msrt_fast_clear()
in order to really check everything.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
v3 (Ben): Do not set fast claer state in surface state setup.
Moved into brw_postdraw_set_buffers_need_resolve()
using a separate patch.
v4: Support for blorp
v5 (Ben): Re-use gen8_get_aux_mode()
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Also use the opportunity to drop the unused surface type argument.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This hasn't been visible before. It showed up with lossless
compression with:
dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.rgb8
Current fast clear logic kicks color resolves even for gpu sampling.
In the test case this results into trashing of the fast color clear
state between two subsequent clears, and therefore each clear is
performed correctly.
With lossless compression the resolves are unnecessary and therefore
the clear state indicates that the buffer is already cleared. Without
considering if the previous color value was the same as the new,
clears that need to be performed are skipped and the buffer ends up
holding old pixel values.
v2 (Ken): Fix the comparison for gen < 9
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
I'd originally left this off because Orbital Explorer was hanging the
GPU, but it seems to be working these days. There have been a bunch
of changes since then, so we probably fixed something.
On my Broadwell laptop, both Synmark/GSCloth and Orbital Explorer seem
to run at approximately the same framerate in either mode. This is
despite large reductions in instruction count for Synmark, and large
increases for Orbital Explorer. It apparently just doesn't matter.
Switching to scalar mode will gain us fp64 support in the next release,
as vec4-mode support isn't yet ready.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Three Shadow of Mordor geometry shaders increase by a single
instruction, but the number of spills/fills in Orbital Explorer
is reduced from 194:1279 -> 82:454. No other programs are affected.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This looks like leftover cruft from an earlier attempt at writing
point size hacks. Each vertex has its own copy of gl_PointSize,
so accessing any vertex other than 0 would cause this to fail.
The tests seem to work fine without it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
BLORP never touches these, and they're all non-pipelined. Some
are fairly large packets as well.
I haven't tried to benchmark this; the effect is likely to be small.
However, we may as well stop the pointless papercuts; maybe they'll
add up someday.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Class "ir_constant" had a bunch of constructors where the pointer member
"array_elements" had not been initialized. This could have lead to unsafe
code if something had tried to write anything to it. This patch fixes
this issue by initializing the pointer to NULL in all the constructors.
This issue was discovered by Coverity.
CID: 401603, 401604, 401605, 401610
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The SAMPLEMASK semantic should only return the bits set covered by the
current invocation. However we were always retrieving the covmask, which
returns the covered samples of the whole pixel.
When not doing per-sample invocation, this is precisely what we want.
However when doing per-sample invocation, we have to select the
sampleid'th bit and only return that. Furthermore, this means that we
have to have a 1:1 correlation for invocations and samples.
This fixes most
dEQP-GLES31.functional.shaders.sample_variables.sample_mask_in.*
tests. A few failures remain due to disagreements about nr_samples==1
logic as well as what happens with MSAA x2 RTs when the shading fraction
is 0.5.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds a first tentative .mailmap file, to canonicize contributor
name/emails in shortlogs and other statistical endeavours.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
According to the GLSL spec, if the user uses the fma() intrinsic to
generate a precise-consumed value, and you have it in your hardware, you
shouldn't split it. For a while now, we've been splitting all ffma's
up-front and then planned to fuse them later which isn't valid. Correctly
handling the GLSL behaviour fixes rendering corruptions in Tomb Raider.
The only reason why doing this possibly helped before was for ARB programs
which is handled by the previous commit.
Shader-db results on Haswell:
total instructions in shared programs: 7560300 -> 7561510 (0.02%)
instructions in affected programs: 56265 -> 57475 (2.15%)
helped: 86
HURT: 291
The only shaders in the database that are affected are from "Shadow of
Mordor" which is the first app in our database to use fma(). We could, at
some point in the future, split inexact ffma opcodes which would fix the
shader-db regressions since Shadow of Mordor doesn't ues precise. However,
this fixes a bug now and and the shader-db impact is fairly small.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike fma() in GLSL, MAD in ARB programs is 100% splittable. Just emit
the split version and let the optimizer fuse them later.
Shader-db results on Haswell:
total instructions in shared programs: 7560379 -> 7560300 (-0.00%)
instructions in affected programs: 143928 -> 143849 (-0.05%)
helped: 443
HURT: 250
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The i965 driver has its own pass for fusing mul+add combinations that's
much smarter than what nir_opt_algebraic can do so we don't want to get the
nir_opt_algebraic one just because we didn't set lower_ffma.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous rename of lower-output-to-temps pass predated merging of anv,
and apparently vulkan wasn't enabled in my local builds so overlooked
this when rebasing.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Going to convert this pass to parameterized lower_io_to_temporaries, and
we want the user to be able to specify whether to lower outputs or
inputs or both. The restriction of running this pass before validate
to avoid output reads no longer applies.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A pass to lower complex (struct/array/mat) inputs/outputs to primitive
types. This allows, for example, linking that removes unused components
of a larger type which is not indirectly accessed.
In the near term, it is needed for gallium (mesa/st) support for NIR,
since only used components of a type are assigned VBO slots, and we
otherwise have no way to represent that to the driver backend. But it
should be useful for doing shader linking in NIR.
v2: use glsl_count_attribute_slots() rather than passing a type_size
fxn pointer
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The goal is to allow the pipe driver to request something other than
TGSI, but detect whether what is getting is TGSI vs what it requested.
The pipe drivers will always have to support TGSI (and convert that into
whatever it is that they prefer), but in some cases we should be able to
skip the TGSI intermediate step (such as glsl->nir vs glsl->tgsi->nir).
I think pipe_compute_state should get similar treatment. Currently,
afaict, it has one user and one consumer, which has allowed it to be
sloppy wrt. supporting alternative IR's.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The use of transfer_inline_write() in TexSubImage path (see fb9fe352ea)
exposed a bug for "layer_first" resources (ie. a4xx) not setting correct
layer_stride.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Currently, when cross validating global variables, all global variables
seen in the shaders that are part of a program are saved in a table.
When checking a variable this already exist in the table, we check both
are initialized to the same value. If the already saved variable does
not have an initializer, we copy it from the new variable.
Unfortunately this is wrong, as we are modifying something it is
constant. Also, if this modified variable is used in
another program, it will keep the initializer, when it should have none.
Instead of copying the initializer, this commit replaces the old
variable with the new one. So if we see again the same variable with an
initializer, we can compare if both are the same or not.
v2: convert tabs in whitespaces (Kenenth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Juha-Pekka found this back in May 2015:
<1430915727-28677-1-git-send-email-juhapekka.heikkila@gmail.com>
From the discussion, obviously it would be preferable to make
ralloc_size no longer return zeroed memory, but Juha-Pekka found that
it would break Mesa.
In <56AF1C57.2030904@gmail.com>, Juha-Pekka mentioned that patches
exist to fix i965 when ralloc_size is fixed to not zero memory, but
the patches have not made their way to mesa-dev yet.
For now, let's stop doing the double zeroing of rzalloc buffers.
v2:
* Move ralloc_size code to rzalloc_size, and add a comment as
suggested by Ken.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
% pattern rules are a GNU extension. Convert the use of one to a
inference rule to allow this to build on OpenBSD.
v2: inference rules can't have additional prerequisites so add a target
rule to still depend on gen_pack_header.py
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode.
Instead of a fixed llvmpipe.bc name, use ir_<modulename>.bc so multiple
modules can be dumped (albeit it might still overwrite previous modules,
particularly the modules from draw tend to always have the same name).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes this build error.
CXX rasterizer/memory/libswrAVX_la-ClearTile.lo
In file included from rasterizer/memory/ClearTile.cpp:34:0:
./rasterizer/memory/Convert.h: In function ‘uint16_t Convert32To16Float(float)’:
./rasterizer/memory/Convert.h:170:9: error: ‘__builtin_isnan’ is not a member of ‘std’
if (std::isnan(val))
^
./rasterizer/memory/Convert.h:170:9: note: suggested alternative:
<built-in>: note: ‘__builtin_isnan’
./rasterizer/memory/Convert.h:176:14: error: ‘__builtin_isinf_sign’ is not a member of ‘std’
else if (std::isinf(val))
^
./rasterizer/memory/Convert.h:176:14: note: suggested alternative:
<built-in>: note: ‘__builtin_isinf_sign’
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95180
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Otherwise constants which aren't live get an undefined constant location.
When we go to set up param and pull_param we end up assigning all unused
uniforms to slot 0. This cases the Vulkan driver to segfault because it
doesn't have pull_param.
This fixes bugs in the Vulkan driver introduced in c3fab3d000.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
This fixes a crash (but not the test):
GL45-CTS.shader_texture_image_samples_tests.functional_test
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The calculated limit gave problems on SI as it was > 32 KiB
and the hardware LDS size on SI is only 32 KiB. It isn't
correct anyway when processing multiple patches in a threadgroup.
As we potentially have any number of patches such that the
used LDS is at most the hardware LDS size, and exact size
per patch is not known at compile time, this seems like
the only valid bound.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We index into these based on var->data.driver_location, which might have
gaps (ie. two inputs, one w/ drvloc 0 and other 2). This shows up in
(for example) 'bin/copyteximage 1D', but was only noticed recently due
to additional asserts.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Continue using ADD in the other case because a fragment shader backend
could fuse the ADD with a MUL to generate a MAD for ((x && y) || z).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is nothing left that can generate them. These used to be
generated by ir_to_mesa or by the assembler for various NV extensions
that have been removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing that consumes the output of this backend consumes them
navtively. This is *not* the way i915 has implemented these
instructions, but, as far as I am able to tell, this is the way both the
Cg compiler and the HLSL compiler implement these operations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing that consumes the output of this backend consumes them
navtively. This is the way i915 has implemented these instructions
since it began consuming GLSL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Compute support seems to be pretty stable now, and according to piglit
it doesn't seem to break 3D state.
As a side effect, this will expose ARB_compute_shader on GK110/GK208.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This prevents IB rejections due to insane memory usage from
many concecutive texture uploads.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This implements:
- Linear-to-linear partial copies. (unaligned)
- Tiled-to-linear and linear-to-tiled partial copies.
(unaligned except 1-2 Bpp)
- Tiled-to-tiled partial copies aligned to 8x8.
v2: Extend the SDMA L2T VM fault workaround to T2L.
- Same algorithm, just applied to T2L.
(and using a 0-based address and surface.bo_size instead of buf->size)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Most of this has never worked according to the new test.
The new code will be radically different.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: - adjustments for exercising all important SDMA code paths
- decrease the probability of getting huge sizes (faster testing)
- increase the probability of getting power-of-two dimensions
- change the memory cap to 128MB (faster testing)
- better detect which engine has been used
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Those aren't really interesting, however outputting them is helpful when
trying to feed the IR to llvm llc (or opt) for debugging.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't target this yet, and some llvm versions incorrectly enable it based
on cpu string, causing crashes.
(Albeit this is a losing battle, it is pretty much guaranteed when the next
new feature comes along llvm will mistakenly enable it on some future cpu,
thus we would have to proactively disable all new features as llvm adds them.)
This should fix https://bugs.freedesktop.org/show_bug.cgi?id=94291 (untested)
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com
CC: <mesa-stable@lists.freedesktop.org>
This reverts commit 99474dc29b.
-Wpedantic is too verbose, even when applied to just a few includes.
We'll just have to deal with the issues as they come.
Reviewed-by: Brian Paul <brianp@vmware.com>
In that case, the writes need two times the size of a 32-bit value.
We need to adjust the exec_size, so it is not breaking any hardware
rule.
v2:
- Add an assert to verify type size is not less than 4 bytes (Jordan).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The constants could be double, and it was allocating size for float types
for the destination register of varying pull constant loads.
Then the fs_visitor::validate() will complain.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When there is a mix of definitions of uniforms with 32-bit or 64-bit
data type sizes, the driver ends up doing misaligned access to double
based variables in the push constant buffer.
To fix this, this patch pushes first all the 64-bit variables and
then the rest. Then, all the variables would be aligned to
its data type size.
v2:
- Fix typo and improve comment (Jordan).
- Use ralloc(NULL,...) instead of rzalloc(mem_ctx,...) (Jordan).
- Fix typo (Topi).
- Use pointers instead of references in set_push_pull_constant_loc() (Topi).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Usually, writes to a subreg_offset > 0 would also have a stride > 1
and we would recognize them as partial, however, there is one case
where this does not happen, that is when we generate code for 64-bit
imemdiates in gen7, where we produce something like this:
mov(8) vgrf10:UD, <low 32-bit>
mov(8) vgrf10+0.4:UD, <high 32-bit>
and then we use the result with a stride of 0, as in:
mov(8) vgrf13:DF, vgrf10<0>:DF
Although we could try to avoid this issue by producing different code
for this by using writes with a stride of 2, that runs into other
problems affecting gen7 and the fact is that any instruction that
writes to a subreg_offset > 0 is a partial write so we should really
recognize them as such.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the original instruction had a stride > 1, the combined registers
written by the split instructions won't amount to the same register space
written by the original instruction because the split instructions will
use a stride of 1. The current code assumed otherwise and computed the
number of registers written by split instructions as an equal share based
on the relation between the lowered width and the original execution size
of the instruction.
It is only after the split, when we interleave the components of the result
from the lowered instructions back into the original dst register, that the
original stride takes effect and we write all the registers specified by
the original instruction.
Just make the number of register written the same as the vgrf space we
allocate for the dst of the split instruction.
Fixes crashes in fp64 tests produced as a result of assigning incorrectly the
number of registers written by split instructions, which led to incorrect
validation of the size of the writes against the allocated vgrf space.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since it no longer handles conversions from double to float but from
double to various other 32-bit types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Probably not needed since we fix the dst type of comparisons
automatically, but for consistency with the rest of null_reg_*
functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the case of the pack opcode we are already doing the
lowering in NIR, so no need to do it here. The unpack opcode
operates on scalars, so it should not be lowered.
In the case of frexp_sig and frexp_exp, they are lowered in
lower_instructions, so we don't have to care about them.
All the remaining opcodes involve conversions from and to doubles
and are business as usual.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to do this late, in order to avoid partial writes during the
optimization loop.
v2: Use subscript() instead of stride().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Sam):
- LOAD_PAYLOAD treats each header source as a 32B block
regardless of the datatype. Drop the change (Curro)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The destination has to have the same source as the type, or else the
simulator will complain. As a result, we need to emit a CMP that
outputs a 64-bit wide result and then do a strided MOV to pick out the
low 32 bits of each channel.
v2: Use subscript() instead of stride() (Curro)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The HW has a restriction that only vertical stride may cross register
boundaries. Previously, this only mattered for SIMD16 instructions where
we needed to use the same regioning parameters as the equivalent SIMD8
instruction but double the exec size. But we need to do the same
splitting for 64-bit instructions as well as instructions with a stride
of 2 (which effectively consume 64 bits per element). Fix up the code to
do the right thing instead of special-casing SIMD16.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Uniform doubles will read two registers, in which case we need to mark
both as being live.
v2 (Sam):
- Use a formula to get the number of registers read with proper
units (Curro).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This makes things more consistent, and also fixes the offset calculation
for double uniforms.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we are actually unpacking from a double that we have previously
packed from its 32-bit components we can bypass the pack operation
and source from its arguments directly.
v2 (Sam):
- Fix line overflow (Topi)
- Bail if the parent instruction's source is not SSA (Connor)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we are actually creating a double using values obtained from a
previous unpack operation we can bypass the unpack and source from
the original double value directly.
v2:
- Style changes (Topi)
- Bail is parent instruction's src is not SSA (Connor)
v3: Use subscript() instead of stride() (Curro)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They can only be used with 1-src instructions, which practically (since
we should've constant-propagated away all 1-src instructions with 64-bit
immediates in NIR) means that they must be kept in separate MOV's and
can't be propagated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Iago):
- Squashed bits from 'support double precission constant operands for
the implementation of 64-bit emit_load_const'.
- Do not use BRW_REGISTER_TYPE_D for all 32-bit registers since that breaks
asserts and functionality for some piglit tests. Just keep 32-bit types
untouched and add 64-bit support.
- Use DF instead of Q for 64-bit registers. Otherwise the code we generate
will use Q sometimes and DF others and we hit unwanted DF/Q conversions,
so always use DF.
v3 (Sam):
- Mark 'reg_type' occurrences as const (Topi).
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Tapani Palli <tapani.palli@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Sam):
- Mark 'size' as const (Topi).
- Add comment to explain that we do copies 64-bits regardless of the
type (Topi)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2 (Connor): rebase on master which moved this to brw_link.cpp
v3 (Sam):
- Only enable DFREXP_DLDEXP_TO_ARITH in process_glsl_ir(). This is
used for doubles. Single floating point op is lowered by NIR.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: also lower trunc, ceil, floor, fract and roundEven (Iago)
v3: also lower mod for doubles (Sam)
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Lower lrp when operating with double operands because float version of
lrp is also lowered.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell and previous generations does not support lrp instruction
operating with doubles.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Bump up default percentage of commits required to be auto-picked for CC.
Seems from a bit of trial-and-error to come up with a more reasonable
list of CC's this way.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As far as I can tell, this was just entirely missing...honestly, I'm
not sure how anything worked at all.
Caught by noticing GPU hangs in image load store tests with scalar TCS,
but probably has broader implications.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
fs_visitor::emit_urb_writes skips writing the VUE header for shaders
that don't write gl_PointSize, gl_Layer, or gl_ViewportIndex. This
leaves their values uninitialized. Kristian's nearby comment says:
"But often none of the special varyings that live there are written
and in that case we can skip writing to the vue header, provided the
corresponding state properly clamps the values further down the
pipeline."
However, we were clamping gl_ViewportIndex to [0, 15], so we would end
up using a random viewport. To fix this, detect when the shader doesn't
write gl_ViewportIndex, and clamp it to [0, 0].
The vec4 backend always writes zeros to the VUE header, so it doesn't
suffer from this problem. With vec4-style HWord writes, we can write
the header and position together in a single message. In the FS world,
we would need 4 extra MOVs of 0 and a longer message, or a separate
OWord write. It's likely cheaper to just clamp the value.
Fixes DiRT Showdown and Bioshock Infinite, which only rendered half of
the screen - the lower left of two triangles.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93054
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For Gen6 through Haswell dword 1 is MBZ. In gen 8 it becomes part of
the 64-bit address.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
My old implementation accumulated <start, end> pairs in a buffer,
and eventually processed that data on the CPU. This meant flushing
the batchbuffer and waiting for it to completely execute before we
could map it, resulting in really long stalls. We could also run out
of space in the buffer, and have to do this early.
Instead, we can use Haswell's MI_MATH command to do the (end - start)
subtraction, as well as the multiplication by 2 or 3 to convert from
the number of primitives written to the number of vertices written.
We still need to CS stall to read the counters, but otherwise everything
is completely pipelined - there's no CPU<->GPU synchronization required.
It also uses only 80 bytes in the buffer, no matter what.
Improves performance in Manhattan on Skylake GT3e at 800x600 by
6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance
by 2.82103% +/- 0.148596% (n=84).
v2: Fix number of primitives -> number of vertices calculation for
GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by
Jordan Justen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It appears that we can't do this in a single command (like we do for
MI_LOAD_REGISTER_IMM) - the Skylake simulator gets rather grumpy about
the command length if I try to combine them. No matter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On Haswell, we need version 6 of the kernel command parser in order to
write the math registers. Our implementation of ARB_query_buffer_object
heavily relies on MI_MATH, so we should only advertise it when MI_MATH
is available. We also need MI_LOAD_REGISTER_REG, which requires version
7 of the command parser.
To make these checks easier, introduce a screen->has_mi_math_and_lrr
flag that will be set when both commands are supported.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This fixes some cases in the CTS KHR debug tests where it uses
glIsTexture to find an invalid ID and then call GetObjectLabel.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we try to draw or query an XFB object that hasn't been bound,
we shouldn't return any information.
This fixes a couple if cases in:
GL33-CTS.transform_feedback.api_errors_test
The ObjectLabel test is inspired by another test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since this is potentially modifying the block structure of the shader,
it needs the _safe() version of the iterator.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We request more than 32KB of LDS here, which SI doesn't have. Since LLVM
recently started checking the size of declared LDS allocations, all shaders
involved in tesselation fail to compile on SI.
Note that the entire calculation here seems wrong, given how we calculate
indices for generic attributes, so the number ends up wrong on CI+ as well.
A proper solution is clearly needed, but this patch should serve as a band-aid
for SI in the meantime.
Also note that the real size of the LDS allocation in hardware is independent
from what we tell LLVM, so this is really more of a "cosmetic" change.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95198
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Experiments with framebuffer-no-attachments type draw calls have shown that
NULL exports stall terribly unless we ensure that export memory is allocated
by the SPI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LLVM when configured with "intel jitevents" enabled can inform
VTune about dynamic code, so individual shaders are attributed
profiling data and the resulting assembly can be examined.
Acked-by: Roland Scheidegger <sroland@vmware.com>
There are a total of four possible currently, rather than 2. So we need
to be prepared for the input array to grow by 16 components. We could
get away with less if we could pack sysval inputs.. and the way this is
handled currently isn't really the nicest thing. But it's a tactical
fix for an issue hit in:
GL31-CTS.gtf30.GL3Tests.transform_feedback.transform_feedback_vertex_id
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Several NIR scripts were using `from ... import ...` syntax, which wasn't
supported.
Using Python standard libary's modulefinder solves the problem with less
effort and hacks.
Reviewed-by: Brian Paul <brianp@vmware.com>
This commit broke Weston, Mutter, and xf86-video-modesetting, on KMS.
In order to use Y-tiled buffers, the kernel requires the tiling mode to
be explicitly named through the I915_FORMAT_MOD_Y_TILED AddFB2 modifier;
it disallows any attempt to infer the buffer's tiling mode.
As the GBM API does not have a way to extract modifiers for a buffer,
this commit broke all users of GBM on SKL+. Revert it for now, until we
get a way to extract modifier information from GBM, and also let GBM
users inform the implementation that it intends to use the modifiers.
This reverts commit 6a0d036483.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Hans de Goede <hdegoede@redhat.com>
We want to use interface_type, not vtn_var->type. They're normally
equivalent, but for geometry/tessellation per-vertex interface arrays,
we need to unwrap a level.
Otherwise, we tried to iterate a structure members but instead used
an array length. If the array length was longer than the number of
fields in the structure, we'd crash.
Fixes the CreatePipelineGeometryInputBlockPositive layer validation
test.
v2: Just use glsl_without_array() on the vtn_var type
(requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
One of these is an unsigned bitfield, which I suspect is a false positive, but
gcc 5.3.1 complains about it with -fsanitize=undefined.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Shifting into the sign bit of a signed int is undefined behavior.
Unfortunately, there are potentially many places where this happens using
the register macros.
This commit is the result of running
sed -ie "s/(((\(\w\+\)) & 0x\(\w\+\)) << \(\w\+\))/(((unsigned)(\1) \& 0x\2) << \3)/g"
on all header files in gallium/{r600,radeon,radeonsi}.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Funnily enough, some of these were turned into a compile-time error by gcc
with -fsanitize=undefined ("initializer is not a constant").
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since 1 << 31 complains about undefined behaviour; the others are changed
only for consistency.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
No sure where the 36 came from, but we clearly need at least
48 bytes per attribute per primitive.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be used for resetting the uniform stream in the presence of
branching, but may also be useful as an optimization to reduce how many
uniforms we have to copy out per draw call (in exchange for increasing
icache pressure).
Seeing the expansion of a QPU_GET_FIELD in an assert isn't very
informative, and it's hard find what's going wrong without getting a dump
of the instruction that failed.
We should have already emitted a NOP due to the last instruction being a
TLB or VPM write. However, if you disable dead code elimination then you
might get dead code at the end, and that dead code might have the signal
bits set to something non-default, at which point you die in assertion
failure.
Vulkan always sets this. It only affects in-place Z decompression.
This is recommended for performance, but what app uses MSAA depth
texturing?
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There are a few different fixups that we have to do for texture
destinations that re-arrange channels, fix hardware vs. API mismatches, or
just shrink the result to fit in the NIR destination. These were all being
done in a somewhat haphazard manner. This commit replaces all of the
shuffling with a single LOAD_PAYLOAD operation at the end and makes it much
easier to insert fixups between the texture instruction itself and the
LOAD_PAYLOAD.
Shader-db results on Haswell:
total instructions in shared programs: 6227035 -> 6226669 (-0.01%)
instructions in affected programs: 19119 -> 18753 (-1.91%)
helped: 85
HURT: 0
total cycles in shared programs: 56491626 -> 56476126 (-0.03%)
cycles in affected programs: 672420 -> 656920 (-2.31%)
helped: 92
HURT: 42
The fs_visitor::emit_texture helper originated when we still had both NIR
and IR visitors for the FS backend. Since the old visitor was removed,
emit_texture serves no real purpose beyond arbitrarily splitting
heavily-linked code across two functions.
Normally, we expect SIMD8 shaders to be more instructions than SIMD4x2
shaders, as it takes four instructions to operate on a vec4, rather than
a single instruction. However, the benefit is that it can process 8
objects per shader thread instead of 2.
Surprisingly, the shader-db statistics show an improvement in both
instruction and cycle counts:
Synmark: -31.25% instructions, -29.27% cycles, 0 hurt.
Tessmark: -36.92% instructions, -37.81% cycles, 0 hurt.
Unigine Heaven: -3.42% instructions, -17.95% cycles, 0 hurt.
Shadow of Mordor:
+13.24% instructions (26 with fewer instructions, 45 with more),
-5.23% cycles (44 with fewer cycles, 27 with more cycles).
Presumably, this is because the SIMD8 URB messages are a much more
natural fit than the SIMD4x2 URB messages - there's a ton less header
setup.
I benchmarked Shadow of Mordor and Unigine Heaven on my Skylake GT3e,
and the performance seems to be the same or increase ever so slightly
(< 1 FPS difference). So I believe it's strictly superior.
There's also a lot more optimization potential we can do in scalar mode.
This will also help us finish fp64 support, as scalar support is going
to land much sooner than vec4-mode support.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are a couple of cycle count changes in shader-db, but it's
basically a wash.
However, with the Broadwell scalar TCS backend enabled, many
Shadow of Mordor shaders benefit from this patch. Because we don't
batch up output writes for TCS, vec4 outputs might not have all
components defined. Many output writes have a value of undef,
which is useless.
With scalar TCS, stats for tessellation shaders on Broadwell:
total instructions in shared programs: 1283000 -> 1280444 (-0.20%)
instructions in affected programs: 34302 -> 31746 (-7.45%)
helped: 71
HURT: 0
total cycles in shared programs: 10798768 -> 10780682 (-0.17%)
cycles in affected programs: 158004 -> 139918 (-11.45%)
helped: 71
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
shader-db statistics on Broadwell:
total instructions in shared programs: 8963409 -> 8962455 (-0.01%)
instructions in affected programs: 60858 -> 59904 (-1.57%)
helped: 318
HURT: 0
total cycles in shared programs: 71408022 -> 71406276 (-0.00%)
cycles in affected programs: 398416 -> 396670 (-0.44%)
helped: 199
HURT: 51
GAINED: 1
The only shaders affected were in Dota 2 Reborn.
It also sets up for the next optimization.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This better reflects what it does. I plan to add other ALU
optimizations as well, so the old name would be confusing.
In preparation for that, also move the file comments about csels
above the opt_undef_csel function, and delete the ones about there
not being other optimizations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to Timothy, using program_string_id == 0 to identify the
passthrough TCS is going to be problematic for his shader cache work.
So, change it to strcmp() the name at visitor creation time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix windows in 32-bit mode when hyperthreading is disabled on Xeons.
Some support for asymmetric processor topologies.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This tells LLVM to always use SMEM loads for descriptors. It fixes a
regression in piglit's arb_shader_storage_buffer_object/execution/indirect.shader_test
that was caused by LLVM r268259 (but the proper fix is really here in Mesa).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Beginning with commit 7b208a73, Unigine Valley began hanging the GPU on
Gen >= 8 platforms.
Evidently that commit allowed the scheduler to make different choices
that somehow finally ran afoul of a hardware bug in which POW and FDIV
instructions may not be followed by an instruction with two destination
registers (including compressed instructions). I presume the conditions
are more complex than that, but the internal hardware bug report (BDWGFX
bug_de 1696294) does not contain much more information.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94924
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1]
Tested-by: Mark Janes <mark.a.janes@intel.com> [v1]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When gathering query results, swr_gather_stats was
unnecessarily stalling the entire pipeline. Results are now
collected asynchronously, with a fence marking completion.
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
The assert was null checking dest_arr_parent twice. The intention
seems to be to check both dest_ and src_.
Added in d3636da9
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Silences warnings with 32-bit Linux gcc builds and MinGW which doesn't
recognize the ‘t’ conversion character.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
v2:
* Declare loop index variable at loop site (idr)
* Make arrays of MI_MATH instructions 'static const' (idr)
* Remove commented debug code (idr)
* Updated comment in set_query_availability (Ken)
* Replace switch with if/else in hsw_result_to_gpr0 (Ken)
* Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on
hsw and gen8 (Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the byte based offset of brw_load_register_mem*.
The function is also moved into intel_batchbuffer.c like
brw_load_register_mem*.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pretty much only happens if shader variant compile fails. But in this
case, if we haven't emitted cmdstream, we don't want to set needs_flush.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This was always a bit overly complicated, and had some issues (like
ctx->prog.dirty not getting reset at the end of the batch). It also
required some special hacks to avoid resetting dirty state on binning
pass. So just move it all into ctx->dirty (leaving some free bits
for future shader stages), and make FD_DIRTY_PROG just be the union
of all FD_SHADER_DIRTY_*.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ofc won't catch *all* faults, but at least helpful for catching offsets
which are completely bogus.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that the opc's encode the instruction category (making them unique)
we no longer need to check the category in addition to the opc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The instruction encoding allows for more registers, but at least on
a3xx/a4xx they don't actually exist.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Helps reduce register pressure and instruction counts for immediates
that would otherwise require a mov into gpr.
total instructions in shared programs: 4455332 -> 4369297 (-1.93%)
total dwords in shared programs: 8807872 -> 8614432 (-2.20%)
total full registers used in shared programs: 263062 -> 250846 (-4.64%)
total half registers used in shader programs: 9845 -> 9845 (0.00%)
total const registers used in shared programs: 1029735 -> 1466993 (42.46%)
half full const instr dwords
helped 0 10415 0 17861 5912
hurt 0 1157 21458 947 33
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Copied from linux kernel (where it is called MAINTAINERS and
get_maintainer.pl), with minimal changes to script (to recognize
mesa src tree rather than linux kernel src tree, and to avoid
accidentaly CC'ing Linus Torvalds on mesa patches), and slimmed
down MAINTAINER file syntax to recognize that we don't really have
subsystem "maintainers" in the same sense as the linux kernel (ie. no
different mailing lists and git trees per subsystem).
The main point is to automate slapping on the correct CC's for patches
via git's --cc-cmd feature, more than anything else.
I didn't attempt to fully populate the REVIEWERS file, by a long shot.
This is an opt-in system and anyone else can add their own entries.
To utilize:
git send-email --cc-cmd ./scripts/get_reviewer.pl ...
or to configure it to be the default:
git config sendemail.cccmd ./scripts/get_reviewer.pl
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When computing the offset in the uniform storage table, take into account
the size multiplier so double precision matrices are handled correctly.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Split 32-bit and 64-bit fmod lowering as the drivers might need to
lower them separately inside NIR depending on the HW support.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are rounding errors with the division in i965 that affect
the mod(x,y) result when x = N * y. Instead of returning '0' it
was returning 'y'.
This lowering pass fixes those cases.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GEN_LT has a straightforward implementation on which we can build the
GEN_GE and GEN_LE macros.
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For opcodes that changed meaning on different generations, we store a
pointer to a secondary table and the table's size in a tagged union in
place of the mnemonic and number of sources.
Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous commit replaced direct uses of opcode_descs with calls to
the wrapper function, which should be the only method of accessing
opcode_descs's data. As a result gen_from_devinfo() can also be made
static.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I merged opcode_desc into inst_info (instead of the other way around)
because inst_info was sorted by opcode number.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drop the uses of 'enum gen' to a plain int, so that we don't have to
expose the bitfield definitions and GEN_GE/GEN_LE macros to other users
of brw_eu.h. As a result, s/.gen/.gens/ to avoid confusion with
devinfo->gen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The function takes a device info struct as argument in addition to the
opcode number in order to disambiguate between multiple opcode_desc
entries for different instructions with the same opcode number.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
[v2] mattst88: Put brw_opcode_desc() in brw_eu.c instead of moving it
there in a later patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
[v3] mattst88: Return NULL if opcode >= ARRAY_SIZE(opcode_descs)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is not strictly required for the following changes because none
of the three-source opcodes we support at the moment in the compiler
back-end has been removed or redefined, but that's likely to change in
the future. In any case having hardware instructions specified as a
pair of hardware device and opcode number explicitly in all cases will
simplify the opcode look-up interface introduced in a subsequent
commit, since the opcode number alone is in general ambiguous.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future series will implement support for an instruction that happens
to have the same opcode number as another instruction we support
already on a disjoint set of hardware generations. In order to
disambiguate which instruction it is brw_instruction_name() will need
some way to find out which device we are generating code for.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike most shader stages, the Hull Shader hardware makes us explicitly
tell it how many threads to dispatch and manually configure the channel
mask. One perk of this is that we have a lot of flexibility - we can
run it in either SIMD4x2 or SIMD8 mode.
Treating it as SIMD8 means that shaders with 8 or fewer output vertices
(which is overwhemingly the common case) can be handled by a single
thread. This has several intriguing properties:
- Accessing input arrays with gl_InvocationID as the index is a simple
SIMD8 URB read with g1 as the header. No indirect addressing required.
- Barriers are no-ops.
- We could potentially do output shadowing to combine writes, as the
concurrency concerns are gone. (We don't do this yet, though.)
v2: Drop first_non_payload_grf change, as it was always adding 0
(caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I'm about to implement a scalar TCS backend, and I'd rather not
duplicate all of this code there.
One change is that we now write the tessellation levels from all
TCS threads, rather than just the first. This is pretty harmless,
and was easier. The IF/ENDIF needed for that are gone; otherwise
the generated code is basically identical.
I chose to emit load/store intrinsics directly because it was easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If we fail to create a context in the VMware driver we call this function
unconditionally to free a bunch of bit vectors. Instead of asserting on
a null pointer, just no-op.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If, for example, we previously had 2 sampler states bound and now we
are binding one, we'd leave the second sampler state unchanged.
This change nulls-out the second sampler state in this situation.
We're already doing the same thing for sampler views.
This silences an occasional warning issued by the VMware driver when
the number of sampler views and sampler states disagreed.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This silences some warnings when we try to sample from surfaces that were
created for drawing, such as when blitting from one of the framebuffer
surfaces. We were already doing the opposite situation (adding a bind
flag for rendering to surfaces declared as texture sources).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Like cube maps, we need to convert the z information to a layer index.
Also rename the *_face vars to *_face_layer to make things a little more
understandable.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
metric-issue_slot_utilization and metric-branch_efficiency are already
computed as percentages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This will allow to use percentages for some metrics because the Gallium
HUD doesn't allow to display floating point numbers and 0 is printed
instead.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This is most likely a copy-paste error when I reworked this area few
weeks ago. For SM20, metric-issue_slots is equal to inst_issued because
there is only one pipeline, so the metric is not exposed there.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Karol Herbst <nouveau@karolherbst.de>
This fixes two of the cases in
GL43-CTS.shader_subroutine.subroutines_not_allowed_as_variables_constructors_and_argument_or_return_types
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource just appears in GLSL 4.20 without any fanfare.
Fixes GL43-CTX.CommonBugs.CommonBug_ReservedNames
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes
GL43-CTS.copy_image.samples_missmatch
which otherwise asserts in the radeonsi driver.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This prevents GL43-CTS.khr_debug.labels_non_debug from
memcpying all over the stack and crashing.
v2: actually fix the test.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL43-CTS.texture_view.errors checks for GL_INVALID_VALUE
here but we catch these problems in the dimensionsOK check
and return the wrong error value.
This fixes:
GL43-CTS.texture_view.errors.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
this is a leftover from the days when depth-stencil buffers were
allocated by the DDX
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This will allow us to simplify a lot of code around tiling.
Kernel 3.10 is required for SI support.
Kernel 3.13 is required for CIK support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This might be useful to avoid breaking the current compute state when
monitoring MP perf counters because we use a compute kernel to read out
those counters. This has been initially suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's not doing anything according to shader-db now that we're using NIR.
It would have had to be reworked significantly anyway, to handle control
flow.
We were generating piles of FRAG_W for interpolation, only to CSE them
away immediately. Since this is the only thing that CSE is doing for us
any more, just avoid making the CSE work necessary.
This is one of two uses of the current QIR CSE pass according to
shader-db. The NIR pass means that we'll end up doing Newton-Raphson on
our RCP, which we weren't doing before, but that's probably actually a
good thing.
That format has first_non_void < 0. This fixes a regression in piglit
arb_shader_image_load_store-semantics that was introduced by commit 76b8c5cc60,
while hopefully still shutting Coverity up (and failing in a more obvious way
if a similar error should re-appear).
Reviewed-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The GL4.1 spec bumps this to 2048, so we should do so.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The file(s) within are already picked thanks to the build rule of the
respective test. No need to have the folder in EXTRA_DIST.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
From Section 7.3.1.1 (Naming Active Resources) of the OpenGL 4.5 spec:
"For the property LOCATION_COMPONENT, a single integer indicating the first
component of the location assigned to an active input or output variable is
written to params. For input and output variables with a component specified
by a layout qualifier, the specified component is written. For all other
input and output variables, the value zero is written."
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I don't think this will do much as it's a compiler error
to use component without location which is already in the
table but its good to be consistent.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This change checks for component overlap, including handling overlap of
locations and components by doubles. Previously there was no validation
for assigning explicit locations to a location used by the second half
of a double.
V3: simplify handling of doubles and fix double component aliasing
detection
V2: fix component matching for matricies
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add support for creating images from Android native buffers with dma-buf
fd. As dma-buf support also requires DRI image loader extension, add
that as well.
This is based on several originally patches written by Varad Gautam.
I've collapsed them into logical changes and done a bit of reformatting.
Using dma-bufs vs. GEM handles is now a runtime decision similar to the
wayland EGL instead of being compile time selection. The dma-buf support
is also re-written to use common dri2_create_image_dma_buf function in
egl_dri2.c.
Cc: Varad Gautam <varadgautam@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In preparation to use the same code for dma-bufs, factor out the code to a
separate function.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Use of __DRI_DRI2_LOADER extension is only supported for card nodes. In
order to support dmabufs, Android will be moving to using render nodes and
we need to disable the DRI2 loader extension.
This is based on the Wayland EGL code.
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Different versions of make behave differently in whether $(wildcard) sorts
the results or not. The Android build now explicitly sorts
all-named-subdir-makefiles which breaks the build because src/gallium
must be included after src/mesa/drivers/dri.
The Android build system doesn't support doing "include $(call
all-named-subdir-makefiles,...)" twice, so rework things by generating
the included makefile list and including them in 2 steps.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
OpenGL 4.5 Core Profile section 7.1, in the documentation for
CompileShader, says: "Changing the source code of a shader object with
ShaderSource does not change its compile status or the compiled shader
code."
According to Karol Herbst, the game "Divinity: Original Sin - Enhanced
Edition" depends on this odd quirk of the spec.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Namely the python script, the ICD header and private headers. We could
get the system version of the ICD ones, although there is no .pc file to
easily locate and/or manage them.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Otherwise we risk picking the possibly outdated file in the source dir
over the fresh one in the builddir.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
It should ease people with all the interaction and platforms and how
they interact (at least from a build POV) with each other.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Similar to earlier commit - move all the common bits into a single
place, thus improving readability and allowing us to see what's missing.
Also don't forget to add the missing bits. This commit should allows us
to build wayland only vulkan ;-)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than having things split out in multiple places, consolidate it
and add all the missing bits. Also ensure that we use the already built
static library libwayland-drm.la.
v2 Add missing '\' in the CFLAGS.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
... rather than having duplicates files through the sources lists.
Splitting things as is, has the side effect of making things clearer and
easing a potential android build. The latter of which automatically adds
BUILT_SOURCES to the binary.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Recent commit removed the winsys defines from anv_private.h thus
breaking the tests. To fix that and avoid it in the future, merge the
tests makefile in the libvulkan one.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Introduce a static library libvulkan_common.la that is used by
libvukan_intel.la and libvulkan_test.la.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Will allow others to reuse the lists (scons/android anyone ?) and makes
the file a lot shorter and easier to read.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Copy/paste from the rest of mesa, but namely.
- The module should be shared only.
- We don't need the explicit ".so", as the vulkan loader will retrieve
the full filename from the json
- No unresolved symbols in the final binary
- Use the linker garbage collector to slim down the final binary.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
It's used only by dev_icd.json so just call it that way. While we're
here, manually expand $< (as it might cause issue on some systems)
and drop the unneeded install_libdir substitution.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With earlier commit we've moved a few functions and changing the
argument type from _EGLDisplay * to struct dri2_egl_display *.
The latter is effectively a wrapper around the former, thus
functionality was preserved, although GCC rightfully warned us about the
misuse.
Add a simple wrapper that casts and propagates the correct type.
Fixes: 9bbf3737f9 ("egl/x11: authenticate before doing chipset id
ioctls")
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Instead of cascading support for various different implementations of
GLX, all three options are now specified through the --enable-glx
option:
--enable-glx=dri : Enable the DRI-based GLX
--enable-glx=xlib : Enable the classic Xlib-based GLX
--enable-glx=gallium-xlib : Enable the gallium Xlib-based GLX
--enable-glx[=yes] : Defaults to dri if DRI is enabled, else
gallium-xlib if gallium is enabled, else
xlib
This removes the --enable-xlib-glx option and fixes a bug in which both
the classic xlib-glx and gallium xlib-glx implementations were getting
built causing different versioned and conflicting libGL libraries to be
installed.
v2: Changes from various review feedback from Emil:
a) Fixed typos
b) Corrected help docs for new option
c) Added appropriate a-b and r-b tags in commit msg
d) Fixed various GLX related dependency checks.
v3: Rebased to current master and added changelog in commit msg
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94086
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Parenthesis are needed here as ! takes precedence over the &. The
check had the opposite effect than intended.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that there is a pass to do this in NIR, lets just use that and
manage the variants ourself, rather than letting state-tracker do it.
This way, mesa/st will precompile shaders without requiring
ST_DEBUG=precompile (which requires a debug build).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This fixes the winsys->cs_is_buffer_referenced query, which is used for
synchronization before buffers are mapped.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When index - t->temps_size is greater than 4096, allocating space for
temporaries on demand will miserably crash. This can happen when a game
uses a lot of temporaries like the recent released Tomb raider.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
lower_vector_derefs can produce new vector_extract operations.
Neither i965 nor st_glsl_to_tgsi can handle them, so we'd best
convert them to swizzles.
Together with the previous patch, this fixes assertion failures in
GLideN64, as well as a new Piglit test which reproduces the issue:
spec/glsl-1.10/compiler/vector-dereference-in-dereference.frag
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95164
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In optimized builds, visit(ir_expression *) experiences inlining with gcc that
leads the function to have a roughly 32KB stack frame. This is a problem given
that the function is called recursively. In non-optimized builds, the stack
frame is much smaller, hence one gets crashes that happen only in optimized
builds.
Arguably there is a compiler bug or at least severe misfeature here. In any
case, the easy thing to do for now seems to be moving the bulk of the
non-recursive code into a separate function. This is sufficient to convince my
version of gcc not to blow up the stack frame of the recursive part. Just to be
sure, add the gcc-specific noinline attribute to prevent this bug from
reoccuring if inliner heuristics change.
v2: put ATTRIBUTE_NOINLINE into macros.h
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95133
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95026
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92850
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Properly handle Target and Format parameters when present.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Previously, there was a bug where nospace wasn't signalled if it just so
happened that the very last print exceeded the available space.
Reviewed-by: Dave Airlie <airlied@redhat.com>
There is no sense in having the double version of ldexp take a 64-bit
integer. Instead, let's just take a 32-bit int all the time. This also
matches what GLSL does where both variants of ldexp take a regular integer
for the exponent argument.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The new expressions are more explicit in terms of where the bits go so it's
a little easier to tell what's going on. This is the way GLSL specifies
things so it's a bit easier to verify too. It also has the benifit that
the new expressions easily vectorize so we can constant-fold vector forms
of the _split versions correctly.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Fixes dEQP-GLES31.functional subtests:
draw_indirect.negative.command_offset_not_in_buffer_signed32_wrap
draw_indirect.negative.command_offset_not_in_buffer_unsigned32_wrap
These tests use really large values that overflow GLsizeiptr, at
which point the buffer size isn't less than "end".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95138
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_def(\([^,]*\),\s*\([^,]*\))/nir_foreach_def(\2, \1)/
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_use(\([^,]*\),\s*\([^,]*\))/nir_foreach_use(\2, \1)/
and similar expressions for nir_foreach_use_safe, etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_phi_src(\([^,]*\),\s*\([^,]*\))/nir_foreach_phi_src(\2, \1)/
and a similar expression for nir_foreach_phi_src_safe.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
This matches the "foreach x in container" pattern found in many other
programming languages. Generated by the following regular expression:
s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/
and similar expressions for nir_foreach_instr_safe etc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Having one buffer object for input kernel arguments coming from clover
and an other one for OpenGL user uniforms is unnecessary. Using the
uniform_bo object for both GL/CL uniforms avoids to declare a new BO.
This only affects compute programs but it should not hurt anything
because the states are dirtied and data will get reuploaded.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Enlarge the buffer hashlist to prevent large numbers of misses
due to adding more buffers than can be cached in the hashlist.
Ported from winsys/amdgpu: 6373845d98
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Changes:
- don't flush DB for fast color clears
- don't flush any caches for initial clears
- remove the flag from si_copy_buffer, always assume shader coherency
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ideally we'd have nir.h being included with -Wpedantic too, but it fails
with:
src/compiler/nir/nir.h:754:20: warning: ISO C++ forbids zero-size array ‘src’ [-Wpedantic]
nir_alu_src src[];
^
In file included from src/compiler/nir/glsl_to_nir.cpp:42:0:
src/compiler/nir/nir.h:919:16: warning: ISO C++ forbids zero-size array ‘src’ [-Wpedantic]
nir_src src[];
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
As they are not standard C++ and are not supported by MSVC C++ compiler.
Just have nir_imm_double match nir_imm_float above.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
The old comment was a bit terse. Also, change the function return
type to bool.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some hardware (i965 on Broadwell generation, for example) does not support
natively the execution of lrp instruction with double arguments.
Add 'lower_flrp64' flag to lower this instruction in that case.
v2:
- Rename lower_flrp_double to lower_flrp64 (Jason)
- Fix typo (Jason)
- Adapt the code to define bit_size information in the opcodes.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A later patch will add lower_flrp64 option to NIR.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for ceil on doubles.
v2 (Sam):
- Improve the lowering pass to remove one bcsel (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for floor on doubles.
v2 (Sam):
- Improve the lowering pass to remove one bcsel (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At least i965 hardware does not have native support for truncating doubles.
v2:
- Simplified the implementation significantly.
- Fixed the else branch, that was not doing what we wanted.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Move to compiler/nir (Iago)
v3: Use nir_imm_int() to load the constants (Sam)
v4 (Sam):
- Undo line-wrap (Jason).
- Fix comment (Jason).
- Improve generated code for get_signed_inf() function (Connor).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Group num_components and bit_size together (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes a Coverity defect by adding checks to see if a value is negative
before using it to index an array. By checking the value first it makes
the code a bit safer but overall should not have a big impact.
CID: 1355598
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(Re-pushing previous fix for clang SVN r265359, which was reverted in
the meantime)
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Return statements in conditional blocks were not having their
output varyings lowered correctly.
This patch fixes the following piglit tests:
/spec/glsl-1.10/execution/vs-float-main-return
/spec/glsl-1.10/execution/vs-vec2-main-return
/spec/glsl-1.10/execution/vs-vec3-main-return
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Previously, these were functions which took a callback. This meant that
the per-block code had to be in a separate function, and all the data
that you wanted to pass in had to be a single void *. They walked the
control flow tree recursively, doing a depth-first search, and called
the callback in a preorder, matching the order of the original source
code. But since each node in the control flow tree has a pointer to its
parent, we can implement a "get-next" and "get-previous" method that
does the same thing that the recursive function did with no state at
all. This lets us rewrite nir_foreach_block() as a simple for loop,
which lets us greatly simplify its users in some cases. This does
require us to rewrite every user, although the transformation from the
old nir_foreach_block() to the new nir_foreach_block() is mostly
trivial.
One subtlety, though, is that the new nir_foreach_block() won't handle
the case where the current block is deleted, which the old one could.
There's a new nir_foreach_block_safe() which implements the standard
trick for solving this. Most users don't modify control flow, though, so
they won't need it. Right now, only opt_select_peephole needs it.
The old functions are reimplemented in terms of the new macros, although
they'll go away after everything is converted.
v2: keep an implementation of the old functions around
v3 (Jason Ekstrand): A small cosmetic change and a bugfix in the loop
handling of nir_cf_node_cf_tree_last().
v4 (Jason Ekstrand): Use the _safe macro in foreach_block_reverse_call
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the OpenGLES 3.1 CTS:
* ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos
Because this is triggering the error message after the normal API
validation phase, we don't have the API function name available, and
therefore we generate an error message without the draw call name:
Mesa: User error: GL_INVALID_OPERATION in draw call (vertex buffers are mapped)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95142
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Sampling from an ETC2 texture is supported on Bay Trail and
from Gen8 onwards. While ASTC_LDR is supported on Gen9, the
logic to handle such formats has not yet been implemented in
the driver.
Fixes dEQP-VK.api.info.format_properties.compressed_formats.
v2: Enable ETC2 for Bay Trail (Kenneth Graunke)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94896
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a validator that ensures that all expressions passed
through nir_algebraic are 100% non-ambiguous as far as bit-sizes are
concerned. This way it's a compile-time error rather than a hard-to-trace
C exception some time later.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is consistent with the rename done for the rest of NIR. Currently,
"bool" is the only type specifier used in nir_opt_algebraic.py so this is
really a no-op.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, if an exception was encountered anywhere, nir_algebraic would
just die in a fire with no indication whatsoever as to where the actual bug
is. This commit makes it print out the particular search-and-replace
expression that is causing problems along with the exception. Also, it
will now report all of the errors it finds and then exit at the end like a
standard C compiler would do.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The test was failing to build with "undefined reference to `roundf'" errors,
so Make check on mesa was failing.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes ES31-CTS.gtf.GL31Tests.texture_stencil8.texture_stencil8_multisample.
Current logic divides given layer of one by number of samples (four)
trashing the layer to zero. Layer adjustment is only to be used with
non-interleaved msaa surfaces where samples for particular layer are
in multiple slices.
I copy-pasted a bit of documentation from
brw_blorp.c::brw_blorp_compute_tile_offsets().
Also took the opportunity to fix the comment regarding sampling
as 2D, cube textures are the only exception.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
A semantic error was introduced in a past refactoring that caused the bind
parameter to be passed into the use_reusable_pool parameter of buffer_create.
Since this clearly makes no sense, and there is no clear reason why the
cache _shouldn't_ be used, just use the cache always.
Cc: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Mostly for consistency with the other decompress functions, but note that
in the non-DCC decompress case, the function can now early-out in slightly
more (albeit probably rare) cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This happens to "fix" a rendering bug in KotOR2, because it avoids a still
not quite understood bug with MSAA fast stencil clear decompress. For the
stencil clear bug, I have sent a piglit test (arb_texture_multisample-stencil-clear).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93767
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are some undefined behavior subtleties, so having a function to match
the u_bit_scan_consecutive_range makes sense.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Turns out that this is needed after all to satisfy some strengthened
coherency tests. Depends on support in LLVM, added in r267729.
v2: updated to reflect changes to the LLVM intrinsic
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
BackendPixelRate should be easier to read/maintain now hopefully.
Small perf bump by moving some of the pfn's to inline functions
without template params.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Instead of a hard-coded 512. The query typically returns 65536 now.
Fall back to 512 if the query fails as we do for vertex shaders (which
should never happen).
Note that we don't actually enforce this limit in our shaders but it gets
reported via the glGetProgramivARB(GL_MAX_PROGRAM_INSTRUCTIONS_ARB) query.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The llvm TGSI backend uses pointers in registers and does things
like:
LOAD TEMP[0].y, MEMORY[0], TEMP[0]
Expecting the data at address TEMP[0].x to get loaded to
TEMP[0].y. But this will cause the data at TEMP[0].x + 4 to be
loaded instead.
This commit adds support for a swizzle suffix for the 1st source
operand, which allows using:
LOAD TEMP[0].y, MEMORY[0].xxxx, TEMP[0]
And actually getting the desired behavior
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
"off" later gets set to NULL when the address is immediate, so move the
fetchSrc(1) call to the non-immediate branch of the if-else. This brings
handleLOAD's offset handling inline with how it is done in handleSTORE.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
LOAD loads upto 4 components from the specified resource starting at
the passed in x value of the 2nd source operand, the y, z and w
components of the address should not be used.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Stencil texturing is required by ES 3.1. Apparently we never actually
turned it on. Do that now. Also turn on the desktop extension.
Fixes nine dEQP-GLES31.functional tests:
stencil_texturing.format.stencil_index8_2d
texture.border_clamp.formats.stencil_index8.nearest_size_pot
texture.border_clamp.formats.stencil_index8.nearest_size_npot
texture.border_clamp.formats.stencil_index8.gather_size_pot
texture.border_clamp.formats.stencil_index8.gather_size_npot
texture.border_clamp.unused_channels.stencil_index8
state_query.internal_format.renderbuffer.stencil_index8_samples
state_query.internal_format.texture_2d_multisample.stencil_index8_samples
state_query.internal_format.texture_2d_multisample_array.stencil_index8_samples
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
ES prohibits this, but GL appears to allow it. We at least need this
much, or else we'll crash as there's no source to read from.
This fixed crashes in the ES tests before I realized I needed to
prohibit stencil instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
I want to add another condition. Moving the indirect_offset.file
check out a level should make this a little easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Often, we don't need a full 4 channels worth of data from the sampler.
For example, depth comparisons and red textures only return one value.
To handle this, the sampler message header contains a mask which can
be used to disable channels, and reduce the message length (in SIMD16
mode on all hardware, and SIMD8 mode on Broadwell and later).
We've never used it before, since it required setting up a message
header. This meant trading a smaller response length for a larger
message length and additional MOVs to set it up.
However, Skylake introduces a terrific new feature: for headerless
messages, you can simply reduce the response length, and it makes
the implicit header contain an appropriate mask. So to read only
RG, you would simply set the message length to 2 or 4 (SIMD8/16).
This means we can finally take advantage of this at no cost.
total instructions in shared programs: 9091831 -> 9073067 (-0.21%)
instructions in affected programs: 191370 -> 172606 (-9.81%)
helped: 2609
HURT: 0
total cycles in shared programs: 70868114 -> 68454752 (-3.41%)
cycles in affected programs: 35841154 -> 33427792 (-6.73%)
helped: 16357
HURT: 8188
total spills in shared programs: 3492 -> 1707 (-51.12%)
spills in affected programs: 2749 -> 964 (-64.93%)
helped: 74
HURT: 0
total fills in shared programs: 4266 -> 2647 (-37.95%)
fills in affected programs: 3029 -> 1410 (-53.45%)
helped: 74
HURT: 0
LOST: 1
GAINED: 143
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Calling textureOffset() with an offset of <0, 0, 0> is equivalent to
calliing texture(). We don't actually need to set up an offset,
which causes a message header to be created.
A fairly common pattern is to sample at a point with a bunch of
offsets, and average them. It's natural to write all the lookups
as textureOffset, but use <0, 0> for the center sample.
shader-db results on Skylake:
total instructions in shared programs: 9092095 -> 9092087 (-0.00%)
instructions in affected programs: 2826 -> 2818 (-0.28%)
helped: 12
HURT: 2
total cycles in shared programs: 70870166 -> 70870144 (-0.00%)
cycles in affected programs: 15924 -> 15902 (-0.14%)
helped: 2
HURT: 0
This also helps prevent code quality regressions in a future patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Jason Ekstrand <jason@jlekstrand.net>
Some apps set NEG and ABS on the source param to test for zero.
Use ALU_OP3_CNDE insted of ALU_OP3_CNDGE and unset both modifiers.
It also removes the need for a MOV instruction, as ABS isn't
supported on op3.
Tested on AMD CAYMAN and AMD RV770.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This enables ARB_compute_shader on softpipe. I've only
tested this with piglit so far, and I hopefully plan
on integrating it with my vulkan work. I'll get to
testing it with deqp more later.
The basic premise is to create up to 1024 restartable
TGSI machines, and execute workgroups of those machines.
v1.1: free machines.
v2: deqp fixes - add samplers support, finish
atomic operations, fix load/store writemasks.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to use the SysSemanticToIndex to tell if we've seen
the semantics at all.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This lets us restart the machine at a PC value, and exits
the machine when we hit a barrier.
Compute shaders will then execute all the threads up to the
barrier, then restart the machines after the barrier once
all are done.
v2: comment the code a bit, change return types.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
compute shaders don't need input/outputs so don't bother
allocating memory for these.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This implements basic load/store/atomic ops on MEMORY types
for compute shaders.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a cleanup that will make later changes easier
to make.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will be used later to restart barriered execution
threads in compute, for now we just want to change the API.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For compute support some of the system values are .xyz types,
so move to using a vector instead of a single channel.
[airlied: squash swizzle fix from compute series].
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
a) SysSemanticToIndex needs to be indexed with the semantic name
not the decl->Declaration.Semantic.
b) doing this in run is too late, as the mappings are all setup
prior to run in the execs.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously they (very rarely) used C++isms that prevented them from being
compiled as C. As of this commit, they can be compiled as either C or C++.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit is mostly mechanical except that it changes where we set the
swizzle. Previously, the blorp_surface_info constructor defaulted the
swizzle to SWIZZLE_XYZW. Now, we memset to zero and fill out the swizzle
when we setup the rest of the struct.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
They didn't really add anything other than a key and extra layers of
function calls. This commit just inlines the extra functions and gets rid
of the extra classes.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Instead of having a virtual member function for getting the WM/PS kernel,
we simply add fields for prog_data and the kernel to brw_blorp_parms and
always make sure those get set as part of the different constructors.
v2: Use use prog_data != NULL to check for a valid program instead of a
magic kernel offset value
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
if brw_meta_stencil_blit() errored at wrong place 'target' would
be uninitialized and cause random behaviour on leaving the funtion.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Initialize drawFb to NULL in _mesa_meta_CopyImageSubData_uncompressed()
if getting readFb fails uninitialized drawFb will cause randomness
on cleanup.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This has been merged few months ago but this should help
https://mesamatrix.net/ to update its list of supported extensions.
Please note that compute shaders are not really useful without
ARB_image_load_store and only GK104 and GK110 support it for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
3D images are a bit more complicated to implement and will probably
requires a bunch of headaches and we don't care for now because they
do not seem to be really used by apps.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This re-uses NVE4_SU_INFO_CALL which is not used anymore because we
don't use our lib for format conversions. While we are at it, add a
todo for image buffers because there are some robustness-related
issues to fix.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Checking if the image address is not 0 should be enough to prevent
read faults. To improve robustness, make sure that the destination
value of atomic operations is correctly initialized in case the
instruction is not performed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes arb_shader_image_load_store-indexing and
arb_shader_image_load_store-max-images.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This will allow to convert surface formats without adding an extra
call to our lib.
[hakzsam: make use of this for GK104]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This implements RESQ for surfaces which comes from imageSize() GLSL
bultin. As the dimensions are sticked into the driver constant buffer,
this only has to be lowered with loads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v2)
TGSI RESQ allows both images and buffers but we have to make a
distinction between these two type of resources in our lowering pass.
Introducing OP_BUFQ which is a fake operand will allow to implement
OP_SUQ for surfaces.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This feature can be enabled in two ways: as an optimization and by
explicit user control (with OpenGL 4.2 or ARB_shader_image_load_store).
This makes use of the recent TGSI_PROPERTY_FS_EARLY_DEPTH_STENCIL to
force early fragment tests when needed.
This fixes a bunch of
dEQP-GLES31.functional.image_load_store.early_fragment_tests.* tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Destination type is actually always 32-bits, so typeSizeof() returns 4
and no sources are condensed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is loosely based on the previous lowering pass wrote by calim
four years ago. I did clean the code and fixed some issues.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Old and dead resource code will be removed once images are completely
done. Based on original patch by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes a bunch of subtests of
arb_shader_image_load_store-host-mem-barrier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
No clue why this was not enabled by default before, maybe because
the SULDP conversion was wrong. Anyway, this helps in fixing all
rgb10_a2ui piglit tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Old surfaces validation code will be removed once images are completely
done for Fermi/Kepler, that explains why I only disable it for now.
This also introduces nvc0_get_surface_dims() which computes correct
dimensions regarding the given target.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To process surfaces coordinates from the codegen part, and because
some information like the format is not always available (eg. when
writeonly is used), we have to stick some surfaces data in the
driver constbuf. This is especially true for OpenCL because we don't
know the format at shader compile time.
This bumps the size of each shader area from 1K to 2K.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This implements set_shader_images() and resource invalidation for
images. As OpenGL requires at least 8 images, we are going to expose
this minimum value even if this might be raised for Kepler, but this
limit is mainly for Fermi because the hardware only accepts 8 images.
Based on original patch by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The third source must be emitted at offset 49 instead of 17 and the
not modifier is at 52 instead of 20. If you look a bit above in
emitLogicOp() you will see that the dest is emitted at 17 which
confirms that src(2) is obviously wrong.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
- Introduce 'gcc_compat' env flag, for all compilers that define __GNUC__,
(which includes Clang when it's not emulating MSVC.)
- Clang doesn't support whole program optimization
- Disable enumerator value warnings (not sure why Clang warns about them,
as my understanding is that MSVC promotes enums to unsigned ints
automatically.)
This is not enough to build with Clang + AddressSanitizer though. More
follow up changes will be required for that.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
More portable, particularly when building with Clang, which implements
all MSVC intrisincs in its own intrin.h, but doesn't actually support
`#pragma instrinsic`.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Because compilers like GCC and Clang are effectively available everywhere
so their presence/absence is seldom conclusive.
Furthermore, all compilers we use now have stdint.h.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These were being defined in SCons, but it's not practical:
- we actually need to include Gallium headers from external source trees, with
completely disjoint build infrastructure, and it's unsustainable to
replicate the HAVE_xxx checks or even hard-coded defines across
everywhere.
- checking compiler version via command line doesn't really work due to
Clang essentially being like a cameleon which can fake either GCC or
MSVC
There's no change for autoconf.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch modifies r600_colorformat_endian_swap(), so for 16-bit and for
32-bit buffers, the endianess configuration will be determined not only
by the color/texture format, but also by the do_endian_swap parameter.
The only exception is for array formats, which are always set to not do
swapping, because for them gallium sets an alias based on the machine's
endianess.
v4:
V_0280A0_COLOR_16_16 and V_0280A0_COLOR_16_16_FLOAT should be set to
8IN16 because the bytes inside need to be swapped even for array formats.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because r600 GPUs can't do swap in their DB unit, we need to disable
endianess swapping for textures that are handled by DB.
There are four format translation functions in r600g driver:
- r600_translate_texformat
- r600_colorformat_endian_swap
- r600_translate_colorformat
- r600_translate_colorswap
This patch adds a new parameters to those functions, called
"do_endian_swap". When running in a big-endian machine, the calling
functions will check whether the texture/color is handled by DB -
"rtex->is_depth && !rtex->is_flushing_texture" - and if so, they will
send FALSE through this parameter. Otherwise, they will send TRUE.
The translation functions, in specific cases, will look at this parameter
and configure the swapping accordingly.
v4:
evergreen_init_color_surface_rat() is only used by compute and don't
handle DB surfaces, so just sent hard-coded FALSE to translation
functions when called by it.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There was definitely bugs here mixing up the PIPE_ and TGSI_ defines,
hopefully they didn't cause any problems, since mostly it was special
cases for GEOMETRY.
This clarifies at shader machine create what type of shader this
machine will execute. This is needed also for compute shaders where
we don't want to allocate inputs/outputs.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It gets annoying that changing the tgsi exec rebuilds the state
tracker unnecessarily. Putting this include into draw_gs.h which
uses it causes a lot less rebuilds.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some cases (especially these using fract for coord wrapping) did not handle
NaNs (or Infs) correctly - the following code assumed the fract result
could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract
result was NaN - which then could produce out-of-bound offsets.
(Note that the explicit NaN behavior changes for min/max on x86 sse don't
result in actual changes in the generated jit code, but may on other
architectures. Found by looking through all the wrap functions.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94955
No piglit changes.
(v2: fix min/max typo in coord_mirror, add comment)
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Tested-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
According to the CUDA compute capability version, GM10x can expose
64KB of shared memory while GM20x can use 96KB.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This got removed from u_blitter.h and we were taking it from
there, this should just move to ARRAY_SIZE eventually.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The home-grown heap scheme (which is ultra-simple but probably not good
to always allocate and memset such a chunk of memory up front) was a
remnant of fdre (where the ir originally came from). But since we have
ralloc in mesa, lets just use that instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Normally this would never happen (constant-propagation in NIR would
eliminate the instruction), except it does happen for 'undef' which
we turn into immed 0.0 for bookkeeping purposes.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
See a7eb12d0.. but that wasn't restrictive enough. Fixes
dEQP-GLES3.functional.rasterization.primitives.line_strip_wide, and
similar
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We seem to need range reduction to get sane results. Fixes glmark2
jellyfish bench, and a whole bunch of
dEQP-GLES3.functional.shaders.builtin_functions.precision.{sin,cos,tan}.*
v2: squashed in android build fixes from Rob Herring
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This fixes 10 dEQP-GLES3 subtests:
dEQP-GLES3.functional.shaders.derivate.dfdy.texture.float_nicest.*.
Matt noticed that our Piglit tests for this use even numbered registers,
while the failing dEQP tests use odd numbered registers. We believe
that it works for even numbered registers, but not otherwise.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the bound framebuffer has a format of MESA_FORMAT_R_UNORM, then
IMPLEMENTATION_COLOR_READ_FORMAT will return GL_RED. This change
applies to OpenGLES contexts where additional restrictions are placed
on the formats that are allowed to be supported.
Fixes OpenGLES 3.1 CTS tests:
* ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC16
* ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC16Linear
* ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC32F
* ES31-CTS.texture_border_clamp.sampling_texture.Texture2DDC32FLinear
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently if the texture binding is changed, emit_fs_consts()
is triggered to update texture scaling factor for
rectangle texture or texture buffer size in the constant buffer.
But the update is only relevant if the texture binding includes
a rectangle texture or a texture buffer.
To eliminate the unnecessary constant buffer updates due to other texture
binding changes, a new flag SVGA_NEW_TEXTURE_CONSTS will be used
to trigger fragment shader constant buffer update when a rectangle texture
or a texture buffer is bound.
With this patch, the number of constant buffer updates in Lightsmark2008
reduces from hundreds per frame to about 28 per frame.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of unconditionally mark the texture subresource dirty at transfer map,
we'll set the dirty bit for write transfer only.
Tested with lightsmark2008 and glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, when running in hardware version 11, we'll use
SVGA3D_QUERYTYPE_OCCLUSION query type for PIPE_QUERY_OCCLUSION_PREDICATE
and return TRUE if samples-passed count is greater than 0.
Fixes glretrace/solidworks2012_viewport running in hardware version 11.
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently, we always do a surface flush when we try to establish
a synchronized write transfer map. But if the subresource has not
been modified, we can skip the surface flush. In other words,
we only need to do a surface flush if the to-be-mapped subresource
has been modified in this command buffer.
With this patch, lightsmark2008 shows about 15% performance improvement.
Reviewed-by: Brian Paul <brianp@vmware.com>
They can be affected by URB writes.
In the upcoming scalar TCS backend, this prevents read-modify-write
cycles from being broken by CSE removing reads.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Also, move them to brw_shader.cpp so they're in a location for code
used by both the vec4 and fs worlds.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Count depth/stencil, blend, sampler, etc. state objects separately
but just report the sum for the HUD. This change lets us use gdb to
see the breakdown of state objects in more detail.
Also, count sampler views too.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Remove a wrapper around __builtin_ffs that conflicts with system
headers on OpenBSD and perhaps elsewhere:
isl_priv.h:44: error: conflicting types for 'ffs'
v2: include strings.h to ensure prototype is found
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Mark variable MAYBE_UNUSED to avoid unused-but-set-variable warning in
release build.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Mark variables MAYBE_UNUSED to avoid unused-but-set-variable warnings
in release build.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is mostly for variables that are only used in asserts and cause
unused-but-set-variable warnings in release builds. Could just use
UNUSED directly, but MAYBE_UNUSED should be less confusing and is
similar to what the Linux kernel has.
And yes __attribute__((unused)) can be used on variables on both GCC 4.2
(oldest supported by mesa) and clang 3.0 (just some random old version,
not sure what's the minimum for mesa).
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
combineLd/St would combine, i.e. :
st u32 # g[$r2+0x0] $r2
st u32 # g[$r2+0x4] $r3
into:
st u64 # g[$r2+0x0] $r2d
But this is only valid if r2 contains an 8 byte aligned address,
which is not guaranteed for compute shaders
This commit checks for src0 dim 0 not being indirect when combining
loads / stores as combining indirect loads / stores may break alignment
rules.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Currently we were two restrictive, and would insert an output move in
cases like: MOV OUT[0], IN[0].xyzw
Loosen the restriction to allow the current instruction to appear in the
neighbor list but only at it's current possition.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Normally the offset in the group would be the same, but not always. For
example, in a sam(w) which only writes the 4th component.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions. But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures. Work around this
setting up a second texture state and using that to sample alpha
separately.
This way, srgb->linear conversion happens in hw *prior* to
interpolation.
This fixes 546 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Allows the build to work when the python3 binary is not "python3".
v2: remove x bit from the script at Emil's suggestion
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When uploading a linear, void-extent, ASTC LDR block on Skylake, we are
required to flush to zero the UNORM16 channel values that would be
denormalized. This is specifically required for the values: 1, 2, and 3.
Fixes the 14 failing tests in:
dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.*
v2: Split out flushing function (Kristian Høgsberg)
v3: Map with READ instead of INVALIDATE (Kenneth Graunke)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is equivalent of 73b01e2711
for blorp.
v2 (Ken): No need to call _mesa_format_has_color_component() now
that the number of components is gotten from
_mesa_base_format_component_count().
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In case blorp needs to configure it will be just as if render or
compute pipeline had configured it.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger
re-emission of the entire 3D pipeline on the next draw. However, there
are some packets BLORP simply leaves alone, so there's no need to
re-emit them. Trying to reduce the set of dirty bits flagged after
BLORP runs is tricky.
Instead, we introduce a BRW_NEW_BLORP flag. This should be set on any
atom which emits a packet that BLORP also emits. When BLORP runs, it
will flag BRW_NEW_BLORP, causing those packets to get re-emitted.
This also makes it easy to avoid re-emitting specific atoms - we can
simply drop the BRW_NEW_BLORP flag on those.
To start, we assume that all packets need to be re-emitted. This is the
safest approach and closest to the existing code's behavior. Many of
these are obviously not required, and can be dropped in subsequent
patches.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is identical to the blorp version which only differs in case
fragment shader isn't used. In that case blorp would reset batch
buffer address to zero.
This is not really needed, and having blorp to use base state
address setup that is compatible with normal upload allows one to
skip resetting it.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're trying to move to more of the new style intrinsics with include
the correct target name, and map directly to ISA instructions.
v2:
- Only do this with LLVM 3.8 and newer.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The range metadata tells LLVM the range of expected values for this intrinsic,
so it can do some additional optimizations on the result.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Although Gen9 samples from most HDR ASTC surfaces of correctly,
there currently are no software workarounds to fix the incorrect
sampling that occurs in others of certain color endpoint modes.
With this change, we are no longer failing the 14 tests from:
dEQP-GLES3.functional.texture.compressed.astc.endpoint_value_hdr_cem_15.*
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Switch boolean template arguments to typename template arguments of type
std::integral_constant<bool, VALUE>.
This allows the template argument unroller to easily be extended to enums.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Part of fixing piglit EXT_framebuffer_multisample/sample-coverage inverted
(there is also a bug with RCL tiled blits)
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
There's no reason we couldn't do non-MSAA full resolution tile buffer
load/stores, but we would have claimed buffer overflow was being
attempted. Nothing does this currently.
This was a bug from the MSAA enabling. Tests for surfaces with
nr_samples==1 instead of 0 (generally GL renderbuffers) would incorrectly
fail out.
Fixes the ARB_framebuffer_sRGB piglit tests other than srgb_conformance.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
I had made the previous blit fix non-MSAA only because I was thinking
about how the hardware infers stride from the RENDERING_CONFIG packet.
However, I'm also inferring the stride for both MSAA src and dst in
vc4_render_cl.c from the width argument in the ioctl.
Fixes 15 EXT_framebuffer_multisample piglit tests.
On Broadwell, I get the following shader-db statistics:
Tessellation Control Shaders:
total instructions in shared programs: 57327 -> 57012 (-0.55%)
instructions in affected programs: 27334 -> 27019 (-1.15%)
helped: 45
HURT: 0
total cycles in shared programs: 265692 -> 255188 (-3.95%)
cycles in affected programs: 263122 -> 252618 (-3.99%)
helped: 184
HURT: 26
Tessellation Evaluation Shaders:
total instructions in shared programs: 23236 -> 23157 (-0.34%)
instructions in affected programs: 2791 -> 2712 (-2.83%)
helped: 27
HURT: 0
total cycles in shared programs: 151858 -> 149704 (-1.42%)
cycles in affected programs: 151858 -> 149704 (-1.42%)
helped: 101
HURT: 114
Geometry Shaders:
Orbital Explorer goes from 6442 -> 6356 instructions.
Two Shadow of Mordor shaders increase by a single instruction.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
NIR will lower it in nir_opt_algebraic.
No change in shader-db.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The timestamps are stored in a funny place, and even though they are a
64-bit result, are not stored with is64bit. Account for that when
retrieving the query result into a resource.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
This lets us delete some redundant code and keep all of the
image_load_store format lowering logic in one place: libisl.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, we were relying on has_matching_typed_format returning true for
MESA_FORMAT_NONE which, in turn, relied on _mesa_get_format_bytes returning
1 for MESA_FORMAT_NONE. When we switch to ISL, this behaviour will no
longer be something we can rely on.
Reviewed-by: Chad Versace <chad.versace@intel.com>
We want to call this function from the shader compiler and having a full
isl_device available at that point isn't practical.
Reviewed-by: Chad Versace <chad.versace@intel.com>
C++ doesn't support designated initializers and g++ in particular doesn't
handle them when the struct gets complicated, i.e. has a union.
Reviewed-by: Chad Versace <chad.versace@intel.com>
To avoid build issues, ensure that you're running `make' at the top level
and/or you've executed `make clean' beforehand.
Reviewed-by: Chad Versace <chad.versace@intel.com>
They can only indicate out of memory conditions, since the other error
conditions are caught earlier.
v2: fix error message in EndQuery
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Even when begin_query succeeds, there can still be failures in query handling.
For example for radeon, additional buffers may have to be allocated when
queries span multiple command buffers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Starting with Skylake, the display engine is capable of scanning out from
Y-tiled buffers. As such, we can and should use Y-tiling for better efficiency.
This also has the added benefit of being able to fast clear the winsys buffer.
Note that the buffer allocation done for mipmaps will already never allocate an
X-tiled buffer for GEN9.
This has an almost universal positive impact on benchmarks, some improving by as
much as 20%.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use PIPE_SWIZZLE_* everywhere.
Use X/Y/Z/W/0/1 instead of RED, GREEN, BLUE, ALPHA, ZERO, ONE.
The new enum is called pipe_swizzle.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
const buffers are no longer used since the clip plane const buffer was
moved to RW buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
add it to the RW_BUFFERS descriptor array
now the slot masks don't have to have 64 bits
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
- use an enum
- use a unique slot number regardless of the shader stage
(the per-stage slots will go away for RW buffers)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Screwed up since 0753b135f6.
(Only an issue with different min/mag filters, and then only in some cases,
which is probably why it went unnoticed for quite a while.
The effect should have simply been nearest mip filter instead of linear, iff
min was nearest, mag was linear, and all pixels hit the mignifying path.)
Fixes a bunch of dEQP failures.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
In commit cda886a485, Neil made us stop
advertising RGBX formats on Gen9+, as the hardware apparently no longer
has working fast clear support for those formats. Instead, we just
fall back to RGBA formats, and use SCS to override alpha to 1.0.
This is fine, but had one unintended side effect: it made us fall back
to slow clears when the color mask disables alpha. Normally, we ignore
the color mask for non-existent channels. This includes alpha for XRGB
formats as writing garbage to the X channel is harmless. But, now that
we use RGBA, we think there's a real alpha channel, and can't do the
optimization.
To hack around this, check if _BaseFormat is GL_RGB and ignore alpha.
Improves WebGL Aquarium performance on Skylake GT3e by about 50%
by letting it use repclears instead of slow clears.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We do this in two steps: first we clip the dst rect and adjust the src
rect accordingly. Then we do it the other way around. In both passes
the adjustment part involves multiplying by a scale factor that can lead
to a small precision loss. This is breaking a few dEQP tests.
Specifically, the problem happens when we need to clip the same coordinate
twice. For example, if srcX0 and dstX0 need both to be clipped we want to
avoid the situation where we clip srcX0 first, then adjust dstX0 accordingly
but then we realize that the resulting dstX0 still needs to be clipped, so
we clip dstX0 and adjust srcX0 again. Each of these two passes can lead
to precission loss. What we want to do here is detect the rect that leads
to the largest clip (accounting for the scale factor involved), clip that
rect and adjust the other one. With this we ensure that the adjusted
coordinate does not need to be clipped again and we can skip a second pass,
improving precision.
Fixes the following 4 dEQP tests:
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_x_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_x_linear
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Float suffixes are allowed in all subsequent GLSL specifications, and
it's obvious what the user meant if they specify one. Accept it with a
warning to avoid breaking applications, like Planeshift (although it
looks like between 0.6.1 and 0.6.3 they might have removed the suffixes
from their shaders).
Reviewed-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit b449366587.
I removed the pass thinking that it was now not useful, but that was not
true. I believe I ran shader-db on HSW and saw no results, but HSW does
not use the unlit centroid workaround code and as a result does not emit
redundant MOV_DISPATCH_TO_FLAGS instructions.
On IVB, the shader-db results are:
total instructions in shared programs: 6650806 -> 6646303 (-0.07%)
instructions in affected programs: 106893 -> 102390 (-4.21%)
helped: 793
total cycles in shared programs: 56195538 -> 56103720 (-0.16%)
cycles in affected programs: 873048 -> 781230 (-10.52%)
helped: 553
HURT: 209
On SNB, the shader-db results are:
total instructions in shared programs: 7173074 -> 7168541 (-0.06%)
instructions in affected programs: 119757 -> 115224 (-3.79%)
helped: 799
total cycles in shared programs: 98128032 -> 98072938 (-0.06%)
cycles in affected programs: 1437104 -> 1382010 (-3.83%)
helped: 454
HURT: 237
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
There is not initial assignment, thus appending to it does not work.
Fixes: b27c85c4c0 "i965: add build rule for brw_nir_trig_workarounds.c"
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Add GBM_FORMAT_XBGR8888/__DRI_IMAGE_FORMAT_XBGR8888 format support which
is needed for Android.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add support for 32-bit RGBX/RGBA formats which are preferred for Android.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add MESA_FORMAT_R8G8B8A8_UNORM and MESA_FORMAT_R8G8B8X8_UNORM formats as
these are the preferred formats for Android.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit bfd17c76c1 ("i965: Port INTEL_PRECISE_TRIG=1 to NIR.") added a
generated file brw_nir_trig_workarounds.c which broke the Android build.
Add the necessary makefiles to the Android build.
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Commit 4db8f15a25 ("glsl: move the android build scripts a level up")
dropped a generated include path for glcpp. Add it back adjusting for the
new location.
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use dev_node_from_fd() with HAVE_LIBDRM to provide an implmentation
of loader_get_device_name_for_fd() for non-linux systems that
use libdrm but don't have udev or sysfs.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use the defines Mesa configure sets to indicate presence of the bswap32
builtins. This lets i965 work on OpenBSD again after the changes that
were made in 0a5d8d9af4.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For systems without udev or sysfs that use drm ioctls in the loader
drm authentication must take place earlier or the loader will fail
"MESA-LOADER: failed to get param for i915".
Patch from Mark Kettenis.
Cc: "11.2 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
[Emil Velikov: remove gratuitous white-space]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We need to enable a bit in the CONTEXT_CONTROL packet for the
loads to work.
v2: Style issues.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: Use field names provided by Nicolai.
v3: Updated to use CONTEXT_CONTROL prefix.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The missing break caused the IB size to be overwritten with
the size of IB_CONST.
This was introduced in: 7201230582
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Previously the vertex buffer consisted of eight floats per vertex
of which six where constants. These can be as easily provided by
vertex fetcher as it is capable of filling vertex elements with
constant one and zero. This reduces the size of the vertex buffer
from 3 * 8 * 4 = 96 to 3 * 2 * 4 = 24 bytes.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise clearing with blorp will regress performance in some
synthetic test cases.
v2: Used vsize >= 2 instead of vsize > 0, and updated the comment.
Review by Ken in one of the earlier patches revealed this.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In case there is no source it means the program does a simple
clear or a resolve. In such case there is no need to program
sampling state or enable pixel kill in fragment shader.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen8 color resolving won't work anymore if the target isn't
the first entry in the binding table.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Ken): Fix the condition on using meta for stencil blits:
use_blorp -> !use_blorp
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds additional MOV instruction for all blorp programs
that use SHADER_OPCODE_TXF. Alternative is to augment blorp program
key to tell if z-coordinate is needed, add condition to the blorp
blit compiler and to produce a variant with and without the MOV.
This seems a little overkill.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to support cases where gen9 uses RGBA format to back client
requested RGB, one needs to have means to force alpha channel to one
when user requested RGB surface is used as blit source.
v2 (Ken): Use helper for constructing the swizzle (this should be
changed to use brw_get_texture_swizzle() as a follow-up).
Also calculate the swizzle for CopyTexSubImage.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Ken): Drop GEN8_RASTER_FRONT_WINDING_CCW in raster state
Add emission of pma stall.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Ken): Added switch cases for gen8/9 in texel_fetch(). These
were wrongly introduced in blit-enabling patch.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Ken): Use payload directly instead of retyping it into vec8.
Drop the implied header, it isn't used for gen6+ anyway.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we hardcoded "VS URB Starting Address" to 2 (in 8kB chunks),
which meant VS URB data would start at an offset of 16kB.
However, on Haswell GT3 and Gen8+, we allocate the first 32kB for the
push constant region. This means that the PS push constant and VS URB
data regions overlap, which can lead to corruption.
v2 (Ken): Better description of the change, and do not change vs_size
from 2 to 1.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the size is sizeof(float) times too large. One reserves
GEN6_BLORP_VBO_SIZE many floats whereas GEN6_BLORP_VBO_SIZE stands
for the size of vertex buffer in bytes.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was part of EXT_gpu_shader4 - as such it should have been supported
by glsl 130.
It was however forgotten, and not added until glsl 430 - with the wrong
syntax no less (glsl 430 mentions it was overlooked).
glsl 440 (but revision 8 only) fixed this finally for good.
At least nvidia supports this with just version glsl version 1.30 as well
(the spec doesn't explicitly say it should be supported retroactively),
so just add this to the other glsl 130 textureOffset functions.
Passes a (hacked) piglit tex-miplevel-selection test (2DArrayShadow
textureOffset -auto) with llvmpipe.
v2: fix up comment (by Ian), add testing to commit message.
Reviewed-by: Dave Airlie <airlied@gmail.com>
The coverage mask is not sufficient - in per-sample mode, we also need
to AND with a mask representing the samples being processed by the
current fragment shader invocation.
Fixes 18 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask_in.bit_count_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bit_count_per_two_samples.multisample_{rbo,texture}_{4,8}
sample_mask_in.bits_unique_per_sample.multisample_{rbo,texture}_{1,2,4,8}
sample_mask_in.bits_unique_per_two_samples.multisample_{rbo,texture}_{4,8}
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The ARB_sample_shading specification says that setting gl_SampleMask
bits to 0 means that the corresponding sample "should be considered
uncovered for the purposes of multisample fragment operations
(Section 4.1.3)."
The OpenGL 4.4 specification, section 17.3.3 ("Multisample Fragment
Operations") specifies:
"No changes to the fragment alpha or coverage values are made at this
step if MULTISAMPLE is disabled, or if the value of SAMPLE_BUFFERS
is not one."
oMask output alters coverage masks and can kill pixels. We need to
disable it in the above case, which conveniently corresponds to
key->multisample_fbo being false.
Khronos bug #12188 also spells this out clearly:
https://cvs.khronos.org/bugzilla/show_bug.cgi?id=12188
Fixes two Piglit tests:
tests/spec/arb_sample_shading/builtin-gl-sample-mask-simple 0
tests/spec/arb_sample_shading/builtin-gl-sample-mask 0
Fixes 21 ES3 conformance tests:
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_1
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba8i.samples_0.mask_7
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_3
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_4
ES31-CTS.sample_variables.mask.rgba8ui.samples_0.mask_6
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_zero
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_0
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_2
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_5
ES31-CTS.sample_variables.mask.rgba32f.samples_0.mask_7
Fixes 9 dEQP-GLES31.functional.shaders.sample_variables tests:
sample_mask.discard_half_per_pixel.default_framebuffer
sample_mask.discard_half_per_pixel.singlesample_rbo
sample_mask.discard_half_per_pixel.singlesample_texture
sample_mask.discard_half_per_sample.default_framebuffer
sample_mask.discard_half_per_sample.singlesample_rbo
sample_mask.discard_half_per_sample.singlesample_texture
sample_mask.discard_half_per_two_samples.default_framebuffer
sample_mask.discard_half_per_two_samples.singlesample_rbo
sample_mask.discard_half_per_two_samples.singlesample_texture
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I'm going to need a key entry meaning "we have a multisample FBO,
and multisampling is enabled" in an upcoming patch. This is basically
wm_key->compute_sample_id, except that it also checks that the SAMPLE_ID
system value is read.
The only use of wm_key->compute_sample_id is in emit_sampleid_setup(),
which is only called when handling the SAMPLE_ID system value. So we
can just eliminate the check and generalize the field.
v2: Also update the Vulkan driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was only used by the old gl_SampleID calculations. The new code
doesn't need to handle 2x specially.
v2: Delete it from the Vulkan driver, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen7+, the thread payload provides the sample ID - we can read it
in two instructions, without any elaborate calculations. We don't even
need a state dependency - this will properly produce zero in the
non-MSAA case. Unfortunately, we need the state flag anyway, so we
may as well continue to use it to produce a single MOV 0 instead of
SHR/AND.
For some reason, the sample ID field is always zero on Gen7/7.5, so
we can't use this yet. However, it works fine on Gen8+. So, land the
code and use it where it's working, and leave a TODO for later.
v2: Fix register types in the comment (caught by Matt Turner!).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, opt_vector_float() always interpreted MOV sources as
floating point, and always created a MOV with a F-type destination.
This meant that we could mess up sequences of integer loads, such as:
mov vgrf6.0.x:D, 0D
mov vgrf6.0.y:D, 1D
mov vgrf6.0.z:D, 2D
mov vgrf6.0.w:D, 3D
Here, integer 0/1/2/3 become approximately 0.0f, so we generated:
mov vgrf6.0:F, [0F, 0F, 0F, 0F]
which is clearly wrong. We can properly handle this by converting
integer values to float (rather than bitcasting), and emitting a type
converting MOV:
mov vgrf6.0:D, [0F, 1F, 2F, 3F]
To do this, see first see if the integer values (converted to float)
are representable. If so, we use a D-type MOV. If not, we then try
the floating point values and an F-type MOV. We make zero not impose
type restrictions. This is important because 0D would imply a D-type
MOV, but is often used in sequences such as MOV 0D, MOV 0x3f800000D,
where we want to use an F-type MOV.
Fixes about 54 dEQP-GLES2 failures with the vec4 VS backend. This
recently became visible due to changes in opt_vector_float() which
made it optimize more cases, but it was a pre-existing bug.
Apparently it also manages to turn more integer loads into VFs,
producing the following shader-db statistics on Haswell:
total instructions in shared programs: 7084195 -> 7082191 (-0.03%)
instructions in affected programs: 246027 -> 244023 (-0.81%)
helped: 1937
total cycles in shared programs: 65669642 -> 65651968 (-0.03%)
cycles in affected programs: 531064 -> 513390 (-3.33%)
helped: 1177
v2: Handle the type of zero better.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't handle this properly - we'd have to perform the type conversion
before trying to convert the value to a VF.
While we could do that, it doesn't seem particularly useful - most
vector loads should be consistently typed (all float or all integer).
As a special case, we do allow type-converting MOVs of integer 0, as
it's represented the same regardless of the type. I believe this case
does actually come up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
After the previous patch, this helper is only called in one place.
So, just fold it back in - there are a lot of parameters here and
not much code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reworks opt_vector_float() so that there's only one place that
flushes out any accumulated state and emits a VF.
v2: Don't break the sequence for non-representable numbers - just skip
recording their values. Only break it for non-MOVs or register
changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This new macro uses a for loop to create an actual code block in which to
place the macro setup code. One advantage of this is that you syntatically
use braces instead of parentheses. Another is that the code in the block
doesn't even get executed if anv_batch_emit_dwords fails.
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
This is only valid for other atomic operations (including CAS). This
fixes an invalid opcode error from dmesg. While we are it, make sure
to initialize global addr to 0 for other atomic operations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
After some investigation, it seems like that disabling the UNK02C4
command avoid a read fault with texelFetch() from a compute shader.
I have no clue on what this method actually does, but this avoid the
GPU to hang with basic-texelFetch.shader_test without introducing any
compute-related regressions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Normally, we split uniforms at the end but in Vulkan, we bail because we
don't want pull constants. However, we still need them split because
pack_uniforms relies on it.
I really don't like this patch not because it doesn't work (it does) but
because now that we're using MOV_INDIRECT, uniform numbers and sizes don't
really matter anymore. In the FS backend, uniform splitting and packing is
handled all at once (actual re-assignment of locations happens later) and
we really should do it that way in vec4 eventually as well.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95001
All of the code that did something special based on vec4 vs. scalar is
bogus. In the backend, everything is now in units of bytes and the vec4
backend can handle full std140 packing so we don't need to do anything
special anymore.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94998
Add support for OpenCL global memory buffers, note this has only
been tested with regular load and stores and likely needs more work
for e.g. atomic ops.
Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Some of the lowering steps we currently do for FILE_MEMORY_GLOBAL only
apply to buffers, making it impossible to use FILE_MEMORY_GLOBAL for
OpenCL global buffers.
This commits changes the buffer code to use FILE_MEMORY_BUFFER at the
ir_from_tgsi and lowering steps, freeing use of FILE_MEMORY_GLOBAL
for use with OpenCL global buffers.
Note that after lowering buffer accesses use the FILE_MEMORY_GLOBAL
register file.
Tested with piglet on a gf119 and a gk107:
./piglit run -o shader -t '.*arb_shader_storage_buffer_object.*' results/shader
[9/9] pass: 9 /
./piglit run -o shader -t '.*arb_compute_shader.*' results/shader
[20/20] skip: 4, pass: 16 |
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: - set interop_version
- simplify the offset_after macro
v2.1: - use version numbers, remove offset_after
- set "out_driver_data_written"
v2.2: - set buf_offset & buf_size for GL_ARRAY_BUFFER too
- add whandle.offset to buf_offset
- disable the minmax cache for GL_TEXTURE_BUFFER
v2: - use "enum" to define stuff
v3: - more comments, define MESA_GLINTEROP_UNSUPPORTED
v4: - add mesa_glinterop_device_info::interop_version
- more comments
- remove #define MESA_GLINTEROP_VERSION
- use const for "in"
v4.1: - use version numbers for structures
- add "out_driver_data_written"
v4.2: - buf_offset & buf_size affect GL_ARRAY_BUFFER too, this is required
for sharing suballocations within a larger buffer
When creating egl images we do a bytes to pixel conversion by deviding
by 4 regardless of the pixel format. This does not work for RGB565. In
this patch, we avoid useless conversion and use proper API when the
conversion cannot be avoided.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This code is already duplicated twice and will be useful again. This
will also help when adding formats.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This *seems* like a hw bug, and maybe only applies to certain a4xx
variants/revisions. But setting the SRGB bit in sampler view state
(texconst0) causes invalid alpha for ASTC textures. Work around this
by doing the srgb->linear conversion in the shader instead.
This fixes 392 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb*
(The remaining fails seem to be a bug w/ ASTC + linear filtering, also
possibly a420.0 specific.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
No need for it not to be const, and lets caller declare it const if
desired.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Provide an improved lowering for LRP, which can be implemented in two
MAD instructions with a bit of rearranging of the equation, rather
than the literal implementation of two multiplies, an add and a
subtract.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Improve XPD lowering to consume less instructions by using the
MAD instruction to perform the multiply and subtraction together.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add support for lowering TRUNC using the following sequence:
FRC tmpA, |src|
SUB tmpA, |src|, tmpA
CMP dst, -tmpA, tmpA
Note that this is incompatible with FRC lowering.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add support for lowering FLR and CEIL to FRC/SUB and FRC/ADD
instructions for GPUs that support FRC but not FLR or CEIL. Since
these uses FRC, it is invalid to ask for FLR or CEIL to be lowered
along with FRC, so add an assert to catch this invalid configuration.
We also need to deal with FLR instructions emitted by the lowering
code. Fix these up with the FRC+SUB equivalent when FLR lowering is
enabled.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2: Use chip_class instead of family.
v3: Check kernel version for SI.
v4: Preemptively allow amdgpu winsys for SI.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
si_shader_create corrects the SGPR count with si_fix_num_sgprs. We then
recompute the rsrc1 register to use the new SGPR count.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: Add more CS_PARTIAL_FLUSH events.
Essentially every place with waits on finishing for pixel shaders
also has a write after read hazard with compute shaders.
Invalidating L2 waits implicitly on pixel and compute shaders,
so, we don't need a CS_PARTIAL_FLUSH for switching FBO.
v3: Add CS_PARTIAL_FLUSH events even if we already have INV_GLOBAL_L2.
According to Marek the INV_GLOBAL_L2 events don't wait for compute
shaders to finish, so wait for them explicitly.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
v2: - Do check if anything changed earlier
- Use emitted_program instead of emitted_bo to prevent
shaders with shader->bo = NULL confusing the check
- Use radeon_set_sh_reg*
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Instead of having a scratch buffer per program, have one per
context.
Also removed the per kernel wave count calculations, but
that only helped if the total number of waves in the dispatch
was smaller than sctx->scratch_waves.
v2: Fix style issue.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Also removes PKT3_CONTEXT_CONTROL as that is already being done
by si_begin_new_cs, when emitting init_config.
v2: - Use radeon_set_sh_reg_seq.
- Also set COMPUTE_STATIC_THREAD_MGMT_SE2 / SE3 for CIK+
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also uses a dynamically allocated buffer using u_upload_alloc.
The old buffer per program approach required serializing all
dispatches of the same program.
v2: - Clarified commit message.
- Use radeon_set_sh_reg_seq.
- Also upload input buffer for clover kernels, even when
input_size is 0, as it contains grid parameters.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Declares the shared memory as a global variable so that
LLVM is aware of it and it does not conflict with passes
like AMDGPUPromoteAlloca.
v2: - Use ctx->i8.
- Dropped null-check for declare_memory_region.
- Changed memory region array to single region.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
v2: Load previous list for new CS instead of re-emitting
all descriptors.
v3: Do radeon_add_to_buffer_list in si_ce_upload.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For use by radeonsi.
v2: Make sure that it works for all 64 bits set.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can then upload only the dirty ones with the constant engine.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: Use 32 byte alignment.
v3: Don't allocate CE space for vertex buffer descriptors.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Based on work by Marek Olšák.
v2: Add preamble IB.
Leaves the load packet in the space calculation as the
radeon winsys might not be able to support a premable.
The added space calculation may look expensive, but
is converted to a constant with (at least) -O2 and -O3.
v3: - Fix code style.
- Remove needed space for vertex buffer descriptors.
- Fail when the preamble cannot be created.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Necessary to prevent performance regressions due to extra flushing.
Probably should enlarge it even further when also updating
uniforms through the CE, but this seems large enough for now.
v2: Add preamble IB.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
64bits MSVCRT's exp2f(-inf) returns -inf instead of 0. Tested with
MSVC 2013's CRT. (I haven't tried 2015 yet.)
Also this does not happen with MinGW.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
All power of two of up native vector length.
There is actually a bug in lp_build_round for v2, whereby it doesn't
round to nearest. Fixing is left to the future, but the test is now
able to expect it to fail.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Just use LLVM_HOST_TRIPLE, which is available at least from LLVM 3.3
onwards, and is pretty much what llvm::sys::getProcessTriple() does anyway,
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
One needs to call setJITMemoryManager for LLVM 3.3, instead of
setMCJITMemoryManager.
This regressed in commits 065256df/75ad4fe7 when trying to make the
code to build with LLVM 3.6.
Tested MCJIT with LLVM 3.3 to 3.6.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On the LLVM versions that support it, so we can easily switch between
MCJIT/old-jit for testing.
The new option is GALLIVM_MCJIT.
Unfortunately setting GALLIVM_MCJIT=1 for LLVM 3.3 or 3.4 causes
segfault, both on Linux and Windows. I'm almost certain this used to
work, so there probably is a regression somewhere.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
And llvm::raw_string_ostream where not (LLVM 3.3).
Thereby eliminating yet another dependency on unstable LLVM interfaces.
As a bonus this also gets LLVM IR on OutputDebugMessageA on MSVC (which
was disabled, probably due to C++ issues.)
Tested `lp_test_arit -v -v` on LLVM 3.3, 3.4 and 3.8.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Single-sampled texture miplevels > 1 are stored in POT-aligned areas, but
we only get one value to control the stride of the src and dst for single
sampled buffers. A RCL tile blit from level != 1 to level == 0 would
therefore load from the wrong stride.
This isn't currently that easy to expand, so fix it up
before expanding it later to include dynamic samplers.
[airlied: use some local variables (Roland)]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When we have something like:
MOV OUT[n], IN[m].wzyx
the existing grouping code was missing a potential conflict. Due to
input needing to be sequential scalar regs, we have:
IN: x <-> y <-> z <-> w
which would be grouped to:
OUT: w <-> z2 <-> y2 <-> x (where the 2 denotes a copy/mov)
but that can't actually work. We need to realize that x and w are
already in the same chain, not just that they aren't both already in
new chain being built.
With this fixed, we probably no longer need the hack from f68f6c0.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It's only required on the compute ring. This matches the closed driver.
The compute flag is removed to prevent confusion and Bas's compute shader
patches remove it in the whole function.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I'm not sure about this. This will make the engines go idle, but the caches
will be unflushed. This should match app behavior without performance
counters, which can be a good thing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When this extension was added, an underscore were mistakenly replaced
by a space. Let's correct this, so it's a tad easier to grep for this
extension.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Color clears should respect each drawbuffer's color mask state.
Previously, we tried to leave the color mask untouched. However,
_mesa_meta_drawbuffers_from_bitfield() ended up rebinding all the
color drawbuffers in a different order, so we ended up pairing
drawbuffers with the wrong color mask state.
The new _mesa_meta_drawbuffers_and_colormask() function does the
same job as the old _mesa_meta_drawbuffers_from_bitfield(), but
also rearranges the color mask state to match the new drawbuffer
configuration.
This code was largely ripped off from Gallium's st_Clear code.
This fixes ES31-CTS.draw_buffers_indexed.color_masks, which binds
up to 8 drawbuffers, sets color masks for each, and then calls
glClearBufferfv to clear each buffer individually. ClearBuffer
causes us to rebind only one drawbuffer, at which point we used
ctx->Color.ColorMask[0] (draw buffer 0's state) for everything.
We could probably delete _mesa_meta_drawbuffers_from_bitfield(),
but I'd rather not think about the i965 fast clear code. Topi is
rewriting a bunch of that soon anyway, so let's delete it then.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94847
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This allows meta operations to inspect the existing color mask, and
then do their own smashing.
BlitFramebuffer and Clear already override the color mask, so this
was also redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to fix up the offset to point at the face of the cube. Fixes
piglit fbo-cubemap, copyteximage CUBE, and glean's fbo test.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Catches the cause of failure in
arb_vertex_buffer_object-mixed-immediate-and-vbo, I've had this class of
failure before, and it probably won't be the last time.
If we're going to sample from or render to them at some particular size,
we'd better make sure that they actually are that size. Causes some tests
under simulation to generate appropriate error messages instead of
failures.
Starting from C++11, several math functions, like isinf, moved into the std
namespace. Since cmath undefines those functions before redefining them inside
the namespace, and glibc 2.23 defines the C variants as macros, the C variants
in global namespace are not accessible any longer.
v2: Move the fix outside of Nouveau, as suggested by Jose Fonseca, since anyone
might need it when GCC switches to C++14 by default with GCC 6.0.
v3:
* Put the code directly inside c99_math.h rather than creating a new header
file, as asked by Jose Fonseca;
* Guard the code behind glibc version checks, as only glibc > =2.23 defines
isinf & co. as functions, as suggested by Jose Fonseca.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
I need to do this so I could use R600_BIG_ENDIAN in files which include
r600_pipe_common.h but not r600_pipe.h
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Turns out the previous tarballs got corrupted during upload which I
carelessly forgot to check prior to deleting the local ones.
Lesson learned - double check before removing the local ones.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 79b0e13913)
We have had a guard against OOB array access of images on IVB for a long
time, but it can actually cause hangs on any GPU generation. This can
happen due to getting an untyped SURFACE_STATE for a typed message. We
didn't used to hit this with the piglit test on anything other than IVB
because the OOB in the test would cause us to go past the top of the pull
constant UBO and we would get a surface index of 0 which is was always a
valid surface. Now that we're pushing small arrays, we can end up grabbing
garbage from the GRF and going to some random index which causes a hang.
The solution is to just do the bounds check on all hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94944
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Enable drivers to use their own implementation of this method instead of
the mesa default. Since the drivers that currently overwrite
dd_function_table::CompressedTexSubImage also overwrite
::CompressedTexImage, there should be no behavioral change.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Previously, we just looked at the hardware generation but this meant that
if you did INTEL_DEBUG=vec4 on BDW or SKL, you would have advertised but
non-working features.
Up until now, we have been able to assume that all push constants are
vec4-aligned because this is what the GL driver gives us. In Vulkan, we
need to be able to support full std140 because we get the layout from the
client.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a --with-vulkan-drivers option with one driver, "intel". In the
future, we may add more drivers to this list.
v2: Don't enable any drivers by default. This should prevent this patch
from breaking anyone's build.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The surface format table hasn't entirely been kept up-to-date. This commit
marks a couple more compressed formats as sampleable on gen8+ and adds the
A4B4G4R4 format as renderable on gen9.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds functions for splicing one list into another. These have
more-or-less the same API as the kernel list splicing functions. The
implementation, however, was stolen from the Wayland list implementation.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
On the philosophy that a driver shouldn't change the compile flags
for the entire tree, take the clove approach of moving the c++11 flag
to the swr driver directory.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This code started out like the T case, iterating over utile offsets, but I
had partially switched it to iterating over pixel offsets. I hadn't
caught this before because it's unusual to do piecemeal uploads to small
textures.
Fixes bad text rendering in QT5 apps, which use a 256x16 glyph cache.
Also fixes 6 piglit tests related to glTexSubImage() and
glGetTexSubImage().
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
We have several places where the Vulkan driver explicitly hooks into
valgrind when it's available. We need to be able to detect it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
nir_variable_mode is currently a bitflag enum, while
nir_print::print_var_decl() assumes is still a numbered list.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This removes a hack introduced in 1999 in the first version of
fakeglx.c, with the comment:
/* XXX revisit this after 3.0 is finished. */
Mesa 4.0 was released in 2001. It is now 2016, and Mesa 11.0 was
released last year.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The vishandle member of XMesaVisualInfo is used to support the
comparison of XVisualInfo instances by pointer value, in
find_glx_visual(). The comparison however will always be false, as in
every case the comparison is made, the VisualInfo instance being
compared to is a new allocation passed in through a GLX API call.
In addition, the XVisualInfo instance pointed to by vishandle is itself
never freed, causing a memory leak. Since vishandle is essentially
useless, we just remove it and thereby also fix the leak.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The returned XVisualInfo from glXChooseVisual/glXGetVisualFromFBConfig
is being cached in XMesaVisual.vishandle (and unconditionally
overwritten on subsequent calls). However, these entry points are
specified to return XVisualInfo instances to be owned by the caller and
freed with XFree(), so the return values should not be retained.
With this change, XMesaVisual.vishandle is essentially unused and will
be removed in a subsequent change.
v2: update commit message
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This reverts commit 4115648a6b. This commit
was half-baked and probably never should have been committed. We'll add
this back in properly later when we need it.
This is used to facilitate the Vulkan binding model where each resource is
described by a (descriptor set, binding, array index) tuple.
Reviewed-by: Rob Clark <robdclark@gmail.com>
driCreateContextAttribs() emits an error if bit
__DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS is set for an ES context. But,
EGL_EXT_create_context_robustness and EGL 1.5 both allow creation of
robust ES contexts. One requests a robust ES context by setting the
EGL_CONTEXT_OPENGL_ROBUST_ACCESS *attribute*, which Mesa's EGL layer
translates into the __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS *bit*.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Unfortunately, this also means that we need to use a slightly different
algorithm for assign_constant_locations. The old algorithm worked based on
the assumption that each read of a uniform value read exactly one float.
If it encountered a MOV_INDIRECT, it would immediately bail and push the
whole thing. Since we can now read ranges using MOV_INDIRECT, we need to
be able to push a series of floats without breaking them up. To do this,
we use an algorithm similar to the on in split_virtual_grfs.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This commit moves us to an instruction based model rather than a
register-based model for indirects. This is more accurate anyway as we
have to emit instructions to resolve the reladdr. It's also a lot simpler
because it gets rid of the recursive reladdr problem by design.
One side-effect of this is that we need a whole new algorithm in
move_uniform_array_access_to_pull_constants. This new algorithm is much
more straightforward than the old one and is fairly similar to what we're
already doing in the FS backend.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have MOV_INDIRECT opcodes, we have all of the size information
we need directly in the opcode. With a little restructuring of the
algorithm used in assign_constant_locations we don't need param_size
anymore. The big thing to watch out for now, however, is that you can have
two ranges overlap where neither contains the other. In order to deal with
this, we make the first pass just flag what needs pulling and handle
assigning pull constant locations until later.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load. We also have to
rework some of the other bits of the backend to handle this new form of
uniform load. The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
While we're at it, we also add support for the possibility that the
indirect is, in fact, a constant. This shouldn't happen in the common case
(if it does, that means NIR failed to constant-fold something), but it's
possible so we should handle it.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The subnr field is in bytes so we don't need to multiply by type_sz.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It should work fine without it and the visitor can set it if it wants.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver will be installed to $(libdir)/libvulkan_intel.so and just
providing a driver name is enough for the loader. This also ensures that
multi-arch systems work ok.
This will fix the spurious error message: "Failed to query GPU properties."
that was unintentionally added in cc01b63d73.
This patch changes the function to return an int so that the caller is able to
do stuff based on the return value.
The equivalent of this patch was in the original series that fixed up the
warning, but I dropped it at the last moment. It is required to make the desired
behavior of not warning when trying to query GPU properties from the kernel
unless there is something the user can do about it.
v2: Use strerror (Jason)
Make EINVAL check similar in all places (Ian)
NOTE: Broadwell appears to actually have some issue where the kernel returns
ENODEV when it shouldn't be. I will investigate this separately.
Reported-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
I neglected to free the sampler view which was created earlier in the
function. So for each glCallLists() command that used the bitmap atlas
to draw text, we'd leak a sampler view object.
Also, check for st_create_texture_sampler_view() failure and record
GL_OUT_OF_MEMORY.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
I removed that return 0 by mistake. Ooops.
Fixes: 6e23fd4 ("nvc0: allow to use compute support on GM200")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This works like a charm but please not that NVF0_COMPUTE have to be set
because compute support is still not enabled by default on GK110+. This
will require more testing to make sure it won't break the 3D state.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
While it does rely on NIR, it's not really part of the NIR core. At the
moment, it still builds as part of libnir but that can be changed later if
desired.
Emil Velikov:
- Attribute the src/{glsl,compiler}/nir move
- Flesh out to separate SConscript
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Not supported by MSVC, and completely unnecessary -- inline functions
work just as well.
NIR_SRC_INIT/NIR_DEST_INIT could and probably should be replaced by the
inline functions.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rather than having two almost identical Makefiles, with various VPATH
hacks just fold them, using COMMON_* variables and actually getting
things buildable/shipable.
v2: whitespace fixes, remove Makefile.sources-arch
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
This fixes GS piglit failures after adding SI_PARAM_SHADER_BUFFERS,
which bumped NUM_USER_SGPRS and uncovered this bug on SI.
If this was fixed in LLVM, these workarounds wouldn't be needed.
LLVM would have to look at the calling convention to know how many SGPR
inputs are declared, and add VCC and the scratch wave offset (which is
enabled even if we spill SGPRs but not VGPRs, oh well).
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Except:
- u_cache_test -- too long
- translate_test -- unreliable (it's probably testing corner cases that
translate module doesn't care about.)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Users should never provide a scissor or viewport count of 0 because
they are required to set such state in a graphics pipeline. This
behavior was previously only used in Meta, which actually just
disables those hardware operations at pipeline creation time.
Kristian noticed that the current assignment of viewport count
reduces the number of viewport uploads, so it is not removed.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Meta currently uses screenspace RECTLIST primitives that lie within
the framebuffer rectangle. Since this behavior shouldn't change in the
future, disable the scissor operation whenever rectlists are used.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
For the following reasons, there is no behavioural change with this
commit: the ViewportXYClipTest function of the CLIP stage will continue
to be enabled outside of Meta (where disable_viewport is always false),
and the CLIP stage is turned off within Meta, so this function will
continue to be disabled in that case.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
According to 3D Primitives Overview in the Bspec, when the RECTLIST
primitive is in use, the CLIP stage should be disabled or set to have
a different Clip Mode, and Viewport Mapping must be disabled:
Clipping: Must not require clipping or rely on the CLIP unit’s
ClipTest logic to determine if clipping is required. Either the CLIP
unit should be DISABLED, or the CLIP unit’s Clip Mode should be set
to a value other than CLIPMODE_NORMAL.
Viewport Mapping must be DISABLED (as is typical with the use of
screen-space coordinates).
We swap out ::disable_viewport for ::use_rectlist, because we currently
always use the RECTLIST primitive when we disable viewport mapping, and
we'll likely continue to use this primitive.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Avoid excessive state emission. Relevant state for an action command
will get set by the user:
From Chapter 5. Command Buffers,
When a command buffer begins recording, all state in that command
buffer is undefined.
[...]
Whenever the state of a command buffer is undefined, the application
must set all relevant state on the command buffer before any state
dependent commands such as draws and dispatches are recorded, otherwise
the behavior of executing that command buffer is undefined.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
CmdSet* functions dirty the CommandBuffer's dynamic state. This causes
the new state to be emitted when CmdDraw is called. Since we don't need
the state that would be emitted, don't call the CmdSet* functions.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Since the scissor rectangle always matches that of the framebuffer,
this operation isn't needed.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The old version of the pass only worked on globals and locals and always
left inputs, outputs, uniforms, etc. alone.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The old GLSL IR based lowering doesn't quite work right in all cases,
and fails several dEQP-GLES31 and Vulkan CTS tests. Jason's new
approach in NIR passes all the tests. There's not likely to be a ton
of advantage to lowering early in GLSL IR anyway, so...switch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The algorithm used is different from both the naive suggestion from the
GLSL spec and the one used in GLSL IR today. Unfortunately, the GLSL IR
implementation that we have today doesn't handle denormals (for those that
care) or the case where the float source is +-inf.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's not really doing enough anymore to justify a helper function.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reveiewed-by: Kristian Høgsberg <krh@bitplanet.net>
There are several passes where we need to specify some set of variable
modes that the pass needs top operate on. This lets us easily do that.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
- Incorporate flatshade flag into the shader generation
- Use provoking vertex (vc) in shader when flat shading.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This reverts commit 62fa868728.
dEQP-GLES3.functional.occlusion_query.* was unhappy about that change.
Still not really sure *what* the other slots in the sample results
buffer are.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This one is slightly annoying, since trying to write RBRC from draw
would clobber values set in the tiling/gmem code. We could do command-
stream patching for RBRC, as is done on a3xx. Although since it seems
to be a rarely used feature, it is easier just to do RMW to set/clear
the bit.
Fixes dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_triangles
and related tests.
a3xx still needs the same feature, although there it probably makes more
sense to take advantage of the existing cmdstream patching which is
required for RBRC for other reasons.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems like a4xx needs offset added to array index for all arrays,
whereas a3xx only for cubemap arrays. Fixes a whole swath of dEQP fails
(roughly *sampler2darray*).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to increment offset by # of vertices, not by # of prims. Fixes
a bunch of dEQP fails involving prims other than points. For example,
dEQP-GLES3.functional.transform_feedback.position.lines_separate
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If changed && append, we shouldn't be resetting the internal offset back
to zero. This fixes issues w/ sequences like:
glBeginTransformFeedback()
glDraw()
glPauseTransformFeedback()
glDraw()
glResumeTransformFeedback()
glDraw()
glEndTransformFeedback()
Fixes dEQP-GLES3.functional.transform_feedback.array.separate.points.lowp_vec3
and related tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
dEQP noticed that we were advertising completely bogus values. The
actual maximum is 127.0f.
*But* we have to use an artifically low maximum to work around a bug
in the dEQP test, which gets confused when the max line width is too
large and lines start going off-screen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There are still some edge cases which result in a neighbor-loop. Which
needs to be fixed, but this hack at least makes deqp tests finish.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes a bunch of flat-varying fail on a4xx (where we need to use ldlv to
read the un-interpolated varying).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we cannot mov into a predicate register, the frontend uses a
'cmps.s p0.x, cond, 0' as a stand-in for mov to p0.x. It does this
since it has no way to know that the source cond instruction (ie.
for a kill, br, etc) will only be used to write the predicate reg.
Detect this, and re-write the instruction writing p0.x to skip the
original cmps.[sfu]. (It is done like this, rather than re-writing
the dest of the first cmps.[sfu] in case the first cmps.[sfu]
actually has other users.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
vertex_input_slots would be an appropriate name for an integer, but not
a bool.
Also remove a cond ? true : false from a count_attribute_slots() call
site, noticed during the rename.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The credit for finding and isolating this bug goes to Vinson and Roland.
The buggy LLVM versions were found by doing
opt -instcombine llvm-pr27332.ll > /dev/null
where llvm-pr27332.ll is the IR from
https://llvm.org/bugs/show_bug.cgi?id=27332#c3
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Now, one can do the following to generate and read the nir Doxygen:
cd $MESA_TOP/doxygen
make
firefox nir/index.html
Update v2:
Correct TAGFILES in nir.doxy
Signed-off-by: Elie TOURNIER <tournier.elie@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
[Emil Velikov] v3: Rebase.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
These Doxygen features are deprecated, as reported by Doxygen 1.8.9.1
Warning: Tag `USE_WINDOWS_ENCODING' at line 66 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `DETAILS_AT_TOP' at line 157 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `HTML_ALIGN_MEMBERS' at line 616 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `XML_SCHEMA' at line 848 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `XML_DTD' at line 854 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `MAX_DOT_GRAPH_WIDTH' at line 1115 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Warning: Tag `MAX_DOT_GRAPH_HEIGHT' at line 1123 of file `common.doxy' has become obsolete.
To avoid this warning please remove this line from your configuration file or upgrade it using "doxygen -u"
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
core.doxy was renamed to main.doxy, along with output folder in
the below 2004 commit.
Correct the other modules' TAGFILE linkage to find the main folder.
commit 3ef972f538
Author: Brian Paul <brian.paul@tungstengraphics.com>
Date: Sun May 16 22:07:02 2004 +0000
Replaced 'core' with 'main'.
Other minor updates.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
The last of these output directories was removed in 2007.
commit c2e0570831
Author: Jerome Glisse <glisse@freedesktop.org>
Date: Fri Feb 16 23:18:56 2007 +0100
Update doxygen doc to reflet vbo changes.
Update doxygen doc, array_cache no longuer exist,
new shiny vbo modules is there. Tested on unix,
but i think i didn't broke that bat :).
commit 3ef972f538
Author: Brian Paul <brian.paul@tungstengraphics.com>
Date: Sun May 16 22:07:02 2004 +0000
Replaced 'core' with 'main'.
Other minor updates.
commit 69db632a9d
Author: Jose Fonseca <j_r_fonseca@yahoo.co.uk>
Date: Thu May 1 23:32:54 2003 +0000
Move the Doxygen configuration files into the usual places and integrate with the build system.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
miniglx was removed in February 2010. Clean up remaining
unnecessary doxygen references.
commit a9e3669683
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Thu Feb 25 16:17:04 2010 -0500
Remove remaining miniglx references
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
There has never been a doxygen/gbm_setup output folder.
Appears to have been a copy-paste error from original commit
in 245341f406.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Per Doxygen documentation, to combine external documentation (stored in
a *.tag file) with a project the TAGFILES option should be set in the
configuration file.
A tag file typically only contains a relative location of the
documentation from the point where doxygen was run. So when
you include a tag file in other project you have to specify
where the external documentation is located in relation this
project.
You can do this in the configuration file by assigning the
(relative) location to the tag files specified after the
TAGFILES configuration option.
If you use a relative path it should be relative with respect
to the directory where the HTML output of your project is
generated; so a relative path from the HTML output directory
of a project to the HTML output of the other project that is
linked to.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
The src/mesa/glapi folder was relocated in the below commit.
Amend the doxygen/glapi.doxy INPUT setting accordingly.
Whilst here, in addition this change also avoids a bug in the
consolidated Doxygen output caused by doxygen/glapi.doxy inadvertently
overwriting doxygen/swrast.tag via its GENERATE_TAGFILE setting.
This bug depended upon the specific order each *.tag was built.
commit 296adbd545
Author: Chia-I Wu <olv@lunarg.com>
Date: Mon Apr 26 12:56:44 2010 +0800
glapi: Move to src/mapi/.
Move glapi to src/mapi/{glapi,es1api,es2api}.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Mesa has not had a src/mesa/shader/ folder since Mesa 7.9 removed it
in October 2010, as part of a revised GLSL compiler written by Intel.
Remove doxygen/shader.doxy and consequential changes made throughout.
In addition to removing an unnecessary Doxygen doxyfile, this change also
avoids a bug in the consolidated Doxygen output caused by
doxygen/shader.doxy inadvertently overwriting doxygen/swrast.tag via its
GENERATE_TAGFILE setting.
This bug depended upon the specific order each *.tag was built.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
We used to use sse roundps intrinsic directly, but switched to use the llvm
intrinsics for rounding with e4f01da15d.
However, llvm semantics follows standard math lib round function which is
specced to do roundNearestAwayFromZero but we really want roundNearestEven
(moreoever, using round generates atrocious code since the cpu can't do it
directly and it results in scalar calls to libm __roundf).
So, use llvm.nearbyint instead, which does exactly the right thing, and even
has the advantage of being available with llvm 3.3 too. (I've verified it
actually generates a roundps instruction with llvm 3.3.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94909
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
libasan is never linked to shared objects (which doesn't go well with
-z,defs). It must either be linked to the main executable, or (more
practically for OpenGL drivers) be pre-loaded via LD_PRELOAD.
Otherwise works.
I didn't find anything with llvmpipe. I suspect the fact that the
JIT compiled code isn't instrumented means there are lots of errors it
can't catch.
But for non-JIT drivers, the Address/Leak Sanitizers seem like a faster
alternative to Valgrind.
Usage (Ubuntu 15.10):
scons asan=1 libgl-xlib
export LD_LIBRARY_PATH=$PWD/build/linux-x86_64-debug/gallium/targets/libgl-xlib
LD_PRELOAD=libasan.so.2 any-opengl-application
Acked-by: Roland Scheidegger <sroland@vmware.com>
This is supposed to be INVALID_OPERATION in ES. We already did this
for the fv/iv variants, but not Iiv/Iuv, which are new in ES 3.2 (or
extensions).
Fixes:
ES31-CTS.texture_border_clamp.samplerparameteri_non_gen_sampler_error
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Courtesy of address sanitizer.
[airlied: free buffers as well]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the last necessary bit for OpenGL 4.2 support. All driver-specific
functionality has already been implemented as part of extensions.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This is kind of a hack. We currently track precise requirements
by decorating ir_variables. Propagating or grafting the RHS of an
assignment to a precise value into some other expression tree can
lose those decorations.
In the long run, it might be better to replace these ir_variable
decorations with an "exact" decoration on ir_expression nodes,
similar to what NIR does.
In the short run, this is probably good enough. It preserves
enough information for glsl_to_nir to generate "exact" decorations,
and NIR will then handle optimizing these expressions reasonably.
Fixes ES31-CTS.gpu_shader5.precise_qualifier.
v2: Drop invariant handling, as it shouldn't be necessary (caught
by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ARB_program_interface_query requires that we add struct fields
recursively down to basic types.
Fixes 52 struct test cases in dEQP-GLES31.functional.program_interface_query.*
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change here, but this now lets us recurse throught structs
in add_shader_variable().
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us pass in the absolution location of a variable instead of
computing it in add_shader_variable() based on variable location and
bias. This is in preparation for recursing into struct variables.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This consolidates the combination of create_shader_variable() and
add_program_resource() into a new helper function. No functional
difference, but we'll expand add_shader_variable() in the next few
commits.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code uses SSSE3, and because it isn't compiled in a
separate file compiled with that, it is usually not used (that, of
course, could be fixed...), whereas SSE2 is always present with 64-bit
builds. This should be pretty much as fast as the pshufb version,
albeit those code paths aren't really used on chips without llc in any
case.
v2: fix andnot argument order, add comments
v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments
v4: [mattst88] Rebase
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replaces four byte loads and four byte stores with a load, bswap,
rotate, store; or a movbe, rotate, store.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It is incorrect to assume that pixel format is always in BGR byte order.
We need to check bitmask parameters (such as |redMask|) to determine whether
the RGB or BGR byte order is requested.
v2: reformat code to stay within 80 character per line limit.
v3: just fix the byte order problem first and investigate SRGB later.
v4: rebased on top of the GLES3 sRGB workaround fix.
v5: rebased on top of the GLES3 sRGB workaround fix v2.
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes
dEQP-GLES31.functional.uniform_location.negative.atomic_fragment
dEQP-GLES31.functional.uniform_location.negative.atomic_vertex
Both of which have lines like
layout(location = 3, binding = 0, offset = 0) uniform atomic_uint uni0;
The ARB_explicit_uniform_location spec makes a very tangential mention
regarding atomic counters, but location isn't something that makes sense
with them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The XMesaVisual instances freed in the visuals table on display close
are being freed with a free() call, instead of XMesaDestroyVisual(),
causing a memory leak.
Signed-off-by: John Sheu <sheu@google.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
GLvoid was used before in OpenGL but it has changed to just using void.
All GLvoids in mesa's state tracker has been changed to void in this patch.
Tested this with piglit and no problems were found. No compiler warnings.
Signed-off-by: Jakob Sinclair <sinclair.jakob@openmailbox.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now it follows the compatibility criteria listed in section 2.1 of
the GLX 1.4 specification.
This is needed for post-process effects in SW:KotOR.
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
- Check for unused blocks every few frames or every 64K draws
- Delete data unused since the last check if total unused data is > 20MB
Doesn't seem to cause a perf degridation
Acked-by: Brian Paul <brianp@vmware.com>
If the glDrawPixels size changed, we leaked the previously cached
texture, if there was one. This patch fixes the reference counting,
adds a refcount assertion check, and better handles potential malloc()
failures.
Tested with a modified version of the drawpix Mesa demo which changed
the image size for each glDrawPixels call.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Guard band clipping speeds up rasterization for primitives that are
partially off-screen. This change in particular results in small
framerate improvements in a wide range of games.
Started by Grigori Goronzy <greg@chown.ath.cx>.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The closed driver does this, but it looks at base_level and last_level
and uses a conditional assignment, which LLVM can't generate on SGPRs.
That led me to invent this solution that abuses the image descriptor.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Just for consistency. This is actually not a problem, because both addrlib
and radeon check and fix this.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is a remnant of the times when the DDX was allocating depth-stencil
buffers for windows. Now, st/dri allocates them and doesn't share them.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Add missing PIPE_SHADER_CAP_INTEGERS for frag shaders to
nv30_screen_get_shader_param().
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It is incorrect to assume BGRA byte order for the GLES3 sRGB workaround.
v2: use _mesa_get_srgb_format_linear to handle all formats
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds support for the features requires for ARB_shader_storage_buffer_object
and ARB_shader_atomic_counters, ARB_shader_atomic_counter_ops.
[airlied: some cleanups applied]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for doing load/store/atomic operations on
buffer objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For atomic operations we really need to avoid executing unnecessary shaders, so for some
tests that just draw a single point we only want one vertex to get processed not 4,
this fixes a number of the atomic counters tests.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dolphin uses them a lot. Range tracking would be better in the long term,
but this two lines works fine for now.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes the extra multiply visible to NIR's algebraic optimizations
(for constant reassociation) as well as constant folding. This means
that when the result of sin/cos are multiplied by an constant, we can
eliminate the extra multiply altogether, reducing the cost of the
workaround.
It also means we only have to implement it one place, rather than in
both backends.
This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion,
which has a ton of sin() calls, but always multiplies them by an
immediate constant. The extra multiply gets folded away.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some passes may not refer to options->..., at which point the compiler
will warn about an unused variable. Just cast to void unconditionally
to shut it up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Many shaders contain expression trees of the form:
const_1 * (value * const_2)
Reorganizing these to
(const_1 * const_2) * value
will allow constant folding to combine the constants. Sometimes, these
constants are 2 and 0.5, so we can remove a multiply altogether. Other
times, it can create more immediate constants, which can actually hurt.
Finding a good balance here is tricky. While much more could be done,
this simple patch seems to have a lot of positive benefit while having
a low downside.
shader-db results on Broadwell:
total instructions in shared programs: 8963768 -> 8961369 (-0.03%)
instructions in affected programs: 438318 -> 435919 (-0.55%)
helped: 1502
HURT: 245
total cycles in shared programs: 71527354 -> 71421516 (-0.15%)
cycles in affected programs: 11541788 -> 11435950 (-0.92%)
helped: 3445
HURT: 1224
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Instead of using an array indexed by SYSTEM_VALUE_x, just use a
switch statement. This fixes a regression caused by inserting new
SYSTEM_VALUE_ enums but not updating the mapping to TGSI semantics.
v2: fix a few switch statement mistakes for compute-related enums
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
These weren't added before because they are actually calculated values that
are computed from other inputs. However, in order to handle them in
nir_lower_system_values, it's nice for them to have a cannonical locaiton.
Reviewed-by: Rob Clark <robdclark@gmail.com>
In Vulkan, you have InstanceIndex which begins at the base instance value
rather than the zero-based InstanceID of GL.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Analogous to previous commit - improved readability at the expense of
an extra file.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Preserve the functionality while keeping the files smaller and
more readable.
v2: Do not include Makefile.sources from the GLSL makefile (silences
automake warnings)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
This reverts commit 41c7912d04 but leaves
out the pragma [that inspired the original commit].
Building mesa requires MSVC2013 or later, thus we no longer need this.
v2: Use correct include path (src/glsl/nir -> src/compiler/nir)
Conflicts:
src/gallium/auxiliary/Makefile.am
Acked-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
v2 (Sam):
- Use uint64 instead of float64 for sources and destinations. (Connor)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Sam):
- Use uint64 instead of float64 for sources and destinations. (Connor)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Squash the printing doubles related patches into one patch (Sam).
v3:
- Print using PRIx64 format: long is 32-bit on some 32-bit platforms but long
long is basically always 64-bit (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- Make the users to give the right bit_sizes as arguments (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It will only end up getting exposed on gen8+ since it requires GL ES
3.1, but it should be ready to go on gen7 when support for GL ES 3.1 is
completed there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Consider the case of linking a program with both a vertex and fragment
shader. The VS may compute output varyings that are intended for
transform feedback, and not read by the fragment shader.
In this case, var->data.is_unmatched_generic_inout will be true,
but we still cannot eliminate the varyings. We need to also check
!var->data.is_xfb_only.
Fixes failures in ES31-CTS.gpu_shader5.fma_precision_*, which happen
to use transform feedback in a way we apparently hadn't seen before.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Bit 15 means "interleave" for most messages, but for SIMD8 messages it
means "use channel masks".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The GLSL 4.20 and ESSL 3.00 specs don't list 'buffer' as a reserved
keyword. Make the parser ignore it unless GLSL 4.30 / ESSL 3.10 are
used, or ARB_shader_storage_buffer_objects is enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
This should hopefully make it a little easier to debug with GL
applications like glretrace and looking at command streams.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Provide a callback to reallocate the underlying storage of a resource so
that it is not bound to any existing fences.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This lets us write the Z directly from the FTOI for computed Z, and may
let us coalesce color writes in the future.
No change in my shader-db, but clearly drops an instruction in piglit's
early-z test.
It wasn't correctly flagged everywhere, and QPU generation now handles the
only remaining case that was paying attention to it.
No change on shader-db.
Normal SFU writes couldn't have SF because they were marked as
multi_instruction, but tex_result and tlb_color_read weren't. This ended
up not being a problem according to anything in shader-db, but it seems
possible.
There used to be multi-instruction operations that would use src[] twice,
which is why we couldn't do some optimizations on them. This is no longer
the case.
total instructions in shared programs: 77973 -> 77969 (-0.01%)
instructions in affected programs: 84 -> 80 (-4.76%)
total estimated cycles in shared programs: 234165 -> 234157 (-0.00%)
estimated cycles in affected programs: 92 -> 84 (-8.70%)
I liked having all my NIR be scalar, but nir_validate() complains that the
intrinsic writes 4 components but the destination we set up was only 1
component. I could generate a new scalar variant, but it's a lot easier
to just leave it as a vec4. This doesn't hurt codegen since we GC unused
uniforms, and UCP dot products use all the components anyway.
We don't really suppor control flow yet, but it's a lot nicer to render
something and warn on stderr than to crash.
Fixes the following piglit tests:
- shaders/complex-loop-analysis-bug
- shaders/glsl-fs-discard-04
Converts the following piglit tests from crash to fail:
- shaders/glsl-fs-continue-inside-do-while
- shaders/glsl-fs-loop
- shaders/glsl-fs-loop-continue
- shaders/glsl-fs-loop-nested
- shaders/glsl-texcoord-array
- shaders/glsl-vs-continue-inside-do-while
- shaders/glsl-vs-loop
- shaders/glsl-vs-loop-continue
- shaders/glsl-vs-loop-nested
No piglit regressions.
v2 (Eric): Add stronger stderr warning.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We shouldn't have any NIR functions present since all GLSL functions get
inlined, but this would be a more informative error if it does happen.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ensure NIR control flow graph nodes that are unhandled in QIR
are reported with sufficient verbosity to aid debugging.
This improves piglit outputs, amongst other tools.
There are no other remaining uses of assert(0) as a blunt tool
within vc4.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: change check_explicit_uniform_locations() to return an
unsigned 0 (Timothy Arceri)
We were storing the int result of check_explicit_uniform_locations()
in num_explicit_uniform_locs as an unsigned int which caused it to
be 4294967295 when a -1 was returned.
This in turn would cause the following error during linking:
error: count of uniform locations > MAX_UNIFORM_LOCATIONS(4294967295 > 98304)
Results from running piglit tests/all with this patch
and when ARB_explicit_uniform_location disabled:
changes: 178
fixes: 176
regressions: 2
The two regressions are for the following tests:
glean@glsl1-matrix column check (1)
glean@glsl1-matrix column check (2)
which regress from FAIL to CRASH.
The regressions are acceptable because the tests are currently failing due to
the aforementioned linker error.
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The only place we were using this was in meta_blit2d which always creates a
new image anyway so we can just use the image offset.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Now it just creates the image and view. The caller is responsible for
handling the offset calculations.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The old function tried to work in elements which isn't, strictly speaking,
a valid thing to do. In the case of a non-power-of-two format, there is no
guarantee that the x offset into the tile is a multiple of the format
block size. This commit refactors it to work entirely in terms of a tiling
(not a surface) and bytes/rows.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
With the new blit framework we aren't using array textures and, from
talking with Nanley, we don't think it's going to be useful in the future
either. Just get rid of it for now.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Now that we can use the much simpler rgba8_copy function, we don't need to
hand different functions out based on direction.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Each of the [de]tiling functions has three mem_copy calls:
1) Left edge to tile boundary
2) Tile boundary to tile boundary in a loop
3) Tile boundary to right edge
Copies 2 and 3 start at a tile edge so the pointer to tiled memory is
guaranteed to be at least 16-byte aligned. Copy 1, on the other hand,
starts at some arbitrary place in the tile so it doesn't have any such
alignment guarantees.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Now that the check is restricted to gen8+, we should always get back a non-zero
positive value for the EU and subslice counts.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Older gen platforms do not actually return a value for sublice and eu total
(IMO, confusingly) they return -ENODEV. This patch defers the SSEU setup until
we have the actual GPU generation to avoid useless warnings when running on
older platforms with older kernels.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the first call in a GL app is glReadPixels(GL_FRONT) we'd fail the
assert(st->ctx->FragmentProgram._Current) at st_atom_shader.c:114 in
update_fp().
This is because we were calling st_validate_state() without first
updating Mesa state with _mesa_update_state().
The regression came from commit 83b589301f "st/mesa: fix
frontbuffer glReadPixels regressions".
The new piglit gl-1.0-simple-readbuffer test exercises this.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In other words, vport scissors are derived from viewport states.
If the scissor test is enabled, the intersection of both is used.
The guard band will disable clipping, so we have to clip per-pixel.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is in preparation of raising the number of exposed sampler views to 32
bits, which will raise the total number of sampler views to 33 for the
polygon stipple texture. That texture should never be compressed (and it's
certainly not a depth texture), but this approach seems cleaner to me than
special-casing the last slot in all affected code paths.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The previous value of 18 was motivated by having drivers that want to expose
16 samplers but also use some additional samplers for internal use. Raising
the value even higher isn't going to hurt that case.
On the other hand, some drivers actually use PIPE_MAX_SAMPLERS as the number
of samplers they expose externally, so raising this number above 32 is fragile
(because several places in the code use bitfields, and tracking down and
widening all of them is prone to miss some case).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It is used as a bitfield, so it seems cleaner to keep it unsigned.
The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.
v2: add an assert for bitfield size and use 1u << idx
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.
Reviewed-by: Brian Paul <brianp@vmware.com>
Line anti-aliasing will fail when there is no free sampler available. Make
the corresponding guard more robust in preparation of raising
PIPE_MAX_SAMPLERS to 32.
The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.
v2: add an assert for bitfield size and use 1u << idx
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
When hasFixedUnit is false, polygon stippling will fail when there is no free
sampler available. Make the corresponding guard more robust in preparation
of raising PIPE_MAX_SAMPLERS to 32.
The literal 1 is a (signed) int, and shifting into the sign bit is undefined
in C, so change occurences of 1 to 1u.
v2: add an assert for bitfield size and use 1u << idx
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
These small mallocs will probably never fail, but static analysis tools
may complain about the missing checks.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Lets give the developer a little hand if we are going to assert
on a zero literal at the end of a branch.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For ARB_framebuffer_no_attachment; A is_format_supported() query
with 'PIPE_FORMAT_NONE' passed implies a query of the number of
samples supported from the framebuffer with no attachment.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Carries across the number of samples and layers state in the
'softpipe_set_framebuffer_state()' callback. This state is
part of 'ARB_framebuffer_no_attachments' support.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Handle the case of ARB_framebuffer_no_attachment.
Also, kill off a dead debug printf() call while we are here.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Here we store the number of samples and layers directly in the
pipe_framebuffer_state so that in the case of
ARB_framebuffer_no_attachment we may make use of them directly.
Further, we adjust various gallium/auxiliary helper functions
accordingly.
V2:
Convert branches in util_framebuffer_get_num_layers() and
util_framebuffer_get_num_samples() to their canonical form.
V3:
'git stash pop' the typo fix of 'cbufs' which should be
'nr_cbufs' that was missing in V2, woops! Thanks Marek for
pointing this out yet again.
V4:
Squash in the following patch:
'gallium/util: Ensure util_framebuffer_get_num_samples() is valid'
Upon context creation, internal driver structures are malloc()'ed
and memset() to zero them. This results in a invalid number of
samples 'by default'. Handle this in the simplest way to avoid
elaborate and probably equally sub-optimial solutions.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Using PIPE_FORMAT_NONE to indicate what MSAA modes are supported
with a framebuffer using no attachment.
V.2:
Rewrite MSAA mode loop to be more general.
V.3:
Move comment to right place after loop was rewritten.
V.4: [airlied]
remove unneeded variable, and assert, and unneeded pipe assignment
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Set default values for the constants required in
ARB_framebuffer_no_attachments and obtained the number
of layers from ``PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS``.
We also obtain the MaxFramebufferSamples value using
a query back to the driver for PIPE_FORMAT_NONE.
V.1:
Merge if branch predicates into one branch.
Move const init into st_init_limits()
[airlied: whitespace fixup]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add PIPE_CAP to determine if the GL extension
'GL_ARB_framebuffer_no_attachments' shall be
supported.
The driver is required to support 'PIPE_FORMAT_NONE'
via its 'is_format_supported()' callback in order
to determine the MSAA modes the hardware supports so
that values requested from the application using
'GL_ARB_framebuffer_no_attachments' may be quantized
to what the hardware expects.
V.2:
Fix doc for a more detailed description of the PIPE_CAP
and the corresponding GL constant.
V.3:
Renamed and repurposed once again.
V.4:
Remove CAP from cap_mapping array.
[airlied: fix damaged whitespace]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Change references to gl_framebuffer::Width, Height, MaxNumLayers
and Visual::samples to use the _mesa_geometric_ convenience functions
for those places where the geometry of the gl_framebuffer is needed.
This is in contrast to the geometry of the intersection of the
attachments of the gl_framebuffer.
This patch paves the way to enable GL_ARB_framebuffer_no_attachements
for all gallium drivers.
V.2:
Remove itermeditate variable state.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, we were walking over the shader source to figure out which
inputs should be marked flat. Now, we can just pull it out of prog_data.
This is needed for properly setting up 3DSTATE_SF/SBE for Vulkan and it
also means that it will get properly cached.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the Vulkan driver we use a single flat input instead of a uniform
because setting up push constants is more disruptive to the pipeline than
setting up another vertex input. This uses the number of uniforms as a key
to keep it working for the GL driver.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's used by brw_compile_gs in brw_vec4_gs_visitor.cpp so it needs to be in
a file that's linked into libi965_compiler.la.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that old mesa + new LLVM or new mesa + old LLVM breaks
with this change and the corresponding LLVM change (D18559).
For LLVM version <= 3.8 we use the old method, but we can't detect
people using a post 3.8 svn version that is still too old.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
With this change we create the UBO and SSBO arrays separately from the
beginning rather than putting them into a combined array and splitting
it apart later.
A bug is with UBO and SSBO stage reference querying is also fixed as
we now use the block index to lookup the references in the separate arrays
not the combined buffer block array.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We had an implicit assumption that the phi src was assigned in it's
source (pred) block leading into the phi. But this is not true with
NIR, so we can't just ignore the source block specified in the
nir_phi_src. Insert an extra mov in the source block. If it is not
required the CP pass will take it back out again.
Fixes:
./tests/spec/glsl-1.10/execution/vs-call-in-nested-loop.shader_test
./tests/spec/glsl-1.10/execution/vs-inner-loop-modifies-outer-loop-var.shader_test
and probably others.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The frontend inserts (abs) and (neg)'s to convert between NIR boolean
(~0/0) and native boolean (1/0). So we'd end up with things like:
cmps.s.ge r1.x, ...
absneg.s r1.x, (neg)r1.x
absneg.s r1.x, (abs)r1.x
sel.b32 r2.x, r0.x, r1.x, r0.y
The (neg) already gets collapsed due to the following (abs). Now by
realizing that r1.x comes from a cmps.s instruction, we can drop the
(abs) as well.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead of failing an assertion, disable DCC and CMASK on the first export
that needs it, and merge the external usage flags.
v2: clear the EXPLICIT_FLUSH flag if it's not set; whitespace fixes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This patch enables an EGL extension, EGL_KHR_reusable_sync.
This new extension basically provides a way for multiple APIs or
threads to be excuted synchronously via a "reusable sync"
primitive shared by those threads/API calls.
This was implemented based on the specification at
https://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_reusable_sync.txt
v2
- use thread functions defined in C11/threads.h instead of
using direct pthread calls
- make the timeout set with reference to CLOCK_MONOTONIC
- cleaned up the way expiration time is calculated
- (bug fix) in dri2_client_wait_sync, case EGL_SYNC_CL_EVENT_KHR
has been added.
- (bug fix) in dri2_destroy_sync, return from cond_broadcast
call is now stored in 'err' intead of 'ret' to prevent 'ret'
from being reset to 'EGL_FALSE' even in successful case
- corrected minor syntax problems
v3
- dri2_egl_unref_sync now became 'void' type. No more error check
is needed for this function call as a result.
- (bug fix) resolved issue with duplicated unlocking of display in
eglClientWaitSync when type of sync is "EGL_KHR_REUSABLE_SYNC"
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Otherwise we end up with funny things like:
mov.f32f32 r0.x, r1.y
mov.f32f32 r0.x, r1.y
(It doesn't happen as much after fixing the problem w/ CP into phi src,
but it can still happen since we aren't too clever about generating phi
sources in the first place.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The block defining a phi source might not have been executed. If we
allow copy propagation, we could end up pointing to a src instruction in
the wrong block.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Been on my TODO list for a while. If nothing else this will make gdb
properly grok the opc_t enum.
This first step preserves ir3_instruction::category (with an added
assert that category matches what is encoded in opc_t). Next step is
to drop the category field (and arg to ir3_instr_create()), but that
is split into next commit for bisectability and so that we can run
piglit in the intermediate state to flush out any problems.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
They are compute-shader only and that's where the code for doing atomics on
shared variables lives so it seemes to make sense.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless. This commit allows other NIR producers to be
more lazy and not provide one at all.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
%ld and %lu aren't the right format specifiers for int64_t and uint64_t
on 32-bit (x86) systems. They're %zu on Linux and %Iu on Windows.
Use the standard C99 macros in hopes that they work everywhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
There may not be a previous block. In this case, there's no real work
to do, so just continue on to the next one.
v2: Update for bblock->prev() API change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The bblock_t::prev/prev_const/next/next_const API returns bblock_t
pointers, rather than exec_nodes. So it's a bit surprising.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
lower_variable_index_to_cond_assign() did not handle system values.
gl_SampleMaskIn[] is a system value, and also an array. Accessing it
with a variable index would trigger an unreachable() assert.
Rather than adding a new EmitNoIndirectSystemValues flag, we simply
lower unconditionally. There is exactly one case where this occurs,
and for all current drivers, lowering produces optimal code. Even
for future drivers with 32x MSAA, it produces reasonable code.
Fixes Piglit's new samplemaskin-indirect test. Also fixes many ES31-CTS
tests when OES_sample_variables is enabled.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In the first pass of implementing exact handling, I made a mistake with
search-and-replace. In particular, we only reallly handled exact/inexact
on the root of the tree. Instead, we need to check every node in the tree
for an exact/inexact match. As an example of this, consider the following
GLSL code
precise float a = b + c;
if (a < 0) {
do_stuff();
}
In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results. The solution is to simply bail if any of the
values are exact when matching an inexact expression.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The SIN and COS instructions on Intel hardware can produce values
slightly outside of the [-1.0, 1.0] range for a small set of values.
Obviously, this can break everyone's expectations about trig functions.
According to an internal presentation, the COS instruction can produce
a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One
suggested workaround is to multiply by 0.99997, scaling down the
amplitude slightly. Apparently this also minimizes the error function,
reducing the maximum error from 0.00006 to about 0.00003.
When enabled, fixes 16 dEQP precision tests
dEQP-GLES31.functional.shaders.builtin_functions.precision.
{cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec4,vec4}.
at the cost of making every sin and cos call more expensive (about
twice the number of cycles on recent hardware). Enabling this
option has been shown to reduce GPUTest Volplosion performance by
about 10%.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
See commit 3b0279a69 - this restriction is documented in the "Surface
Format" field of RENDER_SURFACE_STATE.
Looking at newer documentation, this restriction appears to exist on
Haswell, but no longer applies on Gen8+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This extension is identical to ARB_base_instance. Reuse the same
entrypoints.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The extension spec was extended to also support ES. This functionality
is provided all the way back to ES 1.0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For adding .v4f32 like suffixes to intrinsics, taking special care for
scalar case, which was being often neglected.
This fixes invalid IR when doing mipmap filtering on SSE2 (the only
case where we'd use intrinsics with scalars.)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Also avoid double-adding the *sampler2DMS types when the array ext is
enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
I can't tell whether this actually matters, but we're creating function
signatures with this predicate, so it should probably match when SSBO's
are available.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
As the relevant extensions get implemented, the lines should be
uncommented. I believe this is (almost) everything needed for those GL
versions though.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Oddly a bunch of the features it adds are actually from ESSL 3.20. But
the spec is quite clear, oh well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We could unconditionally use these instrinsics, but performance with SSE2
would suck, as LLVM falls back to calling libm.
lp_test_arit.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
LLVM often can't determine the mask elements are all ones/zeros, and
there doesn't seem to be a good way to hint that.
Thanks to Roland Scheidegger for spotting and analyzing the issue.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Only provide a fallback for LLVM 3.3.
One less dependency on LLVM C++ interface.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The current DSQRT lowering code emits an OP_SELP, so we have to handle
its emission. This will eventually go away, but no harm supporting this
op.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes piglit tests like
tests/spec/glsl-1.10/execution/variable-indexing/vs-output-array-float-index-wr.shader_test
and related ones.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Currently mesa fails building with the x32 abi as ms_abi is not defined
in such a case.
The patch uses ms_abi only for amd64 targets and stdcall only for i386
targets to be sure that those are defined.
This patch additionally checks for __GNUC__ to guarantee that
__attribute__ is available.
CC: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Schmidbauer <ch.schmidbauer@gmail.com>
Acked-by: Axel Davy <axel.davy@ens.fr>
Rather than the currently bound texture. This goes along with the
earlier patch to get away from examining bound textures and sampler
views during shader translation.
Fixes VMware bug 1632739.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This allows us to simplify the code and drop InterfaceBlockStageIndex
which is a per stage array of integers the size of all blocks in the
program combined including duplicates across stages. Adding a stage
ref per block will use less memory.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This changes the code to use the buffer counts stored for each stage
rather than counting from scratch. It also moves the checks outside
of the for loop which means we now just get a single link error
message if we go over the max rather than X error messages where X
is the number we have exceeded the max by.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow us to use them when checking resources in a
following patch and clean up a bunch of code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since 8683d54d2b there is now a single instance of the buffer
block information that needs to be updated rather than one instance
for each stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With SSO, the GL_PROGRAM_INPUT and GL_PROGRAM_OUTPUT interfaces refer to
the first and last shader stage linked into a program. This may not be
the vertex and fragment shader stages.
So, subtracting VERT_ATTRIB_GENERIC0 and FRAG_RESULT_DATA0 is bogus.
We need to subtract VERT_ATTRIB_GENERIC0 for VS inputs,
FRAG_RESULT_DATA0 for FS outputs, and VARYING_SLOT_VAR0 for other cases.
Note that built-in variables get a location of -1.
Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
- program_input.location.separable_fragment.var_explicit_location
- program_input.location.separable_fragment.var_array_explicit_location
- program_output.location.separable_vertex.var_array_explicit_location
- program_output.location.separable_vertex.var_array_explicit_location
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We were recording locations for all variables, even ones without an
explicit location set. Implement the rules from the spec, and record
-1 in the resource list accordngly. Make program_resource_location
stop doing math on negative values. Remove hacks that are no longer
necessary now that we've stopped doing that.
Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
- program_input.location.separable_fragment.var
- program_input.location.separable_fragment.var_array
- program_output.location.separable_vertex.var_array
- program_output.location.separable_vertex.var_array
v2: Delete more code
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
A program will either have gl_VertexID or gl_VertexIDMESA (the lowered
zero-based version), not both. Just spoof it in the resource list so
the hacks are done in a single place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
System values are just built-in input variables that we've opted to
special-case out of convenience. We need to consider all inputs,
regardless of how we've classified them.
Unfortunately, there's one exception: we shouldn't add gl_BaseVertex
unless ARB_shader_draw_parameters is enabled, because it doesn't
actually exist in the language, and shouldn't be counted in the
GL_ACTIVE_RESOURCES query.
Fixes dEQP-GLES31.functional.program_interface_query.program_input.
resource_list.compute.empty, which expects gl_NumWorkGroups to appear
in the resource list.
v2: Delete more code
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This makes no sense. If the stage being considered is the vertex
shader, then we'll add inputs and system values appropriately.
If we're not considering the vertex shader, then we absolutely should
not do anything with it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
add_interface_variables() is supposed to add variables for the inputs of
the first shader stage linked into a program, and the outputs of the
last shader stage linked into a program.
From the ARB_program_interface_query specification:
"* PROGRAM_INPUT corresponds to the set of active input variables used by
the first shader stage of <program>. If <program> includes multiple
shader stages, input variables from any shader stage other than the
first will not be enumerated.
* PROGRAM_OUTPUT corresponds to the set of active output variables
(section 2.14.11) used by the last shader stage of <program>. If
<program> includes multiple shader stages, output variables from any
shader stage other than the last will not be enumerated."
Previously, we used build_stageref here, which walks over all linked
shaders in the program. This meant that internal varyings would be
visible. We don't actually need any of build_stageref's code: we
already explicitly skip packed varyings, handle modes, and the name
comparisons just do a fuzzy string comparison of name with itself.
Fixes two tests: dEQP-GLES31.functional.program_interface_query.
program_{input,output}.referenced_by.referenced_by_vertex_fragment.
These tests have a VS and FS linked together into a single program.
Both stages have an input called "shaderInput". But the FS input
should not be visible because it isn't the first stage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This is a bitfield of which stages refer to a variable. It is not used
to mask off bits. In fact, it's used to contribute additional bits.
Rename it and tidy a bit of the logic.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
add_interface_variables is supposed to add variables from either the
first or last stage of a linked shader. But it has no way of knowing
the stage it's being asked to process, which makes it impossible to
produce correct stagerefs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
If the GL_ARB_shader_draw_parameters extension is enabled, we'll already
have a gl_BaseVertex variable. It will have var->how_declared set to
ir_var_declared_implicitly, and will appear in the program resource
list.
If not, we make one for internal use. We don't want it to be listed
in the program resource list, as the application won't be expecting
it. Marking it hidden will properly exclude it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We occasionally generate variables internally that we want to exclude
from the program resource list, as applications won't be expecting them
to be present.
The next patch will make use of this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is necessary for ARB_texture_stencil8 support on classic drivers.
Presumably Gallium works because it implements its own ChooseTexFormat.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note: This patch appears to violate older OpenGL and OpenGLES specs.
The OpenGLES GLSL 3.1 and OpenGL GLSL 4.3 specifications both remove
the requirement for the output and input centroid qualifiers to match.
The deqp
dEQP-GLES3.functional.shaders.linkage.varying.rules.differing_interpolation_2
test wants the newer OpenGLES 3.1 specification behavior, even for
OpenGLES 3.0. This patch simply removes the checking in all cases.
The OpenGLES 3.0 conformance test suite doesn't appear to require the
older ("must match") spec behavior.
For reference, here are the relavent spec citations:
The OpenGL 4.2 spec says: "the last active shader stage output
variables and fragment shader input variables of the same name must
match in type and qualification (other than out matching to in)"
The OpenGL 4.3 spec says: "interpolation qualification (e.g., flat)
and auxiliary qualification (e.g. centroid) may differ."
The OpenGLES GLSL 3.00.4 specification says: "The output of the
vertex shader and the input of the fragment shader form an
interface. For this interface, vertex shader output variables and
fragment shader input variables of the same name must match in type
and qualification (other than precision and out matching to in)."
The OpenGLES GLSL 3.10 Specification says: "interpolation
qualification (e.g., flat) and auxiliary qualification (e.g.
centroid) may differ"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92743
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=7819
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For radeonsi, native and TGSI use different compilers and this results
in different limits for different IR's.
The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE
and MAX_THREADS_PER_BLOCK params, but I added a few others as shader
related that seemed like they would also typically depend on the
compiler.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Currently radeonsi synchronizes after every dispatch and Clover
does nothing to synchronize. This is overzealous, especially with
GL compute, so add a barrier for global buffers.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The value 0 for unknown has been chosen to so that
drivers using tgsi_scan_shader do not need to detect
missing properties if they zero-initialize the struct.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It used to be in nir_gather_info.c until I moved it out to nir.h so it
could be re-used with some linking code that never got merged. We'll move
it back out if and when we have real code to share it with.
Compute support on GK110 is still unstable for weird reasons, but
this can be fixed later as the NVF0_COMPUTE envvar prevent using
compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The maximum number of uniform blocks (MAX_COMPUTE_UNIFORM_BLOCKS)
per compute program must be at least 12.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For Maxwell, the ATOMS instruction can be used to perform atomic
operations on shared memory instead of this load/store lowering pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Make sure to avoid out of bounds access in presence of indirect
array indexing by loading the size from the driver constant buffer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The grid size is stored as three 32-bits integers in the indirect
buffer but the launch descriptor uses a 32-bits integer for both
griddim_y and griddim_z like this (z << 16) | y. To make it work,
the 16 high bits of griddim_y are overwritten by griddim_z.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.
Suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uniform buffer objects will be sticked to the driver constant buffer
like buffers because the launch descriptor only allows 8 CBs.
Input kernel parameters for OpenCL are still uploaded to screen->parm
which is bound on c0, but this will be changed later with a new series.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of using the screen->parm buffer object which will be removed,
upload auxiliary constants to uniform_bo to be consistent regarding
what we already do for Fermi.
This breaks surfaces support (for compute only) but this will be
properly re-introduced later for ARB_shader_image_load_store.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just noticed this in passing.. gl_shader_stage already has tess so this
comment no longer applies.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Mesa demos are no longer part of the main Mesa tree/tarball.
Add Gallium and GLX code to list of major components.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Android Bionic does not support strchrnul() string function,
gallium auxiliary util/u_string.h provides util_strchrnul()
This change avoids the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:3863: error:
undefined reference to 'strchrnul'
collect2: error: ld returned 1 exit status
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Set EGL_FRAMEBUFFER_TARGET_ANDROID and EGL_RECORDABLE_ANDROID config
attributes to true for Android. These are required in Marshmallow.
The implementation of EGL_RECORDABLE_ANDROID support has 2 options in
the definition of the extension. Android implements the 2nd option
which is the encoder must support RGB input. The requested input format
is RGB888, so setting the attribute on all the native Android visual
formats should be sufficient.
Similarly, setting EGL_FRAMEBUFFER_TARGET_ANDROID for all configs with
a EGL_NATIVE_VISUAL_ID should be sufficient. Most likely, the HWC should
support the same set of formats the underlying DRM driver supports.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Builds with gallium enabled fail on x86 with linker error:
external/mesa3d/src/mesa/vbo/vbo_exec_array.c:127: error: undefined reference to '_mesa_uint_array_min_max'
The problem is sse_minmax.c is not included in the libmesa_st_mesa
library. Since the SSE4.1 files are needed for both libmesa_st_mesa
and libmesa_dricore, move SSE4.1 files into a separate static library
that can be used by both.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is an old patch I had around.
Vector selects seem to work well from LLVM 3.3. Using them should
improve code quality, as it might make constant propagation pass more
effective.
Tested lp_test_*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Needed because not all the built-in variables are marked as system
values, so they still have the mode ir_var_auto. Right now it fixes
raising the warning when gl_GlobalInvocationID and
gl_LocalInvocationIndex are used.
v2: use is_gl_identifier instead of filtering for some names (Ilia
Mirkin)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the same ext as ARB_draw_buffers_blend (plus some core
functionality that already exists). Add the alias entrypoints.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
According to the Sandybridge PRM's description of the resinfo message,
the .z value returned will be Depth == 0 ? 0 : Depth + 1. The earlier
PRMs have the same table.
This means we return 0 for array textures with a single slice, when
we ought to return 1. Just override it to max(depth, 1).
Fixes 10 dEQP-GLES3.functional tests on Sandybridge:
shaders.texture_functions.texturesize.sampler2darray_fixed_vertex
shaders.texture_functions.texturesize.sampler2darray_fixed_fragment
shaders.texture_functions.texturesize.sampler2darray_float_vertex
shaders.texture_functions.texturesize.sampler2darray_float_fragment
shaders.texture_functions.texturesize.isampler2darray_vertex
shaders.texture_functions.texturesize.isampler2darray_fragment
shaders.texture_functions.texturesize.usampler2darray_vertex
shaders.texture_functions.texturesize.usampler2darray_fragment
shaders.texture_functions.texturesize.sampler2darrayshadow_vertex
shaders.texture_functions.texturesize.sampler2darrayshadow_fragment
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Oddly, this did not affect the shader where I first noticed the pattern.
That particular shader doesn't get its if-statement converted to a bcsel
because there are two assignments in the else-statement. This led to me
submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747.
shader-db results:
Sandy Bridge
total instructions in shared programs: 8467384 -> 8467069 (-0.00%)
instructions in affected programs: 36594 -> 36279 (-0.86%)
helped: 46
HURT: 0
total cycles in shared programs: 117573448 -> 117568518 (-0.00%)
cycles in affected programs: 339114 -> 334184 (-1.45%)
helped: 46
HURT: 0
Ivy Bridge / Haswell / Broadwell / Skylake:
total instructions in shared programs: 7774258 -> 7773999 (-0.00%)
instructions in affected programs: 30874 -> 30615 (-0.84%)
helped: 46
HURT: 0
total cycles in shared programs: 65739190 -> 65734530 (-0.01%)
cycles in affected programs: 180380 -> 175720 (-2.58%)
helped: 45
HURT: 1
No change on G45 or Ironlake.
I also tried these expressions, but none of them affected any shaders in
shader-db:
(('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)),
(('bcsel', a, 'b@bool', False), ('iand', a, b)),
(('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)),
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The number of channels must be 4 for all RGBA components.
Fixes: 22d129601 ("tgsi: add support for image operations to tgsi_exec. (v2.1)")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It was kind of overloaded, returning two different things. Now get
the index of the shadow reference src register with a new
tgsi_util_get_shadow_ref_src_index() function.
To verify the new code, I added some temp/debug code which looped
over all TGSI_TEXTURE_x values, calling the old function and new and
checking that the returned indexes matched.
Also tested piglit "shadow" tests with softpipe/llvmpipe.
No testing of ilo and radeonsi changes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Should fix the assertion in piglit
spec@arb_gpu_shader5@texturegather@fs-r-none-shadow-2d when the
TXQ instruction specifies a 2D target but the sampler view was
declared as SHADOW2D.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This fixes a null pointer dereference during the register allocation pass,
if a function had arguments.
Functions arguments get a definition from the function itself, a definition
which is therefore not linked to any instruction. If a value ends up having
a definition but no linked instruction, the register allocation pass doesn't
need to consider whether that value is generated by an instruction that
can only handle "short" registers (on nv50).
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
The extension is identical to GL_OES_copy_image. But dEQP has tests that
want the EXT variant.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We require the full ARB_gpu_shader5 for now, but in the future some
other CAP could get exposed to indicate that only the multisample-related
behavior of ARB_gpu_shader5 is available.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Instead of removing every instruction in add_insts_from_block(), just
move the instruction to its scheduled location. This is a step towards
doing both bottom-up and top-down scheduling without conflicts.
Note that this patch changes cycle counts for programs because it begins
including control flow instructions in the estimates.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
I think when this code was written, basic blocks were always ended by a
control flow instruction or an end-of-thread message. That's no longer
the case, and removing this restriction actually helps things:
instructions in affected programs: 7267 -> 7244 (-0.32%)
helped: 4
total cycles in shared programs: 66559580 -> 66431900 (-0.19%)
cycles in affected programs: 28310152 -> 28182472 (-0.45%)
helped: 9577
HURT: 879
GAINED: 2
The addition of the is_control_flow() checks is not a functional change,
since the add_insts_from_block() does not put them in the list of
instructions to schedule. I plan to change this in a later patch.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Missing this causes an assertion failure in the scheduler with the next
patch.
Additionally, this gives cmod propagation enough information to optimize
code better.
total instructions in shared programs: 7112991 -> 7112852 (-0.00%)
instructions in affected programs: 25704 -> 25565 (-0.54%)
helped: 139
total cycles in shared programs: 64812898 -> 64810674 (-0.00%)
cycles in affected programs: 127224 -> 125000 (-1.75%)
helped: 139
Acked-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit d0e1d6b7e2.
The change in the vec4 code is a mistake -- there's never an
FS_OPCODE_FB_WRITE in vec4 code.
The change in the fs code had the (harmless) effect of not recognizing
an FB_WRITE as a scheduling barrier even if it was marked EOT --
harmless because the scheduler marked the last instruction of a block as
a barrier, something I'm changing in the following patches.
This will be reimplemented later in the series.
All of these were simply code for "architecture register file" (and in
the case of destinations, "not the null register").
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These printed the cycle count the last basic block (sched.time is set
per basic block!). We have accurate, full program, data printed
elsewhere.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This enables in shader defined transform feedback mode even if the
only place xfb_stride is defined is on the global out.
We don't worry about xfb_buffer since Issue 22 c) in the spec says:
"If the shader has an "xfb_buffer" qualifier identifying a buffer,
but doesn't declare "xfb_offset" on anything associated with it,
what happens?
...
variables not qualified with "xfb_offset" are not captured, which
makes the associated "xfb_buffer" qualifier irrelevant."
Reviewed-by: Dave Airlie <airlied@redhat.com>
When we move to the next buffer we need to reset the stream
so that we don't generate an error message about streams not
matching.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This moves the check until after we have done the stride
calculation and applies it to the xfb_* qualifiers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_enhanced_layous spec:
"It is a compile-time or link-time error to have any *xfb_offset*
that overflows *xfb_stride*, whether stated on declarations before
or after the *xfb_stride*, or in different compilation units.
...
When no *xfb_stride* is specified for a buffer, the stride of a
buffer will be the smallest needed to hold the variable placed at
the highest offset, including any required padding."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Here we use the built-in validation in
ast_layout_expression::process_qualifier_constant() to check for mismatching
global out strides on buffers in a single shader.
From the ARB_enhanced_layouts spec:
"While *xfb_stride* can be declared multiple times for the same buffer,
it is a compile-time or link-time error to have different values
specified for the stride for the same buffer."
For intrastage validation a new helper link_xfb_stride_layout_qualifiers()
is created. We also take this opportunity to make sure stride is at least
a multiple of 4, we will validate doubles at a later stage.
From the ARB_enhanced_layouts spec:
"If the buffer is capturing any double-typed outputs, the stride must
be a multiple of 8, otherwise it must be a multiple of 4, or a
compile-time or link-time error results."
Finally we update store_tfeedback_info() to apply the strides to
LinkedTransformFeedback and update the buffers bitmask to mark any global
buffers with a stride as active. For example a shader with:
layout (xfb_buffer = 0, xfb_offset = 0) out vec4 gs_fs;
layout (xfb_buffer = 1, xfb_stride = 64) out;
Is expected to have a buffer bound to both 0 and 1.
From the ARB_enhanced_layouts spec:
"A binding point requires a bound buffer object if and only if its
associated stride in the program object used for transform feedback
primitive capture is non-zero."
Reviewed-by: Dave Airlie <airlied@redhat.com>
This will be used in a following patch to implement interface
query support for TRANSFORM_FEEDBACK_BUFFER.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This allows us to print the correct binding point when not all
buffers declared in the shader are bound.
For example if we use a single buffer:
layout(xfb_buffer=2, offset=0) out vec4 v;
We now print '2' when the buffer is not bound rather than '0'.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This adds the initial infrastructure for enabling transform feedback
mode via in shader qualifiers and adds initial buffer support.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This function checks for any xfb_* qualifiers which will enable
transform feedback mode and cause any API defined xfb varyings
to be ignored.
It also counts the number of varyings that have a xfb_offset
qualifier and finally it calls the create_xfb_varying_names()
helper to generate the names of varyings to be caputured.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This will be used to get a count of the number of varying name
strings we are required to generate for use with the query api.
Reviewed-by: Dave Airlie <airlied@redhat.com>
When we have an interface block like:
layout (xfb_buffer = 0, xfb_offset = 0) out Block {
vec4 var1;
layout (xfb_stride = 32) vec4 var2;
vec4 var3;
};
We take into account the stride of var2 when calculating the offset
for var3.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_enhanced_layouts spec:
"The *xfb_stride* qualifier specifies how many bytes are consumed
by each captured vertex. It applies to the transform feedback
buffer for that declaration, whether it is inherited or explicitly
declared. It can be applied to variables, blocks, block members,
or just the qualifier out. If the buffer is capturing any
double-typed outputs, the stride must be a multiple of 8, otherwise
it must be a multiple of 4, or a compile-time or link-time error
results.
...
The resulting stride (implicit or explicit) must be less than or
equal to the implementation-dependent constant
gl_MaxTransformFeedbackInterleavedComponents."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Also copies the qualifier values to GLSL IR.
From the ARB_enhanced_layouts spec:
"The *xfb_buffer* qualifier can be applied to the qualifier out,
to output variables, to output blocks, and to output block
members. Shaders in the transform feedback capturing mode have
an initial global default of
layout(xfb_buffer = 0) out;
This default can be changed by declaring a different buffer with
xfb_buffer on the interface qualifier out. This is the only way
the global default can be changed. When a variable or output block
is declared without an xfb_buffer qualifier, it inherits the global
default buffer. When a variable or output block is declared with an
xfb_buffer qualifier, it has that declared buffer. All members of a
block inherit the block's buffer. A member is allowed to declare
an xfb_buffer, but it must match the buffer inherited from its
block, or a compile-time error results.
The *xfb_buffer* qualifier follows the same conventions, behavior,
defaults, and inheritance rules as the qualifier stream, and the
examples for stream apply here as well. This includes a block's
inheritance of the current global default buffer, a block member's
inheritance of the block's buffer, and the requirement that any
*xfb_buffer* declared on a block member must match the buffer
inherited from the block.
...
It is a compile-time error to specify an *xfb_buffer* that is
greater than the implementation-dependent constant
gl_MaxTransformFeedbackBuffers."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Since any of the xfb_* qualifiers trigger the shader to be in
transform feedback mode we need an extra field to track if
the xfb_buffer on interface members was set explicitly since
xfb_buffer will always have a default value.
Reviewed-by: Dave Airlie <airlied@redhat.com>
These will be used to hold qualifier values for interface and
struct members.
Support is added to the struct/interface constructors to copy these
fields upon creation.
We also update record_compare() to ensure we don't reuse a glsl_type
with the wrong xfb_* qualifier values.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This adds validation for all qualifiers as allowed by the
table in Section 4.4 (Layout Qualifiers) of the GLSL 4.5 spec.
Reviewed-by: Dave Airlie <airlied@redhat.com>
We reuse the existing offset field for holding the xfb_offset
expression but create a new flag as to avoid hitting the rules
for the offset qualifier for UBOs.
xfb_buffer qualifiers require extra processing when merging as
they can be applied to global out defaults. We just apply the
same rules as we do for the stream qualifier as the spec says:
"The *xfb_buffer* qualifier follows the same conventions,
behavior, defaults, and inheritance rules as the qualifier
stream, and the examples for stream apply here as well."
For xfb_stride we push everything into a global out field for
later processing as xfb_stride applies to the entire buffer.
We still need to have a separate field to store per variable
strides because they can still effect implicit offsets
e.g. when applied to block members with implicit offsets.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Firstly this updates the named interface lowering pass to store the
interface without the arrays removed.
Note we need to remove the arrays in the interface/varying matching
code to not regress things but in future this should be fixed
futher as it would seem we currently successfully match interface
blocks with differnt array sizes.
Since we now know if the interface was an array we can reduce the
IR flags from_named_ifc_block_array and from_named_ifc_block_nonarray
to just from_named_ifc_block.
Next rather than having a different code path for named interface
blocks in program_resource_visitor we just make use of the one used
by UBOs this allows us to now handle arrays of arrays correctly.
Finally we add a new param to the recursion function
named_ifc_member this is because we only want to process a single
member at a time. Note that this is also the glsl_struct_field
from the original ifc type before lowering rather than the type
from the lowered variable. This fixes a bug in Mesa where we would
generate the names like WithInstArray[0].g[0][0] when it should be
WithInstArray[0].g[0] for the following interface.
out WithInstArray {
float g[3];
} instArray[2];
Reviewed-by: Dave Airlie <airlied@redhat.com>
It seems expected that both lhs and rhs could be of type error_type
in this code however the TCS case wasn't expecting it.
Fixes segfault in an enhanced layouts GL CTS test.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This adds support for ARB_shader_image_load_store to softpipe.
v2: add RESQ support (Ilia)
v3: constify, cleanup internals, add some comments (Brian).
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for passing through images to the
tgsi execution stage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for load/store/atomic operations on images
along with image tracking support.
v2: add RESQ support. (Ilia)
v2.1: constify interface (Brian)
split get_image_coord_dim (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
ARB_shader_image_load_store adds support for explicit early
depth testing. However we need to make sure we don't overwrite
values using the shader written values in this case.
This fixes early depth testing in softpipe to conform with
those requirements.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a mask of which of the current 2x2 grid are non-helper
invocations. This allows us to mask off the helper invocations
later for the image operations.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's called by the inline intel_batchbuffer_begin() function which
itself is used in BEGIN_BATCH. So in sequence of code emitting multiple
packets, we have inlined this ~200 byte function multiple times. Making
it an out-of-line function presumably improved icache usage.
Improves performance of Gl32Batch7 by 3.39898% +/- 0.358674% (n=155) on
Ivybridge.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Similar to radeonsi linear layout should work for all not compressed
or depth/stencil formats. Fixes issues with VDPAU on r600.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The null check of result was the wrong way around. Also, move memset
and dereference of result after the null check.
Reviewed-by: Christian König <christian.koenig@amd.com>
Type of GLSL_SAMPLER_DIM_BUF can be sampler or image.
Spotted while trying to run dEQP tests related to
ARB_shader_image_load_store.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Float suffixes are not allowed in GLSL 1.10 nor GLSL ES 1.00.
Fixes the following piglit tests:
tests/spec/glsl-1.10/compiler/literals/invalid-float-suffix-capital-f.vert
tests/spec/glsl-1.10/compiler/literals/invalid-float-suffix-f.vert`
v2: modify error message
v3: parse the float instead of returning an ERROR_TOK
v4: (by Ken) Change to is_version(120, 300) to avoid breaking ES3
shaders; update commit message accordingly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81585
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SPIR-V spec says geometry shaders are supposed to have one invocation
by default. The execution mode is only required if there are multiple
invocations.
d3d10 state tracker does not encode (valid) target (only offsets are
really used from the texture bits), since that information always comes
from the sview dcl, and not the instruction (note the meaning of target
is actually slightly different between gl and d3d10 in any case, because
d3d10 target does never include shadow bit).
Also move the msaa sampler identification as well - would need to set that
on the sview not sampler, so while this does not fix it make it at least
obvious it won't work with sample instructions.
Since there is no way to create immutable texture buffers in GL ES,
mutable buffer textures are allowed to back images. See issue 7 of the
GL_OES_texture_buffer specification.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The prepare_mipmap_level() wrapper for _mesa_prepare_mipmap_level() is
not needed. It only served to undo the GL_TEXTURE_1D_ARRAY height/depth
change was was made before the call to prepare_mipmap_level()
Said another way, regardless of how the meta code manipulates the height/
depth dims for GL_TEXTURE_1D_ARRAY, the gl_texture_image dimensions are
correctly set up by _mesa_prepare_mipmap_levels().
Tested by plugging _mesa_meta_GenerateMipmap() into the swrast driver
and testing with piglit.
v2 (idr): Early out of the mipmap generation loop with dstImage is NULL.
This can occur for immutable textures that have a limited range of
levels or in the presense of memory allocation failures. Fixes
arb_texture_view-mipgen on Intel platforms.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We assert that fullinst->Instruction.Texture != 0 above so no need to
check it in the conditional. We also have the fullinst->Texture.Texture
value in a local variable, so use it.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Texture sample instructions specify a sampler unit and texture target
such as "1D", "2D", "CUBE", etc. Sampler view declarations also specify
the sampler unit and texture target.
This patch checks that the texture instructions agree with the declarations
and collects the texture target type for each sampler unit.
v2: only compare instruction's texture target to the sampler view declaration
target if the instruction is a TEX instruction, not a SAMPLE instruction.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This adds the glXCreateContextAttribsARB() function for the xlib/swrast
driver. This allows more piglit tests to run with this driver.
For example, without this patch we get:
$ bin/fbo-generatemipmap-1d -auto
piglit: error: waffle_config_choose failed due to WAFFLE_ERROR_UNSUPPORTED_
ON_PLATFORM: GLX_ARB_create_context is required in order to request an OpenGL
version not equal to the default value 1.0
piglit: error: Failed to create waffle_config for OpenGL 2.0 Compatibility Context
piglit: info: Failed to create any GL context
PIGLIT: {"result": "skip" }
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
The whole st_generate_mipmap() function was overly complicated. Now
we just call the new _mesa_prepare_mipmap_levels() function to prepare
the texture mipmap memory, then call the generate function which fills
in the texture images.
This fixes a failed assertion in llvmpipe/softpipe which is hit with the
new piglit generatemipmap-base-change test. Also fixes some device errors
(format mismatches) with the VMware svga driver.
v2: fix a comment typo, per Sinclair
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Simplifies the loops in generate_mipmap_uncompressed() and
generate_mipmap_compressed(). Will be used in the state tracker too.
Could probably be used in the meta code. If so, some additional
clean-ups can be done after that.
v2: use unsigned types instead of GLuint, per Ian
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There is no linear filtering for integer formats, so we should always
be using CLAMP_TO_EDGE mode.
Fixes 46 dEQP cases on Ivybridge (which were likely broken by commit
0faf26e6a0).
This workaround doesn't appear to be necessary on any other hardware;
I haven't found any documentation mentioning errata in this area.
v2: Only apply on Ivybridge/Baytrail to avoid regressing GLES3.1 tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
This is a tiny housekeeping patch which does the following:
* Replaced tabs with three spaces.
* Formatted oneline and multiline code comments. Some doxygen
comments weren't marked as such and some code comments were marked
as doxygen comments.
* Spaces between if- and while-statements and their parenthesis.
According to the mesa coding style guidelines.
Reviewed-by: Brian Paul <brianp@vmware.com>
With commit dc9ecf58c0,
we are now getting the sampler target from the sampler view
declaration. But since a sampler view declaration can be defined
after a sampler declaration, we need to emit the
sampler declarations in the pre-helpers function, otherwise,
the sampler target might not have defined yet for the sampler declaration.
Fixes viewperf maya-03 and various gl trace regressions in hwv11.
Reviewed-by: Brian Paul <brianp@vmware.com>
svga_shader_expand() will fall back to using non-malloced memory for
emit.buf if malloc fails. We should check if the memory is malloced
before freeing it in the error path of svga_tgsi_vgpu9_translate.
Original patch by Thomas Hindoe Paaboel Andersen <phomes@gmail.com>.
Remove trivial svga_destroy_shader_emitter() function, by BrianP.
Signed-off-by: Brian Paul <brianp@vmware.com>
Avoid using internal structures from another API.
v2: rebase and moved includes so they don't cause problem when VDPAU isn't installed.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Leo Liu <leo.liu@amd.com>
That should allow us to get away from passing internal structures around.
v2: rebased
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
We are going to need that in the Mesa state tracker as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Use DMA-buf for the VDPAU interop interface instead of using
internal structures.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Works around a bug in radeonsi and tiling is actually
not very beneficial in this use case.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Linear layout should work for all not compressed or depth/stencil formats.
v2: restrict it a bit more
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
OES_texture_buffer combines bits from a number of desktop extensions.
When they're all available, turn it on.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A modest size savings:
text data bss dec hex filename
264143 15608 232 279983 445af libglx.so.before
254303 15608 232 270143 41f3f libglx.so.after
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Useful to know if a expression is the recipient of an assignment
or not, that would be used to (for example) raise warnings of
"use of uninitialized variable" without getting a false positive
when assigning first a variable.
By default the value is false, and it is assigned to true on
the following cases:
* The lhs assignments subexpression
* At ast_array_index, on the array itself.
* While handling the method on an array, to avoid the warning
calling array.length
* When computed the cached test expression at test_to_hir, to
avoid a duplicate warning on the test expression of a switch.
set_is_lhs setter is added, because in some cases (like ast_field_selection)
the value need to be propagated on the expression tree. To avoid doing the
propatagion if not needed, it skips if no primary_expression.identifier is
available.
v2: use a new bool on ast_expression, instead of a new parameter
on ast_expression::hir (Timothy Arceri)
v3: fix style and some typos on comments, initialize is_lhs default value
on constructor, to avoid a c++11 feature (Ian Romanick)
v4: some tweaks on comments (Timothy Arceri)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94129
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Previously, the pass assumed that the entrypoint would be whatever function
happened to have the name "main". We really shouldn't trust in the
function names.
Reviewed-by: Rob Clark <robdclark@gmail.com>
We need to add a new bit since the GL ES exts require functionality from
a combination of texture buffer extensions as well as images (for
imageBuffer) support. Additionally, not all GPUs support all the texture
buffer functionality (e.g. rgb32 isn't supported by nv50).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes all failures with dEQP tests in this area. While
ARB_texture_buffer_object explicitly says that GetTexLevelParameter & co
should not be supported, GL 3.1 reverses this decision and allows all of
these queries there.
Conversely, there is no text that forbids the buffer-specific queries
from being used with non-buffer images.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
The only place this was used was in a gallium debug function that
had to be manually enabled.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
To not overwrite buffers and surfaces information, we need to use
a different offset in the driver constant buffer. Currently, OP_SUQ
is only supported for buffers but this will be slightly updated for
images support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Yuanhan Liu decided these were useful for linear filtering in
commit 76669381 (circa 2011). Prior to that, we never set them;
it seems he tried to preserve that behavior for nearest filtering.
It turns out they're useful for nearest filtering, too: setting
these fixes the following dEQP-GLES3 tests:
functional.fbo.blit.rect.nearest_consistency_mag
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_mag_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_min
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_min_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_mag_reverse_src_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_dst_y
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_dst_x
functional.fbo.blit.rect.nearest_consistency_out_of_bounds_min_reverse_src_dst_y
Apparently, BLORP has always set these bits unconditionally.
However, setting them unconditionally appears to regress tests using
texture projection, 3D samplers, integer formats, and vertex shaders,
all in combination, such as:
functional.shaders.texture_functions.textureprojlod.isampler3d_vertex
Setting them on Gen4-5 appears to regress Piglit's
tests/spec/arb_sampler_objects/framebufferblit.
Honestly, it looks like the real problem here is a lack of precision.
I'm just hacking around problems here (as embarassing as it is).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When using seamless cube map mode and NEAREST filtering, we explicitly
overrode the wrap modes to CLAMP_TO_EDGE. This was to implement the
following spec text:
"If NEAREST filtering is done within a miplevel, always apply apply
wrap mode CLAMP_TO_EDGE."
However, textureGather() ignores the sampler's filtering mode, and
instead returns the four pixels that would be blended by LINEAR
filtering. This implies that we should do proper seamless filtering,
and include pixels from adjacent cube faces.
It turns out that we can simply delete the NEAREST -> CLAMP_TO_EDGE
overrides. Normal cube map sampling works by first selecting the
face, and then nearest filtering fetches the closest texel. If the
nearest texel was on a different face, then that face would have been
chosen. So it should always be within the face anyway, which
effectively performs CLAMP_TO_EDGE.
Fixes 86 dEQP-GLES31.texture.gather.basic.cube.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Our driver uses the brw_render_cache mechanism to track buffers we've
rendered to and are about to sample from.
Previously, we did a single PIPE_CONTROL with the following bits set:
- Render Target Flush
- Depth Cache Flush
- Texture Cache Invalidate
- VF Cache Invalidate
- Instruction Cache Invalidate
- CS Stall
This combined both "top of pipe" invalidations and "bottom of pipe"
flushes, which isn't how the hardware is intended to be programmed.
The "top of pipe" invalidations may happen right away, without any
guarantees that rendering using those caches has completed. That
rendering may continue altering the caches. The "bottom of pipe"
flushes do wait for the rendering to complete. The CS stall also
prevents further work from happening until data is flushed out.
What we wanted to do was wait for rendering complete, flush the new
data out of the render and depth caches, wait, then invalidate any
stale data in read-only caches. We can accomplish this by doing the
"bottom of pipe" flushes with a CS stall, then the "top of pipe"
flushes as a second PIPE_CONTROL. The flushes will wait until the
rendering is complete, and the CS stall will prevent the second
PIPE_CONTROL with the invalidations from executing until the first
is done.
Fixes dEQP-GLES3.functional.texture.specification.teximage2d_pbo
subtests on Braswell and Skylake. These tests hit the meta PBO
texture upload path, which binds the PBO as a texture and samples
from it, while rendering to the destination texture. The tests
then sample from the texture.
For now, we leave Gen4-5 alone. It probably needs work too, but
apparently it hasn't even been setting the (G45+) TC invalidation
bit at all...
v2: Add Sandybridge post-sync non-zero workaround, for safety.
Cc: mesa-stable@lists.freedesktop.org
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
dEQP-GLES31.functional.fbo.no_attachments.* draws a quad with no
framebuffer attachments, using a shader that discards based on
gl_FragCoord. It uses occlusion queries to inspect whether pixels
are rendered or not.
Unfortunately, the hardware is not dispatching any pixel shaders,
so discards never happen, and the full quad of pixels increments
PS_DEPTH_COUNT, making the occlusion query results bogus.
To understand why, we have to delve into the WM_INT internal
signalling mechanism's formulas.
The "WM_INT::Pixel Shader Kill Pixel" signal is defined as:
3DSTATE_WM::ForceKillPixel == ON ||
(3DSTATE_WM::ForceKillPixel != Off &&
!WM_INT::WM_HZ_OP &&
3DSTATE_WM::EDSC_Mode != PREPS &&
(WM_INT::Depth Write Enable || WM_INT::Stencil Write Enable) &&
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(3DSTATE_PS_EXTRA::PixelShaderKillsPixels ||
3DSTATE_PS_EXTRA:: oMask Present to RenderTarget ||
3DSTATE_PS_BLEND::AlphaToCoverageEnable ||
3DSTATE_PS_BLEND::AlphaTestEnable ||
3DSTATE_WM_CHROMAKEY::ChromaKeyKillEnable))
Because there is no depth or stencil buffer, writes to those buffers
are disabled. So the highlighted condition is false, making the whole
"Kill Pixel" condition false. This then feeds into the following
"WM_INT::ThreadDispatchEnable" condition:
3DSTATE_WM::ForceThreadDispatch != OFF &&
!WM_INT::WM_HZ_OP &&
3DSTATE_PS_EXTRA::PixelShaderValid &&
(3DSTATE_PS_EXTRA::PixelShaderHasUAV ||
WM_INT::Pixel Shader Kill Pixel ||
WM_INT::RTIndependentRasterizationEnable ||
(!3DSTATE_PS_EXTRA::PixelShaderDoesNotWriteRT &&
3DSTATE_PS_BLEND::HasWriteableRT) ||
(WM_INT::Pixel Shader Computed Depth Mode != PSCDEPTH_OFF &&
(WM_INT::Depth Test Enable || WM_INT::Depth Write Enable)) ||
(3DSTATE_PS_EXTRA::Computed Stencil && WM_INT::Stencil Test Enable) ||
(3DSTATE_WM::EDSC_Mode == 1 && (WM_INT::Depth Test Enable ||
WM_INT::Depth Write Enable ||
WM_INT::Stencil Test Enable)))
Given that there's no depth/stencil testing, no writeable render target,
and the hardware thinks kill pixel doesn't happen, all of these
conditions are false. We have to whack some bit to make PS invocations
happen. There are many options.
Curro suggested using the UAV bit. There's some precedence in doing
that - we set it for fragment shaders that do SSBO/image/atomic writes
when no color buffer writes are enabled. We can simply include discard
here too.
Fixes 64 dEQP-GLES31.functional.fbo.no_attachments.* tests.
v2: Add a comment suggested and written by Jason Ekstrand.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the first pass of implementing exact handling, I made a mistake with
search-and-replace. In particular, we only reallly handled exact/inexact
on the root of the tree. Instead, we need to check every node in the tree
for an exact/inexact match. As an example of this, consider the following
GLSL code
precise float a = b + c;
if (a < 0) {
do_stuff();
}
In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results. The solution is to simply bail if any of the
values are exact when matching an inexact expression.
Found with grep and inspection. Test compiled on RPi hw.
Assists any future effort to remove TGSI as an intermediate stage.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Meaning, always rebuild them when asked instead of bothering to look at
timestamps (and then wondering why nothing happened when you said make).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Broken by:
commit 9ace0b5422
Author: Dylan Baker <baker.dylan.c@gmail.com>
Date: Wed May 20 15:49:11 2015 -0700
glapi: glX_proto_size.py: use argparse instead of getopt
Which changed most, but not all, callers to use --header-tag instead of
-h.
Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
We disable the vertex attributes, but also disable the VBO fetch details
as well, just in case. Not known to fix anything.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Back in the dawn of time, we used to do immediate uploads for the vertex
data, and all was well. However Maxwell dropped support for immediate
vertex data, so we started feeding in a VBO (in all cases). But we
forgot to disable some things that apply in such cases, specifically
primitive restart and index bias. The latter was causing WoW and other
Blizzard games trouble as they use a pattern where they draw with a base
vertex (aka index bias), followed by texture uploads (aka blits,
internally).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91526
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <nouveau@karolherbst.de>
On Fermi, there's an argument in front of the coords that combines array
and indirect handle, while on Kepler the array and the indirect handle
are separate (and in front of the coords). We were previously only
accounting for the array bit of it, if there were an indirect access it
wouldn't be counted in the formula.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Apparently there's no post-FS clamping logic, so we have to do this by
hand. The depth will never be outside of the 0..1 range, even on
floating point zeta buffers, so this should be safe.
Fixes dEQP-GLES3.functional.fbo.depth.*clamp.* which tests writing
invalid values on various zeta buffer formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The st_texture_object documentation says:
"the number of 1D array layers will be in height0"
We can't minify that.
Spotted by luck. No app is known to hit this issue.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: comment about the purpose of the code
v3: also compare texFormat,
add a perf debug message,
formatting fixes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This fixes crash when post-processing is enabled in SW:KotOR.
v2: fix const-ness
v3: move assignment into the if() block
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
v2: fix arithmetic for special opcodes,
fix fog state, cleanup
v3: simplify handling of special opcodes,
fix rebinding with different textargets or fog equation,
lots of formatting fixes
v4: adapt to the compile early, fix later architecture,
formatting fixes
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Commit `d4e847ea` introduced a warning about making an
integer from a pointer without a cast, fix it here.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This function differs from the open-coded implementation in that the
ImageView's width is determined by the caller and is not unconditionally
set to match the number of texels within the surface's pitch.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is required to create multiple, horizontally adjacent, max-width
images from one blit2d surface. This is also required for more accurate
width specification of surfaces within a larger surface (which is seen
as the smaller surface's enclosing region).
Note that anv_image_create_info::stride has been unused since commit,
b369389640 .
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reduces some of the craziness required for handling buffer
blocks. The problem is each shader stage holds its own information
about a block in memory, we were copying that information to a
program wide list but the per stage information remained meaning
when a binding was updated we needed to update all versions of it.
This changes the per stage blocks to instead point to a single
version of the block information in the program list.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Pass pointer to core buckets mgr back to sim layer.
Add support for RDTSC_START/RDTSC_STOP macros in the builder.
Each unique shader now has a unique bucket associated with it,
enabling more detailed reporting at the shader level. Currently
due to some llvm issue with thread local storage, 64bit runs require
single threaded mode.
Use head/tail ring buffer indices for thread synchronization.
1. SwrWaitForIdle loops until ring is empty. (head == tail)
2. GetDrawContext waits until ring is not full. (head - tail) == Ring Size
3. Draw enqueues by incrementing head.
4. Last worker thread to move past a DC dequeues by incrementing tail.
Todo: To reduce contention we can cache the tail in the API thread. For
example, if you know you have 64 free entries in the ring then you don't
need to keep checking the tail until you used those 64 entries.
The programming of the L3 Cache registers should match the previous
manually packed LRI values.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
From the ES 3.2 spec, section 16.1.1 (Selecting Buffers for Reading):
"An INVALID_ENUM error is generated if src is not BACK or one of
the values from table 15.5."
Table 15.5 contains NONE and COLOR_ATTACHMENTi.
Mesa properly returned INVALID_ENUM for unknown enums, but it decided
what was known by using read_buffer_enum_to_index, which handles all
enums in every API. So enums that were valid in GL were making it
past the "valid enum" check. Such targets would then be classified
as unsupported, and we'd raise INVALID_OPERATION, but that's technically
the wrong error code.
Fixes dEQP-GLES31's
functional.debug.negative_coverage.get_error.buffer.read_buffer
v2: Only call read_buffer_enuM_to_index when required (Eduardo).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Prepare Image extents and offsets for internal consumption by assigning
the default values implicitly defned by the spec. Fixes textures on
several Vulkan demos in which the VkImageCopy depth is set to zero when
copying a 2D image.
v2 (Jason Ekstrand):
Replace "prep" with "sanitize"
Make function static inline
Pass structs instead of pointers
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This commit adds a NIR pass for lowering away returns in functions. If the
return is in a loop, it is lowered to a break. If it is not in a loop,
it's lowered away by moving/deleting code as needed.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This can happen if a function ends in a return instruction and you remove
the return.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The efficiency should be approximately the same. We do a little more work
per phi node because we have to sort the predecessors. However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
As a side-benifit, the phi builder actually handles unreachable blocks
correctly. The original vars_to_ssa code, because of the way it iterated
the blocks and added phi sources, didn't add sources corresponding to
predecessors of unreachable blocks. The new strategy employed by the phi
builder creates a phi source for each predecessor and should correctly
handle unreachable blocks by setting those sources to SSA undefs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:
loop {
if (...) {
break;
} else {
break;
}
}
In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this. First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable. Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis. Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
v2: Add better documentation
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
radeonsi uses this property to make the best decision about which
shader to compile, but this is not currently used by our codegen.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The KHR_debug spec doesn't actually say we should handle this, but that
is most likely an oversight - it says to check against strlen and
generate errors if length is negative. It appears they just forgot to
explicitly spell out that we should then proceed to actually handle it.
Fixes crashes from uncaught std::string exceptions in many
dEQP-GLES31.functional.debug.error_filters.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
From the KHR_debug spec, section 5.5.5 (Externally Generated Messages):
"If <length> is negative, it is implied that <buf> contains a null
terminated string. The error INVALID_VALUE will be generated if the
number of characters in <buf>, excluding the null terminator when
<length> is negative, is not less than the value of
MAX_DEBUG_MESSAGE_LENGTH."
This indicates that length should be set to strlen for all types, not
just GL_DEBUG_TYPE_MARKER. We want it to be after validate_length()
so we still generate appropriate errors.
Fixes crashes from uncaught std::string exceptions in many
dEQP-GLES31.functional.debug.error_filters.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
From the KHR_debug spec:
"Applications can query the number of messages currently in the log by
obtaining the value of DEBUG_LOGGED_MESSAGES, and the string length
(including its null terminator) of the oldest message in the log
through the value of DEBUG_NEXT_LOGGED_MESSAGE_LENGTH."
Because we weren't including the null terminator, many dEQP tests
called glGetDebugMessageLog with a bufSize parameter that was 1 too
small, and unable to contain the message, so we skipped returning it,
failing many cases.
Fixes 298 dEQP-GLES31.functional.debug.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The following Coverity warning
5378 tmpl.fetch_args = atomic_fetch_args;
5379 tmpl.emit = atomic_emit;
>>> CID 1357115: Uninitialized variables (UNINIT)
>>> Using uninitialized value "tmpl". Field "tmpl.intr_name" is uninitialized.
5380 bld_base->op_actions[TGSI_OPCODE_ATOMUADD] = tmpl;
5381 bld_base->op_actions[TGSI_OPCODE_ATOMUADD].intr_name = "add";
... is a false positive, but what the hell. This change should "fix" it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In many places, the convention is to pass an existing ssadef name ptr
when construction/initializing a new nir_ssa_def. But that goes badly
(as noticed by garbage in nir_print output) when the original string
gets freed.
Just use ralloc_strdup() instead, and add ralloc_free() in the two
places that would care (not that the strings wouldn't eventually get
freed anyways).
Also fixup the nir_search code which was directly setting ssadef->name
to use the parent instruction as memctx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe. This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact". It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The previous transformation got the arguments to fmin backwards. When NaNs
are involved, the GLSL min/max aren't commutative so it matters.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Allow the sequence operator to be a constant expression in GLSL ES
versions prior to GLSL ES 3.0
Fixes the following piglit test:
/all/spec/glsl-es-1.0/compiler/array-sized-by-sequence-in-parenthesis.vert
This is similar to the logic from process_initializer() which performs
the same check for constant variable initialization with sequence
operators.
v2: Fixed regression pointed out by Eduardo Lima Mitev
Signed-off-by: Lars Hamre <chemecse@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Results are undefined but may not crash. Without this change, out-of-bounds
indexing can lead to VM faults and GPU hangs.
Constant buffers, samplers, and possibly others will eventually need similar
treatment to support GL_ARB_robust_buffer_access_behavior.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This fixes arb_shader_image_load_store-host-mem-barrier.
v2: flush TC L2 for index buffers on <= CIK (Marek)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
NIR already has this optimization and it can do much better than the little
peephole in the backend.
No shader-db change on Haswell or Broadwell.
Reviewed-by: Matt Turner <mattst88@gmail.com>
No shader-db changes, but this is symmetric with the previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This also prevented some regressions with other patches in my local
tree.
Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0
total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0
No changes on earlier platforms.
v2: Modify the comments to look more like a proof.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse. On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.
shader-db results:
G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1
total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53
Unsurprisingly, no changes on later platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we were doing the lowering by hand in vec4_visitor::emit_lrp.
By doing it in NIR, we have the opportunity for NIR to do additional
optimization of the expanded code.
This also enables optimizations added by the next commit.
shader-db results:
G4X / Ironlake
total instructions in shared programs: 4024401 -> 4016538 (-0.20%)
instructions in affected programs: 447686 -> 439823 (-1.76%)
helped: 2623
HURT: 0
total cycles in shared programs: 84375846 -> 84328296 (-0.06%)
cycles in affected programs: 16964960 -> 16917410 (-0.28%)
helped: 2556
HURT: 41
Unsurprisingly, no changes on later platforms.
v2: Formatting and comment changes suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I noticed some heap corruption running virgl tests, and valgrind
helped me to track it down to the following error:
==29272== Invalid write of size 4
==29272== at 0x90283D4: push_loop_stack (brw_eu_emit.c:1307)
==29272== by 0x9029A7D: brw_DO (brw_eu_emit.c:1750)
==29272== by 0x90554B0: fs_generator::generate_code(cfg_t const*, int) (brw_fs_generator.cpp:1999)
==29272== by 0x904491F: brw_compile_fs (brw_fs.cpp:5685)
==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137)
==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638)
==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51)
==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260)
==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006)
==29272== by 0x8C84325: _mesa_link_program (shaderapi.c:1042)
==29272== by 0x8C851D7: _mesa_LinkProgram (shaderapi.c:1515)
==29272== by 0x4E4B8E8: add_shader_program (vrend_renderer.c:880)
==29272== Address 0xf2f3cb0 is 0 bytes after a block of size 112 alloc'd
==29272== at 0x4C2AA98: calloc (vg_replace_malloc.c:711)
==29272== by 0x8ED11F7: ralloc_size (ralloc.c:113)
==29272== by 0x8ED1282: rzalloc_size (ralloc.c:134)
==29272== by 0x8ED14C0: rzalloc_array_size (ralloc.c:196)
==29272== by 0x9019C7B: brw_init_codegen (brw_eu.c:291)
==29272== by 0x904F565: fs_generator::fs_generator(brw_compiler const*, void*, void*, void const*, brw_stage_prog_data*, unsigned int, bool, gl_shader_stage) (brw_fs_generator.cpp:124)
==29272== by 0x9044883: brw_compile_fs (brw_fs.cpp:5675)
==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137)
==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638)
==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51)
==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260)
==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006)
if_depth_in_loop is an array of size p->loop_stack_array_size, and
push_loop_stack() will access if_depth_in_loop[p->loop_stack_depth+1],
thus the condition to grow the array should be
p->loop_stack_array_size <= (p->loop_stack_depth + 1) (it's currently
off by 2...)
This can be reproduced by running the following test with virgl test
server:
LIBGL_ALWAYS_SOFTWARE=y GALLIUM_DRIVER=virpipe bin/shader_runner
./tests/shaders/glsl-fs-unroll-explosion.shader_test -auto
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add code to handle GL_INTERNALFORMAT_PREFERRED.
Add code to deal with GL_RENDERBUFFER being passes into ChooseTextureFormat.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer
is supposed to perform sRGB decoding and encoding whenever sRGB formats
are in use. The ES 3.0 specification is completely clear, and has
always stated this.
However, the GL specification has changed behavior in 4.1, 4.2, and
4.4. The original behavior stated that no sRGB encoding should occur.
The 4.4 behavior matches ES 3.0's wording. However, implementing the
new behavior appears to break applications such as Left 4 Dead 2.
This patch changes Meta to apply the ES 3.x rules in ES 3.x, but
leaves OpenGL alone for now, to avoid breaking applications.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Because the rules for sRGB are so insane, we change brw_blorp_miptrees
to take decode_srgb and encode_srgb flags, which control linearization
of the source and destination separately.
This should make it easy to implement whatever crazy combination of
rules people throw at us. For now, it should be equivalent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer
is supposed to perform sRGB decoding and encoding whenever sRGB formats
are in use. The ES 3.0 specification is completely clear, and has
always stated this.
However, the GL specification has changed behavior in 4.1, 4.2, and
4.4. The original behavior stated that no sRGB encoding should occur.
The 4.4 behavior matches ES 3.0's wording. However, implementing the
new behavior appears to break applications such as Left 4 Dead 2.
This patch changes Meta to apply the ES 3.x rules in ES 3.x, but
leaves OpenGL alone for now, to avoid breaking applications.
Meta implements several other functions in terms of BlitFramebuffer,
and many of those explicitly do not perform sRGB encoding. So, this
patch explicitly disables sRGB encoding in those other functions,
preserving the existing (correct) behavior.
If you're from the future and are reading this, hi! Welcome to
the "fun" of debugging sRGB problems! Best of luck!
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This enables ARB_shader_image_load_store and ARB_shader_image_size.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
[allow the same number of images for all shader stages and require LLVM 3.9]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Prevent loads from being re-ordered or coalesced.
Atomics don't need special handling by definition, and stores don't need
special handling because LLVM is unable to detect dead image or buffer
stores.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: new signature style for buffer intrinsics (offsets)
v3: new signature style for llvm.amdgcn.buffer.load.format (overloaded return)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
Whether DCC is disabled depends on the access flags with which the image
is bound: image_load supports DCC, but store and atomic don't.
v2: remove an unnecessary masking of images->desc.enabled_mask
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Re-order flags in the order in which they appear in the OpenGL spec in the
description of MemoryBarrier().
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The svga winsys modules can use this to send debug messages to the
state tracker and Mesa.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The index buffer handle saved in the hw_state structure could
be invalid after the index buffer is destroyed. Instead of
rebinding the index buffer using the saved index buffer handle,
we will reset the index buffer handle in the hw_state structure
to force resending of the index buffer.
Fixes bug 1593320
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To ensure stream output target surfaces are available for the draw commands,
we need to rebind the current stream output targets at the first draw in the
command buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Similar to other resources, current index buffer needs to be
rebound at the first draw of the current command buffer to make
sure the buffer is available for the draw command.
Fixes bug 1587263.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When a constant buffer slot is allocated in the upload buffer,
the allocated slot size is always in multiple of 256. But the actual buffer
size might not be in multiple of 256. This causes a gap between
the ending offset of a slot and the starting offset of the next slot.
The gap will prevent the two slots to be updated in a single update command.
In order to maximize the chance of merging the contiguous dirty ranges,
when a slot is to be allocated in the constant upload buffer,
specify a buffer size in multiple of 256.
There is about 10% performance improvement with Lightsmark2008 and
30% with Cinebench R11.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
This patch adds the following HUD queries:
.num-resource-updates -- number of resource update. Commands include
UPDATE_SUBRESOURCE, UPDATE_GB_IMAGE.
.num-buffer-uploads -- number of buffer uploads.
.num-const-buf-updates -- number of set constant buffer.
.num-const-updates -- number of set shader constant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
To find out how many image readback command is issued.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Previously, we looked at the bound textures (via the pipe_sampler_views)
to determine texture dimensions (1D/2D/3D/etc) and datatype (float vs.
int). But this could fail in out of memory conditions. If we failed to
allocate a texture and didn't create a pipe_sampler_view, we'd default
to using 0 (PIPE_BUFFER) as the texture type. This led to device errors
because of inconsistent shader code.
This change relies on all TGSI shaders having an SVIEW declaration for
each SAMP declaration. The previous patch series does that.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
In June 2015, Rob Clark started updating the tgsi utility code to emit
SVIEW declarations in various shaders (for polygon stipple, blitting,
etc). These patches do the same for the Mesa state tracker.
The VMware driver will use this.
v2: support both TGSI_TEXTURE_2D and _RECT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Depending on the driver's support for NPOT textures, we might use
a RECT texture instead of 2D texture. We should propogate that info
to the fragment shader's TEX instruction.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead of hard-coded 2D tex target in tgsi_transform_tex_2d_inst()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
These magic file-index defines where only ever used in the nouveau code
and that no longer uses them.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
handeLOAD / handleSTORE / handleATOM can only handle TGSI_FILE_BUFFER
and TGSI_FILE_MEMORY. Make things fail explictly when another
register-file is used in these functions.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Commit c3083c7082 ("nv50/ir: add support for BUFFER accesses") disabled /
commented out some of the old resource handling code, but not all of it.
Effectively all of it is dead already, if we ever enter the old code
paths in handeLOAD / handleSTORE / handleATOM we will get an exception
due to trying to access the now always zero-sized resources vector.
Disable all the dead code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
Make the store offset handling in CodeEmitterGK110::emitSTORE identical
to the one in CodeEmitterGK110::emitLOAD handling.
This is just a cleanup, it does not cause any functional changes.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Use the dst temp variable which was used in the TGSI_FILE_OUTPUT
case everywhere. This makes the code somewhat easier to reads
and helps avoiding going over 80 chars with upcoming changes.
This also brings the dst handling more in line with the src
handling.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Extend the MEMORY file support to differentiate between global, private
and shared memory, as well as "input" memory.
"MEMORY[x], INPUT" is intended to access OpenCL kernel parameters, a
special memory type is added for this, since the actual storage of these
(e.g. UBO-s) may differ per implementation. The uploading of kernel
parameters is handled by launch_grid, "MEMORY[x], INPUT" allows drivers
to use an access mechanism for parameter reads which matches with the
upload method.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
When support for decl.Atomic and .Shared was added, tgsi_build_declaration
was not updated to propagate these properly.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v2)
tgsi_default_instruction_memory / tgsi_build_instruction_memory were
returning uninitialized memory for tgsi_instruction_memory.Texture and
tgsi_instruction_memory.Format. Note 0 means not set, and thus is a
correct default initializer for these.
Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory")
Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When we have native integers, these have full precision. Whether they're
low/medium/high isn't piped through the TGSI yet, but eventually those
might have differing precisions. For now they're just 32-bit ints.
Fixes the following dEQP tests:
dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int
dEQP-GLES3.functional.state_query.shader.precision_fragment_highp_int
which expected highp ints to have full 32-bit precision, not the default
23-bit float precision.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
If a layer parameter is provided, we want to flip it to position 0 (and
combine it with any indirect params). However if the target is not an
array, there is no layer, so we have to shift all of the arguments down
by one to make room for it.
This fixes situations where there were non-coordinate parameters, such
as bias, lod, depth compare, explicit derivatives. Instead of adding a
new parameter at the front for the indirect reference, we would swap one
of those in its place.
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.uniform.compute.*shadow
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
We make sure that that image depth matches the level's depth before
copying it into place. However we should only be minifying the first
level's depth for 3d textures - array textures have the same depth for
all levels.
This fixes tests such as
dEQP-GLES3.functional.texture.specification.texsubimage3d_depth.* and I
suspect account for a number of other odd situations I've run into where
level > 0 of array textures was messed up.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
In "manual" derivative mode (always used on nv50 and sometimes on nvc0
but always for cube), the idea is that using the quadop instruction, we
set up the "other" quads to have values such that the derivatives work
out, and then run the texture instruction as if nothing were strange. It
pulls values from the other lanes, and does its magic.
However cube coordinates have to be normalized - one of the 3 coords has
to be 1, to determine which is the major axis, to say which face is
being sampled. We were normalizing the coordinates first, and then
adding the derivatives. This is wrong for two reasons:
- the coordinates got normalized by a scaling factor but the
derivatives didn't
- the result of the addition didn't end up normalized
To resolve this, we flip the logic around to normalize *after* the
per-lane coordinates are set up.
This fixes a bunch of textureGrad cube dEQP tests.
NOTE: nv50 cube arrays with explicit derivatives are still broken, to be
resolved at a later date.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
It was useful for testing and as a prototype for radeonsi bringup,
but it's not used anymore and doesn't support OpenGL 3.3 even.
v2: try to fix OpenCL build
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
Cons:
- it was only integrated in r600g
- it doesn't work with GPUVM
- it records buffer contents at the end of IBs instead of at the beginning,
so the replay isn't exact
- it lacks an IB parser and user-friendliness
A better solution is apitrace in combination with gallium/ddebug, which
has a complete IB parser and can pinpoint hanging CP packets.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows compiling the main shader part as ES or LS.
If we get the correct hint, non-separable GLSL shaders no longer have to be
compiled as VS first, followed by LS or ES compiled on demand.
The result is that fewer shaders are compiled by piglit, but it doesn't
improve piglit running time.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Radeonsi needs to know which shader stage will execute after a shader
in order to make the best decision about which shader variant to compile
first.
This is only set for VS and TES, because we don't need it elsewhere.
VS has 3 variants:
- next shader is FS
- next shader is GS
- next shader is TCS
TES has 2 variants:
- next shader is FS
- next shader is GS
Currently, radeonsi always assumes the next shader is FS, which is suboptimal,
since st/mesa always knows which shader is next if the GLSL program is not
a "separate shader".
By default, ureg always sets "next shader is FS".
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Fixes:
dEQP-GLES3.functional.negative_api.state.get_framebuffer_attachment_parameteriv
Apparently, GL_FRAMEBUFFER_ATTACHMENT_OBJECT_NAME is not allowed when
GL_FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is GL_FRAMEBUFFER_DEFAULT, and
is expected to result in a GL_INVALID_ENUM error.
No GL specification actually defines what GL_FRAMEBUFFER_DEFAULT means.
It probably means the window system FBO. It also doesn't mention the
behavior of any queries for that type. Various ARB folks seem fairly
confused about it too. For now, just do something vaguely like what
dEQP expects.
I think we probably need to check the visual bits against 0 for the
attachment, but we haven't been doing that thusfar, and given how
confusingly this is specified, I can't imagine anyone relying on it.
v2: Improve comments, move error condition above the
_mesa_get_fb0_attachment call, add forgotten "return"
(all suggested/caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This fix is analogous to commit ff085d014.
This fixes some use-after-free situations in dEQP when an xfb state is
removed, and then a clear is triggered, which only does a partial
validation. It would attempt to read the no-longer-valid buffers,
resulting in crashes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Instead make use of constants to improve readability.
The first 32 bytes of the driver constant buffer are unknown... This
doesn't seem to be used in the codegen part, but if the texBindBase
offset is shifted from 0x20 to 0x00, this breaks the universe for
really weird reasons. This sounds like to be related to textures.
Anyway, name this NVC0_CB_AUX_UNK_INFO and add a todo should be
enough for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This avoids using magic numbers for the driver constbuf slot which
is always 15 except for compute shaders on gk104+ where the slot 0
is used.
For gk104+, some special compute-related values like the thread
index are uploaded to screen->parm which is currently bound on c0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
In release build mode only, op may be used uninitialized because
the assertion has been removed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to the OpenGL ES 3.2 spec's description of GenerateMipmap:
"An INVALID_OPERATION error is generated if the levelbase array was not
specified with an unsized internal format from table 8.3 or a sized
internal format that is both color-renderable and texture-filterable
according to table 8.10."
Similar text exists in the ES 3.0 specification as well.
Our existing rules are pretty close, but miss a few things. The
OpenGL specification actually doesn't have any text about internal
format checking - our existing code comes from a Khronos bug report.
The ES 3.x spec provides a clearer description.
Fixes dEQP-GLES3.functional.negative_api.texture.generatemipmap and
dEQP-GLES2.functional.negative_api.texture.generatemipmap_zero_level
_array_compressed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
OpenGL ES 3.x contains a table of sized internal formats and their
required properties. In particular, each format is marked as
"Color Renderable" or "Texture Filterable".
This patch introduces two functions that can be used to query the
information from that table.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Wide points and lines are not supposed to be clipped by the viewport.
Rather, they should be rendered, and any fragments outside of the
viewport should be discarded.
The traditional use case for this behavior is rendering moving wide
point particles. When the center of the point approaches the viewport
edge, clipping would make it pop out of view early.
Fixes:
- dEQP-GLES2.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
- dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner
- dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_center
- dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_corner
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Similar to is_drawing_points().
v2: Account for isoline tessellation output topology.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I need to use this in multiple source files.
v2: Rebase on TES output domain fix.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Thanks to James Legg for finding this!
From the ARB_tessellation_shader spec:
"The number of isolines generated is derived from the first outer
tessellation level; the number of segments in each isoline is
derived from the second outer tessellation level."
According to the PRM, "TF.LineDensity determines # lines" while
"TF.LineDetail determines # segments". Line Density is stored at
DWord 6, while Line Detail is at DWord 7. So, they're not reversed
like they are for triangles and quads.
Fixes Piglit's spec/arb_tessellation_shader/execution/isoline,
and about 24 dEQP isoline tests (with GL_EXT_tessellation_shader
hacked on - it's not normally enabled).
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94524
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Now that we implement tessellation shaders, the TES might be the last
stage enabled. If it's outputting points, then the primitive type
reaching the SF is points. We need to account for this.
Caught by Ilia Mirkin.
v2: Update dirty bit comment above caller (caught by Iago)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reduce the amount of duplicated code by re-using
nv50_program_validate(). While we are at it, change the prototype to
return void. We don't check anymore if the translation fails but
improving the state validation is a long process.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Patch provides a default for a set pbuffer surface size when
EGL_LARGEST_PBUFFER is used by the client. MIN2 macro is moved
to egldefines so that it can be shared.
Fixes following Piglit test:
egl-create-largest-pbuffer-surface
From EGL 1.5 spec:
"Use EGL_LARGEST_PBUFFER to get the largest available pbuffer
when the allocation of the pbuffer would otherwise fail."
Currently there exists no API to query largest available pixmap size
using xlib or xcb so right now this seems most straightforward way to
ensure that we fulfill above API and also we don't attempt to allocate
'too big' pixmap which might succeed on server side but not work in
practice when driver starts to use it as a texture.
v2: add more explanation about the change (Emil)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Removed bound_to_context. We now pick up the context from the screen
instead of the resource itself. The resource could be out-of-date
and point to a pipe that is already freed.
Fixes manywin mesa xdemo.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The bitcasting which is possible with shader images (and texture views?)
requires that when the user specifies a sized internal format for a
texture, we really allocate that format. To this end:
(1) find_exact_format should ignore sized internal formats and
(2) some of the entries in the mapping table corresponding to sized
internal formats are reordered to use an RGBA format instead of
a BGRA one.
This fixes arb_shader_image_load_store-bitcast in the (work in progress)
ARB_shader_image_load_store implementation for radeonsi.
v2: don't change the mapping of GL_RGB10: the change caused a regression
because it preferred a format with an alpha channel, and GL_RGB10
is not among the supported formats for shader images
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently, configure script is forcing 'enable_asm' to be 'no'
whenever cross-compilation is performed on X86 host. This is
based on an assumption that target architecture is different
from host's (i.e. ARM). But there's always a case that we do
cross-compilation for target that is also X86 based just like
host in which same ASM codes will be supported. 'enable_asm'
should not be forced to be "no" anymore in this case.
v2: corrected commit message
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
In GL 4.4+ there is no guarantee that interpolation qualifiers will
match between stages so we cannot safely pack varyings using the
current packing pass in Mesa.
We also disable packing on outerward facing interfaces for SSO
because in ES we need to retain the unpacked varying information
for draw time validation. For desktop GL we could allow packing for
SSO in versions < 4.4 but its just safer not to do so.
We do however enable packing on individual arrays, structs, and
matrices as these are required by the transform feedback code and it
is still safe to do so.
Finally we also enable packing when a varying is only used for
transform feedback and its not a SSO.
This fixes all remaining rendering issues with the dEQP SSO tests,
the only issues remaining with thoses tests are to do with validation.
Note: There is still one remaining SSO bug that this patch doesn't fix.
Their is a chance that VS -> TCS will have mismatching interfaces
because we pack VS output in case its used by transform feedback but
don't pack TCS input for performance reasons. This patch will make the
situation better but doesn't fix it.
V4: fix out of order function params after rebase, make sure packing
still disabled in tess stages. Update comments as to why we disable
packing on SSO.
V3: ES 3.1 *does* require interpolation to match so don't disable
packing there. Rebased on master rather than on enhanced layouts
component packing series.
V2: Make is_varying_packing_safe() a function in the varying_matches
class, fix spelling (Matt) and make sure to remove the outer array
when dealing with Geom and Tess shaders where appropriate.
Lastly fix piglit regression in new piglit test and document the
undefined behaviour it depends on:
arb_separate_shader_objects/execution/vs-gs-linking.shader_test
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This will allow us to choose to ignore the disable which will be
useful for more fine grained control over when to enable or disable
packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For older extensions, there is an explanation first and the extension
name in brackets, like that:
Clamping controls (GL_ARB_color_buffer_float)
I inverted that so we have the extension first and then the explanation
in brackets, like that:
GL_ARB_color_buffer_float (Clamping controls)
It will help me later to parse the few extensions that use this syntax:
all drivers that support <GL_extension>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This fixes some exceptions I have to deal with in mesamatrix.net.
The extensions GL_ARB_texture_buffer_object had a comment between "DONE"
and the brackets.
And the extension GL_KHR_robustness (in GL 4.5 and GLES 3.1) was using
"90% done" instead of "in progress". The "90% done" is still here
though, but as an extension comment.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The "Status" column was misaligned in some GL sections.
This is a lot of diffs, but it's only spaces in the end.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Added a small guide on how to read and edit GL3.txt.
I think this would help as much the devs as the users reading this file.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When we replace an expresion we have to compute bitsize information for the
replacement. We do this in two passes to validate that bitsize information
is consistent and correct: first we propagate bitsize from child nodes to
parent, then we do it the other way around, starting from the original's
instruction destination bitsize.
v2 (Iago):
- Always use nir_type_bool32 instead of nir_type_bool when generating
algebraic optimizations. Before we used nir_type_bool32 with constants
and nir_type_bool with variables.
- Fix bool comparisons in nir_search.c to account for bitsized types.
v3 (Sam):
- Unpack the double constant value as unsigned long long (8 bytes) in
nir_algrebraic.py.
v4 (Sam):
- Use helpers to get type size and base type from nir_alu_type.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Squash multiple commits addressing the new parameter in different
files so we don't break the build (Iago)
v3: Fix tgsi (Samuel)
v4: Fix nir_clone.c (Samuel)
v5: Fix vc4 and freedreno (Iago)
v6 (Sam)
- Fix build errors in nir_lower_indirect_derefs
- Use helper to get type size from nir_alu_type.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Some opcodes need explicit bitsizes, and sometimes we need to use the
double version when constant folding.
v2: fix output type for u2f (Iago)
v3: do not change vecN opcodes to be float. The next commit will add
infrastructure to enable 64-bit integer constant folding so this is isn't
really necessary. Also, that created problems with source modifiers in
some cases (Iago)
v4 (Jason):
- do not change bcsel to work in terms of floats
- leave ldexp generic
Squashed changes to handle different bit sizes when constant
folding since otherwise we would break the build.
v2:
- Use the bit-size information from the opcode information if defined (Iago)
- Use helpers to get type size and base type of nir_alu_type enum (Sam)
- Do not fallback to sized types to guess bit-size information. (Jason)
Squashed changes in i965 and gallium/nir drivers to support sized types.
These functions should only see sized types, but we can't make that change
until we make sure that nir uses the sized versions in all the relevant places.
A later commit will address this.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This really hacky commit adds a bit size to registers and SSA values. It
also adds rules in the validator to validate that they do the right things.
It's still an open question as to whether or not we want a bit_size in
nir_alu_instr or if we just want to let it inherit from the destination.
I'm inclined to just let it inherit from the destination. A similar
question needs to be asked about intrinsics.
v2 (Connor):
- Relax validation: comparisons have explicit destination sizes
and implicit source sizes.
v3 (Sam):
- Use helpers to get size and base types of nir_alu_type enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
As nir_alu_type has now embedded the data size, the check for the
instruction's output type (to see if a boolean resolve is required)
should ignore the data size part.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Fix size/type mask to properly handle 8-bit types.
v3: Add helpers to get the bitsize and base type of a
nir_alu_type enum.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This code in brw_set_dest adjusts the execution size of any instruction
with a dst.width < 8. However, we don't want to do this with instructions
operating on doubles, since these will have a width of 4, but still
need an execution size of 8 (for SIMD8). Unfortunately, we can't just check
the size of the operands involved to detect if we are doing an operation on
doubles, because we can have instructions that do operations on double
operands interpreted as UD, operating on any of its 2 32-bit components.
Previous commits have made it so we never emit instructions with a horizontal
width of 4 that don't have the correct execution size set for gen6+, so
we can skip it in this case, avoiding the conflicts with fp64 requirements.
Expanding the same fix to other hardware generations requires many more
changes but since we are not targetting fp64 support on them
wer don't really care for now.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2 (Topi):
- No need to set the execsize for the indirect send message,
the next patch will handle that.
- Set the execution size explicitly instead of taking it from
the width of the dst that we set before.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Previously, we gave our internal clear/blit shaders actual GL handles
and stored them in the shader/program hash table. We used ordinary
GL API entrypoints to work with them.
We thought this shouldn't be a problem because GL doesn't allow
applications to invent their own names for shaders or programs.
GL allocates all names via glCreateShader and glCreateProgram.
However, having them in the hash table is a bit risky: if a broken
application guesses the name of our shaders or programs, it could
alter them, potentially screwing up future meta operations.
Also, test cases can observe the programs in the hash table. Running
a single dEQP process that executes the following test list:
dEQP-GLES3.functional.negative_api.buffer.clear
dEQP-GLES3.functional.negative_api.shader.compile_shader
dEQP-GLES3.functional.negative_api.shader.delete_shader
would result in the last two tests breaking. The compile_shader test
calls glCompileShader(9) straight away, and since it hasn't even created
any shaders or programs, it expects to get a GL_INVALID_VALUE error
because there's no such name. However, because the clear test ran
first, it created Meta programs, so an object named "9" did exist.
This patch reworks Meta to work with gl_shader and gl_shader_program
pointers directly. These internal programs have bogus names, and are
never stored in the hash tables, so they're invisible to applications.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94485
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will allow me to use them directly from Meta, bypassing the
versions that work with GL integer handles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
In half the callers, we already have a pointer, and don't need
to look it up again. This will also help with upcoming meta work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
In half the callers, we already have a pointer, and don't need
to look it up again. This will also help with upcoming meta work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Now that the field exists in the instruction, we can make discards less
special. As a bonus, that means that we should be able to merge some more
.sf instructions together when we get around to that.
This causes some scheduling changes, as it allows tlb_color_reads to be
delayed past the discard condition setup. Since the tlb_color_read ends
up later, this may mean performance improvements, but I haven't tested.
total instructions in shared programs: 78114 -> 78035 (-0.10%)
instructions in affected programs: 1922 -> 1843 (-4.11%)
total estimated cycles in shared programs: 234318 -> 234329 (0.00%)
estimated cycles in affected programs: 8200 -> 8211 (0.13%)
The register allocator doesn't really do anything about the temp, so it
doesn't seem like it should matter. However, the scheduler would think
that a new def is being created.
This doesn't change anything yet, but it avoids a bunch of regressions in
the next commit.
Since we're using texelFetch with a sampled image, a sampler is no
longer needed. This agrees with the Vulkan Spec section 13.2.4
Descriptor Set Updates:
sampler is a sampler handle, and is used in descriptor updates for types
VK_DESCRIPTOR_TYPE_SAMPLER and VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
if the binding being updated does not use immutable samplers.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The texelFetch operation requires that the sampled texture coordinates
be unnormalized integers. This will simplify the copy shader for
w-tiled images (stencil buffers).
v2 (Jason):
Use f2i for texel coords
Fix num_components indirectly
Use float inputs for interpolation
Nest tex_pos functions
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* Add fields in meta struct
* Add support in meta init/teardown
* Switch to custom meta_emit_blit2d()
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is cleaner than using glBindAttribLocation().
Not all drivers support the extension, but I don't think those drivers
use GLSL in the first place. Apparently some Meta shaders already use
GL_ARB_explicit_attrib_location, so I think it should be okay.
Honestly, I'm not sure how the old code worked anyway - we bound the
attribute location for "texcoords", while all the shaders capitalized
or spelled it differently.
v2: Convert another instance in brw_meta_fast_clear.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is done instead of copy propagating the VPM reads into the
instructions using them, because VPM reads have to stay in order.
shader-db results:
total instructions in shared programs: 78509 -> 78114 (-0.50%)
instructions in affected programs: 5203 -> 4808 (-7.59%)
total estimated cycles in shared programs: 234670 -> 234318 (-0.15%)
estimated cycles in affected programs: 5345 -> 4993 (-6.59%)
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Rhys Kidd <rhyskidd@gmail.com>
This file will contain optimization passes for both vpm reads
and writes.
Signed-off-by: Varad Gautam <varadgautam@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some rasterization code relies (for sse) on the first and third planes
(but not the second for now) being 128bit aligned, and we didn't get that
on 32bit - I mistakenly thought the 64bit number in the struct would get
the thing aligned to 64bit even on 32bit archs.
Stephane Marchesin really figured this out.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
The logic was comparing actual ints, not true/false values.
This meant that it was emitting always multiple line segments instead of just
one even if the stipple test had the same result, which looks inefficient, and
the segments also overlapped thus breaking line aa as well.
(In practice, with the no-op default line stipple pattern, for a 10-pixel
long line from 0-9 it was emitting 10 segments, with the individual segments
ranging from 0-1, 0-2, 0-3 and so on.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE.
In practice both are of course the same unchangeable value (4), but it
makes the code look a bit confusing. Moreover, some of the functions were
actually given an array of 4 values according to the declaration, yet the
code was addressing values 0/4/8/12 out of it, so fix this by just saying
it's a pointer to floats like the other functions.
While here, also add comment about not quite correct filtering.
There's no actual code difference.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The filt_args->offset wasn't assigned but was always used later leading
to a crash (as far as I can tell, texel offsets don't actually make much
sense with anisotropic filtering, but because there's no explicit setting
if offsets are enabled there the array is always accessed).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94481
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
CC: <mesa-stable@lists.freedesktop.org>
This allows drivers to make smarter decisions e.g. about whether the image
has to be decompressed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is required to preserve the image variable's coherent/restrict/volatile
qualifiers in TGSI.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Frontends should have this information readily available, and it simplifies
image LOAD/STORE/ATOM* handling especially with indirect image access.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The enums MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS and
MAX_COMBINED_SHADER_OUTPUT_RESOURCES are equal and should therefore only
appear once.
Noticed while implementing ARB_shader_image_load_store without previously
implementing SSBO.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Should have no functional change. The IP value of an instruction that
reads src_var cannot possibly be after the end of the live interval of
the variable it's reading from, by the definition of live interval.
Might save future readers a momentary WTF while trying to understand
this code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. The no-op MOV check in
opt_register_coalesce() was removing instructions which makes the
cached liveness analysis calculation inconsistent with the shader IR.
We were failing to set progress to true in that case though, which
means that invalidate_live_intervals() wouldn't necessarily be called
at the end of the function.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. fixup_3src_null_dest() was allocating
registers which makes the cached liveness analysis calculation
incomplete, so it must be invalidated.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bug found by the liveness analysis validation pass that will be
introduced in a later commit. opt_sampler_eot() was allocating
registers and inserting and removing instructions, which makes the
cached liveness analysis calculation inconsistent with the shader IR,
so it must be invalidated.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
After pipe_grid_info.indirect was introduced, clover was not modified
to set it causing it to pass uninitialized memory for it to launch_grid.
This commit fixes this by zero-ing the entire pipe_grid_info struct when
declaring it, to avoid similar problems popping-up in the future.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
[ Francisco Jerez: Trivial codestyle fix. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This should help the next person working on hardware enabling figure out
where in the Intel PRMs to find the magic platform hardware values.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
When reading the source code, it's useful to indicate that a group of
fields in a struct are related in someway. There were several places
where people tried to group related structure members with the {@
syntax, without realizing they also needed to add the \name syntax in
order to generate correct doxygen html.
There are several files with groupings that look like this:
struct foo {
/**
* Related fields description
* @{
*/
int bar;
char baz;
/** @} */
long qux;
}
However, the doxygen syntax for grouping is:
struct foo {
/**
* \name Related fields description
* @{
*/
int bar;
char baz;
/** @} */
long qux;
}
https://www.stack.nl/~dimitri/doxygen/manual/grouping.html
Without the group name definition, the fields don't get properly
grouped. Instead, the group description is applied to the first field.
Fix the Intel hardware information structure, brw_device_info to
properly group the GPU hardware limitations and hardware quirks fields.
Once you've run `cd doxygen; make clean; make all`,
updated documentation can be found at
mesa/doxygen/i965/structbrw__device__info.html
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Better tracking of resource state and synchronization.
A follow on commit will clean up resource functions into a new
swr_resource.cpp file.
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
From the point it's constructed the CFG contains the only existing
copy of the program IR, and it never becomes invalid. Calling
backend_shader::invalidate_cfg would have destroyed the program
structure irrecoverably -- We weren't calling it at all for a good
reason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
The following commits broke things by starting to feed us unhandled
extract_u16/extract_u8 opcodes:
commit 905ff86198
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Wed Feb 3 14:28:31 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u16.
commit 76289fbfa8
Author: Matt Turner <mattst88@gmail.com>
AuthorDate: Thu Jan 21 09:09:48 2016 -0800
Commit: Matt Turner <mattst88@gmail.com>
CommitDate: Fri Mar 4 11:52:34 2016 -0800
nir: Recognize open-coded extract_u8.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
First off, st/mesa lowers DSQRT incorrectly (it uses CMP to attempt to
find out whether the input is less than 0). Secondly the current
approach (x * rsq(x)) behaves poorly for x = inf - a NaN is produced
instead of inf.
Instead we switch to the less accurate rcp(rsq(x)) method - this behaves
nicely for all valid inputs. We still don't do this for DSQRT since the
RSQ/RCP ops are *really* inaccurate, and don't even have Newton-Raphson
steps right now. Eventually we should have a separate library function
for DSQRT that does it more precisely (and perhaps move this lowering to
the post-opt phase).
This fixes a number of dEQP precision tests that were expecting better
behavior for infinite inputs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The idea is that a single triangle will cover the whole area being
drawn, allowing the blit shader to do its work. However the max fb size
is 16384x16384, which means that the triangle we draw needs to be twice
that in order to cover the whole area fully. Increase the size of the
triangle to 32768x32768.
This fixes a number of dEQP tests that were failing because a blit was
involved which would miss some of the resulting texture.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Bitfields where shuffled around for the better on a4xx, so we don't need
any patching on this one. It appears to be something we set entirely in
the gmem code so no conflict between tiling and render state like we had
in a3xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Move where we pick dummy FS for binning pass, so the whole driver sees
the same dummy/no-op FS stage.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we aren't going to put the function parameters or the return variable
in the list of locals, it won't get a proper declaration. This changes
nir_print to print the type along with each parameter or return variable.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
NIR has never been used on IR where we haven't already done function
inlining so this code has been dead from the beginning. Let's just get rid
of it for now. We can always put it back in if we decide to use NIR for
function inlining at some point in the future.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
flush_pipeline_before_pipeline_select adds workarounds required before
switching the pipeline.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This fixes a "regression" on Haswell and prior caused by merging the gen7
and gen8 flush_state functions. Haswell should still work just fine if
you're on a 4.4 kernel, but we really should make it detect the command
parser version and do something intelligent.
The old DRI3 implementation just used CopyArea instead of present. We
still don't support all the MST fancyness, but it should at least avoid
some copies and allow for.
v2 (Jason Ekstrand):
- Better object cleanup and destruction
- Handle the CONFIGURE_NOTIFY event and return OUT_OF_DATE when needed
- Track dirtyness via IDLE_NOTIFY rather than interating through the
images sequentially
Right now, Vulkan apps can pretty easily DOS the GPU by simply submitting a
lot of batches. This commit makes us wait until the rendering for earlier
frames is comlete before continuing. By waiting 2 frames out, we can still
keep the pipe reasonably full but without taking the entire system down.
This is similar to what the GL driver does today.
This is more consistent with the way the rest of the driver works and
ensures that all structs we pass into the kernel are zero'd out except for
the fields we actually want to fill. We were previously doing then when
building with valgrind to keep valgrind from complaining. However, we need
to start doing this unconditionally as recent kernels have been getting
touchier about this. In particular, as of kernel commit b31e51360e88 from
Chris Wilson, context creation and destroy fail if the padding bits are not
set to 0.
"Braswell" is a Cherryview based *thing*. It unfortunately requires extra
information to determine its marketing name. Unlike all previous products, and
hopefully all future ones, there is no unique 1:1 mapping of PCI device ID to
brand string.
I put up a fight about adding any complexity to our GL renderer string code for
a very long time. However, a wise man made a comment to me that I couldn't argue
with: if a user installs Windows on their hardware, the brand string should be
the same as what we display in Linux. The Windows driver apparently does this
check, so we should too.
Note that I did manage to find a good use for this info anyway in the compute
shader thread counts.
v2: memcpy instead of strncpy, and some minor changes (Matt)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
We have better information now, and 28 was not a valid thing to support. 6 EUs
per sublice with 7 threads per EU is the minimum supported config.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
The way we are organizing this code, the statically configured max_cs_threads
should always be the minimum value we actually support (ie. are aware of). As a
result, we can fall back to that if we get invalid numbers from the kernel (ie.
when the query succeeds, but the result is lower than expected).
I was originally planning to use an assert, but there is no reason to be so
mean.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
With the previous patches, the code can find out the actual number of available
compute threads. It is enabled only for Cherryview since that is the only
platform I know for a fact has shipped devices which can benefit from this. It
seems like other platforms /might/ benefit from this because of fused
configurations which /might/ have shipped. Fallback code is still there.
v2: Some minor adjustments from Matt
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
Certain products are not uniquely identifiable based on device id alone. The
kernel exports an interface to help deal with this. This patch merely introduces
the consumer of the interface and makes sure nothing breaks.
It is also possible to use these values for programming GPGPU mode, and I plan
to do that as well.
The interface was introduced in libdrm 2.4.60, which is already required, so it
should all be fine.
v2: Some minor changes recommended by Matt
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Clear DCC flags if necessary when binding a new sampler view.
v2: Do not reset DCC flags of bound sampler views.
v3: Check that we have a real texture (Nicolai)
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow the nouveau backend to not try and split up ops that are
fused in GLSL.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since it is all about calling into blitter functions, it makes more
sense here. This change also reduces the size of the interfaces between
.c files.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is an annoying corner case that I stumbled across while looking into
piglit's arb_shader_image_load_store/execution/load-from-cleared-image.shader_test
(which can be easily adapted to demonstrate the bug without the
ARB_shader_image_load_store extension)
When we bind a texture and then clear it using glClear (by attaching it
to the current framebuffer) for the first time, we allocate a separate
cmask for the texture to do fast clear, but the corresponding bit in
compressed_colortex_mask is not set. Subsequent rendering will use
incorrect data.
Conversely, when a currently bound texture with an existing cmask is
exported leading to that cmask being disabled, the compressed_colortex_mask
bit will remain set, leading to an assertion later on in debug builds.
Since iterating through all contexts and/or remembering where every
texture is bound would be costly, and cmask enable/disable should be
rare, we will maintain a global counter to signal contexts that they
must update their compressed_colortex_masks.
This patch introduces the global counter, and subsequent patches will
do the mask update.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
intel_alloc_private_renderbuffer_storage did:
rb->_BaseFormat = _mesa_base_fbo_format(ctx, internalFormat);
Unfortunately, internalFormat was usually an unsized format (such as
GL_DEPTH_COMPONENT). In OpenGL ES, _mesa_base_fbo_format() refuses to
accept unsized formats, and returns 0 rather than a real base format.
This meant that we ended up with a completely bogus rb->_BaseFormat for
window system buffers on OpenGL ES. All other renderbuffer allocation
functions in intel_fbo.c instead use the mesa_format, and do:
rb->_BaseFormat = _mesa_get_format_base_format(...);
We can do likewise, using rb->Format. This appears to work just fine.
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial
failed, as it tried to perform a GL_FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE query
on the window system depth buffer. That query relies on a proper
rb->_BaseFormat being set, so it broke because rb->_BaseFormat was 0 due
to the above bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94458
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We were failing to reset our location tracking when encountering a
NEWLINE in the <HASH> state. Rip the code from the <*>{NEWLINE} rule,
which handles this properly.
Also, update 146-version-first-hash.c to have proper expectations.
When I introduced the test, I didn't verify that the line/column
numbers were correct, and it turns out they varied based on the type
of newline ending.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94447
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Because compute support is not enabled by default for these chipsets,
NVF0_COMPUTE=1 needs to be used, along with GALLIUM_HUD to enable
performance counters.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is really verbose but most of the configuration will be reused
for SM35 (GK110).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Before we would always report 16 for both and we would only fail if either
one exceeded 16. Now we fail if the maximum for each is exceeded, even if
it is smaller than 16 and we report the correct maximum.
Also, expand the size of to_assign[] to 32. There is code at the top
of the function handling max_index up to 32, so this just makes the
code more consistent.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
CXX codegen/nv50_ir.lo
In file included from codegen/nv50_ir.cpp:28:
./nouveau_debug.h:19:30: error: invalid suffix on literal; C++11 requires a space between literal and identifier
[-Wreserved-user-defined-literal]
fprintf(stderr, "%s:%d - "fmt, __FUNCTION__, __LINE__, ##args)
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
According to the GL 4.4 core specification, section 2.2.2 ("Data
Conversions For State Query Commands"):
"If a command returning integer data is called, such as GetIntegerv or
GetInteger64v, a boolean value of TRUE or FALSE is interpreted as one
or zero, respectively. A floating-point value is rounded to the nearest
integer, unless the value is an RGBA color component, a DepthRange
value, or a depth buffer clear value. In these cases, the query command
converts the floating-point value to an integer according to the INT
entry of table 18.2; a value not in [−1, 1] converts to an undefined
value."
The INT entry of table 18.2 shows that b = 32, meaning the expectation
is to convert it to a 32-bit integer value.
Fixes:
dEQP-GLES3.functional.state_query.floats.blend_color_getinteger64
dEQP-GLES3.functional.state_query.floats.color_clear_value_getinteger64
dEQP-GLES3.functional.state_query.floats.depth_clear_value_getinteger64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94456
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The new organization is as follows:
* anv_meta_blit.c: Blit and state setup/teardown commands
* anv_meta_copy.c: Copy and update commands
* anv_meta_blit2d.c: 2D Blitter API commands
Also, change the formatting to contain most lines
within 80 columns.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This can be reverted if the only other consumer, anv_meta_blit2d(),
uses a different method.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In addition to demystifying the value being added to the height,
this future-proofs the code for new tiling modes and keeps the
image height as small as possible.
v2: Actually use the smallest height possible.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Add one missing extern C guard within include/pipe/p_video_enums.h, and
remove the wrapping throughout gallium.
On Haiku one could even use the gallium debug_printf() although
that's another topic.
v2: Leave dbghelp.h as is (Jose)
Cc: Jose Fonseca <jfonseca@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A commit earlier this year reworked out python scripts to use a separate
file for these. Followed by removing support from the parser, and
removing all of the offset tags.
Seems like we either missed a few, or people added them by mistake.
Either way let's nuke the ones that are still around.
Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The reason is that the shader image atoms call st_finalize_texture, which
may set ST_NEW_FRAMEBUFFER.
This fixes an assertion triggered by a subtest of piglit's
arb_shader_image_load_store-invalid.
v2: add comment explaining order constraints (suggested by Ilia)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This instruction has the resource (buffer or image) as a destination to
represent the writemask for SSBO writes. However, this is obviously not
a "real" destination for the purpose of emitting LLVM IR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be queried by the OpenCL stack using an interop call.
I have tested that the values match lspci.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
view->resource is redundant with view->base.texture, so get rid of it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
and rename .._buffers -> .._buffer
Based loosely on Nicolai's patch. This will make it easier to cherry-pick
Nicolai's patches from his image support branch.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is currently not needed but will be necessary when we have
features that do not work with DCC enabled, such as image stores
and sharing non-scanout surfaces.
v2: Marek: rebase, remove decompression from si_flush_resource (not needed)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This was needed for DRM < 2.12.0 where the kernel was rewriting tiling flags
in IBs.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.
v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER
Reviewed-by: Axel Davy <axel.davy@ens.fr>
This applies the rule to empty declarations.
Fixes:
dEQP-GLES3.functional.shaders.arrays.invalid.empty_declaration_without_var_name_vertex
dEQP-GLES3.functional.shaders.arrays.invalid.empty_declaration_without_var_name_fragment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this issue 'mpv --hwdec=vdpau --vo=vdpau <stream>' fails
for vdpau decode if the stream height is 4096. Vdpau decode of
height upto 4096 is necessary usecase on amdgpu driver for VI
and newer platforms.
The fix is in driver specific implementation of "Decoder
Query Capabilities" API to return 4096 for VI and newer
platforms. With this fix vdpauinfo reports height support as
4096 and mpv for vdpau decode works fine for 4096 height streams.
Signed-off-by: Tamil velan <Tamil-Velan.Jayakumar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Enlarge the buffer hashlist to prevent large numbers of misses
due to adding more buffers than can be cached in the hashlist.
The game I tested had CS's with up to 1500 buffers and the overhead
of amdgpu_lookup_buffer for various sizes was:
4096 1.97% (new value)
2048 4.37%
1024 6.92%
512 9.47% (old value)
(percentage of CPU usage in render thread as determined by perf)
The time spent in amdgpu_add_buffer self is ~4.2% in all cases and
for 4096 the time needed to clear the hashlist is still < 0.10%,
so I am not expecting significant regressions.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Special-casing the PS_BLEND packet wasn't really gaining us anything. It's
defined to be more-or-less the contents of blend state entry 0 only without
the indirection. We can just copy-and-paste the contents. If there are no
valid color targets, then blend state 0 will be 0-initialized anyway so
it's basically the same as the special case we had before.
Previously, we would always emit all of the render targets in the subpass.
This commit changes it so that we compact render targets just like we do
with other resources. Render targets are represented in the surface map by
using a descriptor set index of UINT16_MAX.
This makes use of the new state validation interface to be consistent
with 3d.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This exposes an interface for state validation that will be also used
to rework the compute validation path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the local workgroup size is sufficiently large, then the SIMD8
program can't be used. In this case we can skip generating the SIMD8
program. For complex programs this can save a significant amount of
time.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For fragment shaders, we can always use a SIMD8 program. Therefore, if
we detect spilling with a SIMD16 program, then it is better to skip
generating a SIMD16 program to only rely on a SIMD8 program.
Unfortunately, this doesn't work for compute shaders. For a compute
shader, we may be required to use SIMD16 if the local workgroup size
is bigger than a certain size. For example, on gen7, if the local
workgroup size is larger than 512, then a SIMD16 program is required.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93840
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since we store some member qualifiers in the interface type
we need to be more careful about rejecting shaders just because
the pointer doesn't match. Its perfectly valid for some qualifiers
such as precision to not match across shader interfaces.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The ES 3.0+ specifications contain the exact same text as the OpenGL
specification, which says that we should return GL_INVALID_OPERATION.
ES 2.0 contains different text saying we should return GL_INVALID_ENUM.
Previously, Mesa chose the error code based on API (GL vs. ES).
This patch makes ES 3.0+ follow the GL behavior. ES 2 remains as is.
Fixes dEQP-GLES3.functional.fbo.api.attachment_query_empty_fbo.
However, breaks the dEQP-GLES2 variant of the same test for drivers
which silently promote to ES 3.0. This can be worked around by
exporting MESA_GLES_VERSION_OVERRIDE=2.0, but is a bug in dEQP-GLES2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The dEQP-GLES3.functional.fbo.completeness.renderable.texture.
{color0,depth,stencil}.{red,rg}_unsigned_byte tests appear to expect
GL_RED/GL_RG and GL_UNSIGNED_BYTE to map to GL_R8/GL_RG8, rather than
returning an INVALID_OPERATION error.
This makes perfect sense. However, RED and RG are strangely missing
from the ES 3.0/3.1/3.2 spec's "Effective internal format corresponding
to external format and type" tables. It may be worth filing a spec bug.
Fixes the 6 dEQP tests mentioned above.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This mutex is initialized when the blitter is created, but it is never
destroyed. This doesn't hurt anything but it makes sense to destroy it
at blitter deletion.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This new pass lowers load/store_var intrinsics that act on indirect derefs
to if-ladder of direct load/store_var intrinsics. The if-ladders perform a
simple binary search on the indirect.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Now that brw_vec4_visitor::emit_untyped_atomic was removed, there is no need
to explicitly set it.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
surface_access emit_untyped_read and emit_untyped_atomic provides the same
functionality.
v2: surface parameter of emit_untyped_atomic is a const, no need to
specify default predicate on emit_untyped_atomic, use retype
(Francisco Jerez).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
If the src is invalid, so src size is zero, the src_sz passed to emit
send should be zero too, instead of a default 1 if we are in a simd4x2
case. This can happens if using emit_untyped_atomic for an atomic
dec/inc.
v2: use the proper src_sz when calling emit_send, instead of just
avoid loading src at emit_send if BAD_FILE (Francisco Jerez)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Apparently this causes a slight difference in the parser's token
expectations, leading to a different error message.
It seems harmless, but I wanted to be cautious and separate it out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I didn't want to pollute the previous patch with all the $4 -> $3
changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We now have a bigger hammer. The HASH_TOKEN NEWLINE rule still needs
to exist to ensure the 146-version-hash-first.c test still passes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We resolved the implicit version directive when processing control lines,
such as #ifdef, to ensure any built-in macros exist. However, we failed
to resolve it when handling ordinary text.
For example,
int x = __VERSION__;
should resolve __VERSION__ to 110, but since we never resolved the implicit
version, none of the built-in macros exist, so it was left as is.
This also meant we allowed the following shader to slop through:
123
#version 120
Nothing would cause the implicit version to take effect, so when we saw
the #version directive, we thought everything was peachy.
This patch makes the lexer's per-token action resolve the implicit
version on the first non-space/newline/hash token that isn't part of
a #version directive, fulfilling the GLSL language spec:
"The #version directive must occur in a shader before anything else,
except for comments and white space."
Because we emit #version as HASH_TOKEN then VERSION_TOKEN, we have to
allow HASH_TOKEN to slop through as well, so we don't resolve the
implicit version as soon as we see the # character. However, this is
fine, because the parser's HASH_TOKEN NEWLINE rule does resolve the
version, disallowing cases like:
#
#version 120
This patch also adds the above shaders as new glcpp tests.
Fixes dEQP-GLES2.functional.shaders.preprocessor.predefined_macros.
{gl_es_1_vertex,gl_es_1_fragment}.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reduces the number of allocations a bit and cuts back on memory usage.
Kind-of a micro-optimization but it also makes the error handling a bit
simpler so it seems like a win.
We cast he constant 0xfff values to a uintptr_t before applying a bitwise
negate to ensure that they are actually 64-bit when needed. Also, the
count variable doesn't need to be explicitly cast, it will get upcast as
needed by the "|" operation.
Previously we asserted every time you tried to pack a pointer and a counter
together. However, this wasn't really correct. In the case where you try
to grab the last element of the list, the "next elemnet" value you get may
be bogus if someonoe else got there first. This was leading to assertion
failures even though the allocator would safely fall through to the failure
case below.
In a shader such as:
struct S { float f; }
float identity(float S) { return S; }
we would think that "S" in "return S" referred to a structure, even
though it's shadowed by the "float S" parameter in the inner struct.
This led to the parser's grammar seeing TYPE_IDENTIFIER and getting
confused.
Fixes dEQP-GLES2.functional.shaders.scoping.valid.
function_parameter_hides_struct_type_{vertex,fragment}.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The lexer/parser use a symbol table to classify identifiers as
variables, functions, or structure types.
For some reason, we neglected to add variables in simple declarations
such as
int x = 5;
but did add subsequent variables in multi-declarations:
int x = 5, y = 6; // y gets added, but not x, for some reason
Fixes four dEQP-GLES2.functional.shaders.scoping.valid subcases:
- local_int_variable_hides_struct_type_vertex
- local_int_variable_hides_struct_type_fragment
- local_struct_variable_hides_struct_type_vertex
- local_struct_variable_hides_struct_type_fragment
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
If glGenerateMipmap was called with a bogus target, then it would
pass that to _mesa_get_current_tex_object(), which would raise a
_mesa_problem() telling people to file bugs. We'd then do the
proper error checking, raise an error, and bail.
Doing the check first avoids the _mesa_problem(). The DSA variant
doesn't take a target parameter, so we leave the target validation
exactly as it was in that case.
Fixes one dEQP GLES2 test:
dEQP-GLES2.functional.negative_api.texture.generatemipmap.invalid_target.
v2: Rebase on Antia's recent patch to this area.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
The SHARED TGSI keyword is only allowed with TGSI_FILE_MEMORY and not
with TGSI_FILE_BUFFER. I have found this by using the nouveau_compiler
from command line.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
To know when we're flushing the command buffer because we need to
write to surface in the command buffer.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
In the rather unusual case of Bind + Delete, we need to make sure that
we unbind the current tf object.
Fixes dEQP-GLES3.functional.lifetime.delete_bound.transform_feedback
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes a crash in
dEQP-GLES3.functional.transform_feedback.array_element.separate.points.lowp_mat3x2
and likely others. The vertex shader has > 16 input variables (without
explicit locations), which causes us to index outside of the to_assign
array.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Applications may create a *lot* of fences, perhaps as much as one per
vkQueueSubmit. Really, they're supposed to use ResetFence, but it's easy
enough for us to make them crazy-cheap so we might as well.
This simplifies the code that iterates over the per-component values
found in the matching copy_entry struct and checks whether the
register regions that were copied to each component are similar enough
to be treated as a single (reswizzled) value which can be propagated
into the current instruction.
Aside from being scattered between opt_copy_propagation(),
try_copy_propagate(), and try_constant_propagate(), what I found
terribly confusing about the preexisting logic was that
opt_copy_propagation() tried to reorder the array of values according
to the swizzle of the instruction source, which meant one would have
had to invert the reordering applied at the top level in order to find
out which component to take from each value (we were just taking the
i-th component from the i-th value, which is not correct in general).
The saturate mask was also being swizzled incorrectly.
This consolidates the logic for matching multiple components of a
copy_entry into a single function which returns the result as a
regular src_reg on success, as if the copy had been performed with a
single MOV instruction copying all components of the src_reg into the
destination.
Fixes several ARB_vertex_program MOV test-cases from:
https://cgit.freedesktop.org/~kwg/piglit/log/?h=arb_program
Acked-by: Matt Turner <mattst88@gmail.com>
Scalar immediates used to be handled correctly by swizzle() (as the
identity) but since commit 58fa9d47b5 it
will corrupt the contents of the immediate. Vector immediates were
never handled correctly, but we had ad-hoc code to swizzle VF
immediates in the vec4 copy propagation pass. This takes care of
swizzling V and UV in addition.
v2: Don't implement swizzling of V/UV immediates (Matt). If you need
to swizzle an integer vector immediate in the future apply the
following diff to go back to v1:
--- a/src/mesa/drivers/dri/i965/brw_eu.c
+++ b/src/mesa/drivers/dri/i965/brw_eu.c
@@ -119,11 +119,10 @@ brw_swap_cmod(uint32_t cmod)
static unsigned
imm_shift(enum brw_reg_type type, unsigned i)
{
- assert(type != BRW_REGISTER_TYPE_UV && type != BRW_REGISTER_TYPE_V &&
- "Not implemented.");
-
if (type == BRW_REGISTER_TYPE_VF)
return 8 * (i & 3);
+ else if (type == BRW_REGISTER_TYPE_UV || type == BRW_REGISTER_TYPE_V)
+ return 4 * (i & 7);
else
return 0;
}
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
And replace brw_swizzle1() with brw_swizzle(). Seems slightly cleaner
and will allow reusing brw_swizzle() in the vec4 back-end more easily.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes some use-after-free situations in dEQP when an xfb state is
removed, and then a clear is triggered, which only does a partial
validation. It would attempt to read the no-longer-valid buffers,
resulting in crashes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Between the initial check the returns NO_KERNEL and compiling the
shader, other threads may have added the shader to the cache. Before
uploading the kernel, check again (under the mutex) that the compiled
shader still isn't present.
There is no API for setting the point size and the shader is always
required to set it. Section 24.4:
"If the value written to PointSize is less than or equal to zero, or
if no value was written to PointSize, results are undefined."
As such, we can just always program PointWidthSource to Vertex. This
simplifies anv_pipeline a bit and avoids trouble when we enable the
pipeline cache and don't have writes_point_size in the prog_data.
Using anv_pipeline_cache_upload_kernel() will re-upload the kernel and
prog_data when we merge caches. Since the kernel and prog_data is
already in the program_stream, use anv_pipeline_cache_add_entry()
instead to only add the entry to the hash table.
This function is a helper that unconditionally sets a hash table entry
and expects the cache to have enough room. Calling it 'add_entry'
suggests it will grow the cache as needed.
We can serialize as much as the application asks for and just stop once
we run out of memory. This lets applications use a fixed amount of
space for caching and still get some benefit.
compute.c: In function ‘launch_grid’:
compute.c:435:20: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default]
info.input = input;
^
Maybe the pipe_grid_info::input field should be const void *?
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This fixes many CTS cases, but will require an update to the kernel
command parser register whitelist. (The CS GPRs and TIMESTAMP
registers need to be whitelisted.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
From Section 4.4.5 (Uniform and Shader Storage Block Layout
Qualifiers) of the OpenGL 4.50 spec:
"The align qualifier makes the start of each block member have a
minimum byte alignment. It does not affect the internal layout
within each member, which will still follow the std140 or std430
rules. The specified alignment must be a power of 2, or a
compile-time error results.
The actual alignment of a member will be the greater of the
specified align alignment and the standard (e.g., std140) base
alignment for the member's type. The actual offset of a member is
computed as follows: If offset was declared, start with that
offset, otherwise start with the next available offset. If the
resulting offset is not a multiple of the actual alignment,
increase it to the first offset that is a multiple of the actual
alignment. This results in the actual offset the member will have.
When align is applied to an array, it affects only the start of
the array, not the array's internal stride. Both an offset and an
align qualifier can be specified on a declaration.
The align qualifier, when used on a block, has the same effect as
qualifying each member with the same align value as declared on
the block, and gets the same compile-time results and errors as if
this had been done. As described in general earlier, an individual
member can specify its own align, which overrides the block-level
align, but just for that member.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The old comment was for the location not the offset, we now use
the field for block members so mention that also.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
In this patch we also copy the offset value from the ast and
implement offset linking rules by adding it to the record_compare()
function.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"Two blocks linked together in the same program with the same block
name must have the exact same set of members qualified with
offset and their integral-constant-expression values must be the
same, or a link-time error results."
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This implements the rules for the offset qualifier on block members.
From Section 4.4.5 (Uniform and Shader Storage Block Layout Qualifiers)
of the GLSL 4.50 spec:
"The offset qualifier can only be used on block members of blocks
declared with std140 or std430 layouts."
...
"It is a compile-time error to specify an offset that is smaller than
the offset of the previous member in the block or that lies within the
previous member of the block."
...
"The specified offset must be a multiple of the base alignment of the
type of the block member it qualifies, or a compile-time error results."
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Global in validation is already handled, this will do the validation
for variables, blocks and block members.
This fixes some CTS tests for the new enhanced layouts transform
feedback qualifiers.
V2: add some more valid input flags
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Previously interface blocks were giving the global default flags of
uniform blocks. This meant we could not check for invalid qualifiers
on interface blocks because they always contained invalid flags.
This changes parsing so that interface blocks now get an empty
set of layouts.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If the following patch we will stop setting these layouts by default
on interface blocks, so we need to do this to avoid hitting the
assert.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
When BaseLevel > 0, we magnify the dimensions to fill out the size of
miplevels [0..BaseLevel). In particular, this was magnifying depth,
thinking that the depth doubles at each level. This is perfectly
reasonable for 3D textures, but dead wrong for array textures.
Changing the depth != 1 condition to a target == GL_TEXTURE_3D check
should make this only happen in the appropriate cases.
Fixes about 32 dEQP tests:
- dEQP-GLES31.functional.texture.gather.*.level_{1,2}
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
opt_vector_float() transforms several scalar MOV operations to a single
vectorial MOV.
This is done when those MOV covers all the components of the destination
register. So something like:
mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf3.0.z:D, 0D
is transformed in:
mov vgrf3.0:F, [0F, 0F, 0F, 1F]
But there are cases where not all the components are written. For
example, in:
mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xy:D, 0D
mov vgrf3.0.w:D, 1065353216D
mov vgrf4.0.xy:D, 1065353216D
mov vgrf4.0.w:D, 0D
mov vgrf6.0:UD, u4.xyzw:UD
Nor vgrf3 nor vgrf4 .z components are written, so the optimization is
not applied.
But it could be applied anyway with the components covered, using a
writemask to select the ones written. So we could transform it in:
mov vgrf2.0.x:D, 1073741824D
mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F]
mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F]
mov vgrf6.0:UD, u4.xyzw:UD
This commit does precisely that: opportunistically apply
opt_vector_float() when possible.
total instructions in shared programs: 7124660 -> 7114784 (-0.14%)
instructions in affected programs: 443078 -> 433202 (-2.23%)
helped: 4998
HURT: 0
total cycles in shared programs: 64757760 -> 64728016 (-0.05%)
cycles in affected programs: 1401686 -> 1371942 (-2.12%)
helped: 3243
HURT: 38
v2: change vectorize_mov() signature (Matt).
v3: take in account predicates (Juan).
v4 [mattst88]: Update shader-db numbers. Fix some whitespace issues.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
screen may still be used by other resources that are not yet freed.
To correctly fix this there will be a need to account for resources
differently, but this quick fix is not any worse than the original
code that leaked screens anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
No shader-db changes, but does recognize some extract_u16 which enables
the next patch to optimize some code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:
float((val >> 16u) & 0xffu)
float((val >> 8u) & 0xffu)
float((val >> 0u) & 0xffu)
Instead of shifting, masking, and type converting like this:
shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
mov(8) g17<1>F g16<8,8,1>UD
shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
mov(8) g20<1>F g19<8,8,1>UD
and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
mov(8) g22<1>F g21<8,8,1>UD
i965 can simply extract a byte and convert to float in a single
instruction:
mov(8) g17<1>F g25.2<32,8,4>UB
mov(8) g20<1>F g25.1<32,8,4>UB
mov(8) g22<1>F g25.0<32,8,4>UB
This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7
cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7
This patch decreases the number of instructions in the two Unigine
programs by:
#1721: 4520 -> 4374 instructions (-3.23%)
#1706: 3752 -> 3582 instructions (-4.53%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
sample_c is backwards from what GL and Vulkan expect.
See intel_state.c in i965.
v2: Drop unused vk_to_gen_compare_op.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This resolves some order dependencies between the already existing
callback the newly created one.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is a limit of 10 display connections, which was a
problem for apps/tests that were continuously opening/closing display
connections.
This fix uses XAddExtension() and XESetCloseDisplay() to keep track
of the status of the display connections from the X server, freeing
mesa-related data as X displays get destroyed by the X server.
Poster child is the VTK "TimingTests"
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glXCreatePixmap() may specify a GLX_TEXTURE_FORMAT_RGB_EXT format
for an RGBA resource, causing us to create an RGBX view for an
RGBA resource, a combination vgpu10 does not support.
When this is detected, change the request to create an RGBA view
instead.
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, make sure the shader resource view is properly created
before referencing it in the generate mipmap command.
Reviewed-by: Brian Paul <brianp@vmware.com>
If running with a software renderer backend, the timeout may be
insufficient, and we don't want to release busy buffers too early.
In practice, SVGA gpu lockups are extremely rare.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
dEQP-GLES31.functional.fbo.no_attachments.maximums.{all,height,size,width}
started hitting assertion failures when emitting SURFACE_STATE, after
commit e8fd60e789 where Samuel increased the maximum viewport size to
32768, from 16384.
MaxFramebufferWidth/Height were being set to the maximum viewport size,
but are actually limited by the SURFACE_STATE width/height field range,
which is 16384 on Gen7+ (where ARB_framebuffer_no_attachments is
exposed). So, reduce these to 16384 explicitly.
Fixes assert fails in the above mentioned dEQP tests. (Those tests
still fail, however.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error. The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0, the new ones give a relative
error bounded by less than 2000 ULP in the same neighborhood.
Fixes four dEQP subtests:
dEQP-GLES31.functional.shaders.builtin_functions.precision.acos.
highp_compute.{scalar,vec2,vec3,vec4}
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will allow us to share the implementation while using different
polynomials for asin() and acos().
Francisco Jerez did this in the SPIR-V front-end; I'm merely porting
his idea to the GLSL world.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
From section 6.2 ("State Tables") of the GL 2.1 specification
(the text also appears in the GL 3.0 and ES 3.1 specifications):
"However, state variables for which IsEnabled is listed as the query
command can also be obtained using GetBooleanv, GetIntegerv, GetFloatv,
and GetDoublev."
GL_DEBUG_OUTPUT, GL_DEBUG_OUTPUT_SYNCHRONOUS, and GL_FRAGMENT_SHADER_ATI
were missing from the glGet*() functions. All other IsEnabled() pnames
look to be present, as far as I can tell.
Fixes 8 dEQP-GLES31.functional.debug.state_query subtests:
debug_output[_synchronous]_get{boolean,float,integer,integer64}.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_*
tries to call glGet on GL_DEBUG_GROUP_STACK_DEPTH right away, before
doing any other debug setup. This should return 1.
However, because ctx->Debug wasn't allocated, we bailed and returned 0.
This patch removes the open-coded locking and switches the two glGet
functions to use _mesa_lock_debug_state(), which takes care of
allocating and initializing that state on the first time. It also
conveniently takes care of unlocking on failure for us, so we don't
need to handle that in every caller.
Fixes dEQP-GLES31.functional.debug.state_query.debug_group_stack_depth_
{getboolean,getfloat,getinteger,getinteger64}.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Support in Mesa main and i965 has just been added.
v2: Include note in 'New Features' of docs/relnotes/11.3.0.html.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Most of the code in anv_meta_blit2d() is borrowed from do_buffer_copy().
Create an image and image view for each rectangle.
Note: For tiled RGB images, ISL will align the image's row_pitch up to
the nearest tile width.
v2 (Jason):
Keep pitch in units of bytes
Make src_format and dst_format variables
s/dest/dst/ in every usage
v3: Fix dst_image width
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Some fields are unnecessary. The variables "pitch" and "bs" are used
for consistency with ISL.
v2: Keep pitch in units of bytes (Jason)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This API is designed to be an abstraction that sits between the VkCmdCopy
commands and the hardware. The idea is that it is simple enough that it
*should* be implementable using the blitter but with enough extra data that
we can implement it with the 3-D pipeline efficiently. One design
objective is to allow the user to supply enough information that we can
handle most blit operations with a single draw call even if they require
copying multiple rectangles.
This is a preparatory commit that will simplify the future usage of
this function.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If a linear image is requested, the only possible result should be a
linearly-tiled surface.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If a specific bit is set, the intention to create a surface with a
specific tiling format should be respected.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This pname is tricky. The spec states that an internal format should be
returned, that is compatible with the passed internal format, and has
at least the same precision. There is no clear API to resolve this.
The closest we have (and what other drivers (i.e, NVidia proprietary) do,
is to return the same internal format given as parameter. But we validate
first that the passed internal format is supported by i965.
To check for support, we have the TextureFormatSupported map'. But
this map expects a 'mesa_format', which takes a format+typen. So, we must
first "come up" with a generic type that is suited for this internal format,
then get a mesa_format, and then do the validation.
The cleanest solution here is to add a method that does exactly what
the spec wants: a driver's preferred internal format from a given
internal format. But at this point we lack a clear view of what
defines this preference, and also there seems to be no API for it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
_mesa_format_from_format_and_type() is currently not considering DEPTH and
STENCIL formats, which are not array formats and are not handled anywhere.
This patch adds cases for common combinations of DEPTH/STENCIL format and
types.
Reviewed-by: Dave Airlie <airlied@redhat.com>
We call the driver to provide its preferred type, but also provide a
default implementation that selects a generic type based on the passed
internal format.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is supported since very early version of OpenGL, but we still call the
driver to give it the opportunity to report caveat or no support.
Reviewed-by: Dave Airlie <airlied@redhat.com>
It discards out the targets and internalformats that explicitly
mention (per-spec) that doesn't support filter types other than
NEAREST or NEAREST_MIPMAP_NEAREST. Those are:
* Texture buffers target
* Multisample targets
* Any integer internalformat
For the case of multisample targets, it was used the existing method
_mesa_target_allows_setting_sampler_parameter. This would scalate
better in the future if new targets appear that doesn't allow to
set sampler parameters.
We consider RENDERBUFFER to support LINEAR filters, because although
it doesn't support this filter for sampling, you can set LINEAR
on a blit operation using glBlitFramebuffer.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- FRAMEBUFFER_RENDERABLE: The support for rendering to the resource via
framebuffer attachment is returned in <params>.
- FRAMEBUFFER_RENDERABLE_LAYERED: The support for layered rendering to
the resource via framebuffer attachment is returned in <params>.
- FRAMEBUFFER_BLEND: The support for rendering to the resource
via framebuffer attachment when blending is enabled is returned in
<params>."
For all of them,
"Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource is unsupported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- TEXTURE_SHADOW: The support for using the resource with shadow samplers
is written to <params>.
- TEXTURE_GATHER: The support for using the resource with texture gather
operations is written to <params>.
- TEXTURE_GATHER_SHADOW: The support for using resource with texture gather
operations with shadow samplers is written to <params>."
For all of them,
"Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- TEXTURE_VIEW: The support for using the resource with the TextureView
command is returned in <params>.
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned.
- VIEW_COMPATIBILITY_CLASS: The compatibility class of the resource when
used as a texture view is returned in <params>. The compatibility
class is one of the values from the /Class/ column of Table 3.X.2. If
the resource has no other formats that are compatible, the resource
does not support views, or if texture views are not supported, NONE is
returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
It will be used by the ARB_internalformat_query2 implementation to
implement the VIEW_COMPATIBILITY_CLASS <pname> query.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- CLEAR_BUFFER: The support for using the resource with ClearBuffer*Data
commands is returned in <params>.
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- TEXTURE_COMPRESSED: If <internalformat> is a compressed format
that is supported for this type of resource, TRUE is returned in
<params>. If the internal format is not compressed, or the type of
resource is not supported, FALSE is returned.
- TEXTURE_COMPRESSED_BLOCK_WIDTH: If the resource contains a compressed
format, the width of a compressed block (in bytes) is returned in
<params>. If the internal format is not compressed, or the resource
is not supported, 0 is returned.
- TEXTURE_COMPRESSED_BLOCK_HEIGHT: If the resource contains a compressed
format, the height of a compressed block (in bytes) is returned in
<params>. If the internal format is not compressed, or the resource
is not supported, 0 is returned.
- TEXTURE_COMPRESSED_BLOCK_SIZE: If the resource contains a compressed
format the number of bytes per block is returned in <params>. If the
internal format is not compressed, or the resource is not supported,
0 is returned.
(combined with the above, allows the bitrate to be computed, and may be
useful in conjunction with ARB_compressed_texture_pixel_storage)."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- SIMULTANEOUS_TEXTURE_AND_DEPTH_TEST: The support for using the resource
both as a source for texture sampling while it is bound as a buffer for
depth test is written to <params>. For example, a depth (or stencil)
texture could be bound simultaneously for texturing while it is bound as
a depth (and/or stencil) buffer without causing a feedback loop, provided
that depth writes are disabled.
- SIMULTANEOUS_TEXTURE_AND_STENCIL_TEST: The support for using the resource
both as a source for texture sampling while it is bound as a buffer for
stencil test is written to <params>. For example, a depth (or stencil)
texture could be bound simultaneously for texturing while it is bound as
a depth (and/or stencil) buffer without causing a feedback loop,
provided that stencil writes are disabled.
- SIMULTANEOUS_TEXTURE_AND_DEPTH_WRITE: The support for using the resource
both as a source for texture sampling while performing depth writes to
the resources is written to <params>. For example, a depth-stencil
texture could be bound simultaneously for stencil texturing while it
is bound as a depth buffer. Feedback loops cannot occur because sampling
a stencil texture only returns the stencil portion, and thus writes to
the depth buffer do not modify the stencil portions.
- SIMULTANEOUS_TEXTURE_AND_STENCIL_WRITE: The support for using the resource
both as a source for texture sampling while performing stencil writes to
the resources is written to <params>. For example, a depth-stencil
texture could be bound simultaneously for depth-texturing while it is
bound as a stencil buffer. Feedback loops cannot occur because sampling
a depth texture only returns the depth portion, and thus writes to
the stencil buffer could not modify the depth portions.
For all of them,
"Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- IMAGE_TEXEL_SIZE: The size of a texel when the resource when used as
an image texture is returned in <params>. This is the value from the
/Size/ column in Table 3.22. If the resource is not supported for image
textures, or if image textures are not supported, zero is returned.
- IMAGE_COMPATIBILITY_CLASS: The compatibility class of the resource when
used as an image texture is returned in <params>. This corresponds to
the value from the /Class/ column in Table 3.22. The possible values
returned are IMAGE_CLASS_4_X_32, IMAGE_CLASS_2_X_32, IMAGE_CLASS_1_X_32,
IMAGE_CLASS_4_X_16, IMAGE_CLASS_2_X_16, IMAGE_CLASS_1_X_16,
IMAGE_CLASS_4_X_8, IMAGE_CLASS_2_X_8, IMAGE_CLASS_1_X_8,
IMAGE_CLASS_11_11_10, and IMAGE_CLASS_10_10_10_2, which correspond to
the 4x32, 2x32, 1x32, 4x16, 2x16, 1x16, 4x8, 2x8, 1x8, the class
(a) 11/11/10 packed floating-point format, and the class (b)
10/10/10/2 packed formats, respectively.
If the resource is not supported for image textures, or if image
textures are not supported, NONE is returned.
- IMAGE_PIXEL_FORMAT: The pixel format of the resource when used as an
image texture is returned in <params>. This is the value
from the /Pixel format/ column in Table 3.22. If the resource is not
supported for image textures, or if image textures are not supported,
NONE is returned.
- IMAGE_PIXEL_TYPE: The pixel type of the resource when used as an
image texture is returned in <params>. This is the value from
the /Pixel type/ column in Table 3.22. If the resource is not supported
for image textures, or if image textures are not supported, NONE is
returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
It will be used by the ARB_internalformat_query2 implementation to
implement the IMAGE_COMPATIBILITY_CLASS <pname> query.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- SHADER_IMAGE_LOAD: The support for using the resource with image load
operations in shaders is written to <params>.
In this case the <internalformat> is the value of the <format> parameter
that would be passed to BindImageTexture.
- SHADER_IMAGE_STORE: The support for using the resource with image store
operations in shaders is written to <params>.
In this case the <internalformat> is the value of the <format> parameter
that is passed to BindImageTexture.
- SHADER_IMAGE_ATOMIC: The support for using the resource with atomic
memory operations from shaders is written to <params>."
For all of them:
"Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
It will be used by the ARB_internalformat_query2 implementation
to implement queries related to the ARB_shader_image_load_store
extension.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- VERTEX_TEXTURE: The support for using the resource as a source for
texture sampling in a vertex shader is written to <params>.
- TESS_CONTROL_TEXTURE: The support for using the resource as a source for
texture sampling in a tessellation control shader is written to <params>.
- TESS_EVALUATION_TEXTURE: The support for using the resource as a source
for texture sampling in a tessellation evaluation shader is written to
<params>.
- GEOMETRY_TEXTURE: The support for using the resource as a source for
texture sampling in a geometry shader is written to <params>.
- FRAGMENT_TEXTURE: The support for using the resource as a source for
texture sampling in a fragment shader is written to <params>.
- COMPUTE_TEXTURE: The support for using the resource as a source for
texture sampling in a compute shader is written to <params>."
For all of them,
"Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- SRGB_DECODE_ARB: The support for toggling whether sRGB decode happens at
sampling time (see EXT/ARB_texture_sRGB_decode) for the resource is
returned in <params>.
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- SRGB_READ: The support for converting from sRGB colorspace on read
operations (see section 3.9.18) from the resource is returned in
<params>.
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned.
- SRGB_WRITE: The support for converting to sRGB colorspace on write
operations to the resource is returned in <params>.
This indicates that writing to framebuffers with this internalformat
will encode to sRGB color spaces when FRAMEBUFFER_SRGB is enabled (see
section 4.1.8).
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource or operation is not supported, NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"- COLOR_ENCODING: The color encoding for the resource is returned in
<params>. Possible values for color buffers are LINEAR or SRGB,
for linear or sRGB-encoded color components, respectively. For non-color
formats (such as depth or stencil), or for unsupported resources,
the value NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Specifically MIPMAP, MANUAL_GENERATE_MIPMAP and AUTO_GENERATE_MIPMAP <pname>
queries.
From the ARB_internalformat_query2 specification:
"- MIPMAP: If the resource supports mipmaps, TRUE is returned in <params>.
If the resource is not supported, or if mipmaps are not supported for
this type of resource, FALSE is returned.
- MANUAL_GENERATE_MIPMAP: The support for manually generating mipmaps for
the resource is returned in <params>.
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource is not supported, or if the operation is not supported,
NONE is returned.
- AUTO_GENERATE_MIPMAP: The support for automatic generation of mipmaps
for the resource is returned in <params>.
Possible values returned are FULL_SUPPORT, CAVEAT_SUPPORT, or NONE.
If the resource is not supported, or if the operation is not supported,
NONE is returned."
Reviewed-by: Dave Airlie <airlied@redhat.com>
It is implemented combining the values returned by calls to the 32-bit
query _mesa_GetInternalformati32v.
The main reason is simplicity. The other option would be C&P how we
implemented the support of GL_MAX_{WIDTH/HEIGHT/DEPTH} and GL_SAMPLES.
Additionally, doing this way, we avoid adding checks on the code, as
are done by the call to the query itself.
MAX_COMBINED_DIMENSIONS is the only pname pointed on the spec of
needing a 64-bit query. We handle that possibility by packing the
returning value on the two first 32-bit integers of params. This
would work on the 32-bit query as far as the value is not greater
that INT_MAX. On the 64-bit query wrapper we unpack those values
in order to get the final value.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Implemented by calling GetIntegerv with the equivalent pname and
handling individually the exceptions related to dimensions.
All those pnames are used to get the maximum value for each dimension
of the given target. The only difference between this calls and
calling GetInteger with pnames like GL_MAX_TEXTURE_SIZE,
GL_MAX_3D_TEXTURE_SIZE, etc is that GetInternalformat allows to
specify a internalformat.
But at this moment, there is no reason to think that the values would
be different based on the internalformat. The spec already take that
into account, using these specific pnames as example on Issue 7 of
arb_internalformat_query2 spec.
So this seems like a hook to allow to return different values based on
the internalformat in the future.
It is worth to note that the piglit test associated to those pnames
are checking the returned values of GetInternalformat against the
values returned by GetInteger, and the test is passing with NVIDIA
proprietary drivers.
main/formatquery: support for MAX_{WIDTH/HEIGHT/DEPTH/LAYERS}
Implemented by calling GetIntegerv with the equivalent pname and
handling individually the exceptions related to dimensions.
All those pnames are used to get the maximum value for each dimension
of the given target. The only difference between this calls and
calling GetInteger with pnames like GL_MAX_TEXTURE_SIZE,
GL_MAX_3D_TEXTURE_SIZE, etc is that GetInternalformat allows to
specify a internalformat.
But at this moment, there is no reason to think that the values would
be different based on the internalformat. The spec already take that
into account, using these specific pnames as example on Issue 7 of
arb_internalformat_query2 spec.
So this seems like a hook to allow to return different values based on
the internalformat in the future.
It is worth to note that the piglit test associated to those pnames
are checking the returned values of GetInternalformat against the
values returned by GetInteger, and the test is passing with NVIDIA
proprietary drivers.
v2: use _mesa_has## instead of direct ctx->Extensions access (Nanley Chery)
Reviewed-by: Dave Airlie <airlied@redhat.com>
From arb_internalformat_query2 spec:
"IMAGE_FORMAT_COMPATIBILITY_TYPE: The matching criteria use for the
resource when used as an image textures is returned in
<params>. This is equivalent to calling GetTexParameter with <value>
set to IMAGE_FORMAT_COMPATIBILITY_TYPE."
Current implementation of GetTexParameter for this case returns a
field of a texture object, so the support of this pname was
implemented creating a temporal texture object and returning that
value.
It is worth to mention that right now that field is not reassigned
after initialization. So it is somehow hardcoded. An alternative
option would be return that value. That doesn't seems really scalable
though.
v2: use _mesa_has## instead of direct ctx->Extensions access (Nanley Chery)
Reviewed-by: Dave Airlie <airlied@redhat.com>
From arb_internalformat_query2 spec:
" If <internalformat> is not color-renderable, depth-renderable, or
stencil-renderable (as defined in section 4.4.4), or if <target>
does not support multiple samples (ie other than
TEXTURE_2D_MULTISAMPLE, TEXTURE_2D_MULTISAMPLE_ARRAY, or
RENDERBUFFER), <params> is not modified."
So there are cases where the buffer should not be modified. As the
64-bit query is a wrapper over the 32-bit query, we can't just copy
the values to the equivalent 32-bit buffer, as that would fail if the
original params contained values greater that INT_MAX. So we need to
copy-back only the values that got modified by the 32-bit query. We do
that by filling the temporal buffer by negatives, as the 32-bit query
should not return negative values ever.
Reviewed-by: Dave Airlie <airlied@redhat.com>
It just does a wrapping on the existing 32-bit GetInternalformativ.
We will maintain the 32-bit query as default as it is likely that
it would be the one most used.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 spec:
"- INTERNALFORMAT_RED_SIZE
- INTERNALFORMAT_GREEN_SIZE
- INTERNALFORMAT_BLUE_SIZE
- INTERNALFORMAT_ALPHA_SIZE
- INTERNALFORMAT_DEPTH_SIZE
- INTERNALFORMAT_STENCIL_SIZE
- INTERNALFORMAT_SHARED_SIZE
For uncompressed internal formats, queries of these values return the
actual resolutions that would be used for storing image array components
for the resource.
For compressed internal formats, the resolutions returned specify the
component resolution of an uncompressed internal format that produces
an image of roughly the same quality as the compressed algorithm.
For textures this query will return the same information as querying
GetTexLevelParameter{if}v for TEXTURE_*_SIZE would return.
If the internal format is unsupported, or if a particular component is
not present in the format, 0 is written to <params>.
- INTERNALFORMAT_RED_TYPE
- INTERNALFORMAT_GREEN_TYPE
- INTERNALFORMAT_BLUE_TYPE
- INTERNALFORMAT_ALPHA_TYPE
- INTERNALFORMAT_DEPTH_TYPE
- INTERNALFORMAT_STENCIL_TYPE
For uncompressed internal formats, queries for these values return the
data type used to store the component.
For compressed internal formats the types returned specify how components
are interpreted after decompression.
For textures this query returns the same information as querying
GetTexLevelParameter{if}v for TEXTURE_*TYPE would return.
Possible values return include, NONE, SIGNED_NORMALIZED,
UNSIGNED_NORMALIZED, FLOAT, INT, UNSIGNED_INT, representing missing,
signed normalized fixed point, unsigned normalized fixed point,
floating-point, signed unnormalized integer and unsigned unnormalized
integer components. NONE is returned for all component types if the
format is unsupported."
Reviewed-by: Dave Airlie <airlied@redhat.com>
The new pnames accepted by the function are:
- INTERNALFORMAT_RED_SIZE
- INTERNALFORMAT_GREEN_SIZE
- INTERNALFORMAT_BLUE_SIZE
- INTERNALFORMAT_ALPHA_SIZE
- INTERNALFORMAT_DEPTH_SIZE
- INTERNALFORMAT_STENCIL_SIZE
It will be used by the ARB_internalformat_query2 implementation to
implement those pnames.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The new pnames accepted by the function are:
- INTERNALFORMAT_RED_SIZE
- INTERNALFORMAT_GREEN_SIZE
- INTERNALFORMAT_BLUE_SIZE
- INTERNALFORMAT_ALPHA_SIZE
- INTERNALFORMAT_DEPTH_SIZE
- INTERNALFORMAT_STENCIL_SIZE
- INTERNALFORMAT_RED_TYPE
- INTERNALFORMAT_GREEN_TYPE
- INTERNALFORMAT_BLUE_TYPE
- INTERNALFORMAT_ALPHA_TYPE
- INTERNALFORMAT_DEPTH_TYPE
- INTERNALFORMAT_STENCIL_TYPE
It will be used by the ARB_internalformat_query2 implementation to
implement those pnames.
Reviewed-by: Dave Airlie <airlied@redhat.com>
It will be used by the ARB_internalformat_query2 implementation
to check if the target is valid for those <pnames> that are said
in the spec that should return the same values than the
'glGetTexLevelParameter{if}v' function:
- INTERNALFORMAT_RED_SIZE
- INTERNALFORMAT_GREEN_SIZE
- INTERNALFORMAT_BLUE_SIZE
- INTERNALFORMAT_ALPHA_SIZE
- INTERNALFORMAT_DEPTH_SIZE
- INTERNALFORMAT_STENCIL_SIZE
- INTERNALFORMAT_SHARED_SIZE
- INTERNALFORMAT_RED_TYPE
- INTERNALFORMAT_GREEN_TYPE
- INTERNALFORMAT_BLUE_TYPE
- INTERNALFORMAT_ALPHA_TYPE
- INTERNALFORMAT_DEPTH_TYPE
- INTERNALFORMAT_STENCIL_TYPE
- IMAGE_FORMAT_COMPATIBILITY_TYPE
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 specification:
"The INTERNALFORMAT_SUPPORTED <pname> can be used to determine if
the internal format is supported, and the other <pnames> are defined
in terms of whether or not the format is supported."
v2: Consider also FBO base formats when checking if the internalformat is
supported.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Checks that the 'resource', as defined by the ARB_internalformat_query2
specification, is supported by the implementation for those 'pnames'
that require this check.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This would allow to use this method if you are just querying if it is
allowed, like for arb_internalformat_query2.
Reviewed-by: Dave Airlie <airlied@redhat.com>
It will be used by the ARB_internalformat_query2 implementation
to check if a certain compressed 'internalformat' is supported
by texture 'targets'.
Reviewed-by: Dave Airlie <airlied@redhat.com>
It will be used by the ARB_internalformat_query2 implementation
to check if the 'internalformat' passed is supported by texture
MULTISAMPLE 'targets'.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Equivalent to commit bda540 (that added GL_ARB_internalformat_query)
v2: include the new xml to to API_XML list at Makefile.am (Emil Velikov)
Reviewed-by: Dave Airlie <airlied@redhat.com>
The goal is to extend the GetInternalformativ query to implement the
ARB_internalformat_query2 specification, keeping the behaviour defined
by the ARB_internalformat_query if ARB_internalformat_query2 is not
supported.
v2: Don't require ARB_internalformat_query when profile is GLES3.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the ARB_internalformat_query2 spec:
"If the particular <target> and <internalformat> combination do not make
sense, or if a particular type of <target> is not supported by the
implementation the "unsupported" answer should be given. This is not an
error."
This function checks if the <target> is supported by the implementation.
v2: Allow RENDERBUFFER targets also on GLES 3 profiles.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The ARB_internalformat_query2 specification defines which is the
reponse best representing "not supported" or "not applicable" for
each <pname>.
Queries for unsupported features, targets, internalformats, combinations
of: target and internalformat, target and pname, pname and internalformat,
do not return an error but the corresponding 'unsupported' response.
We will use that response as the default answer.
For SAMPLES the 'unsupported' response is to not modify the 'params' buffer.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Handles the cases where an error should be returned according
to the ARB_internalformat_query and ARB_internalformat_query2
specifications.
Reviewed-by: Dave Airlie <airlied@redhat.com>
At this point, all uses have been replaced by the more general hook
QueryInternalFormat, introduced by ARB_internalformat_query2.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Implements SAMPLES and NUM_SAMPLE_COUNTS queries using the new generic
driver call QueryInternalFormat, which is being introduced as replacement
of QuerySamplesForFormat to support ARB_internalformat_query2.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Currently, the number of integers returned in the response to
GetInternalFormativ is being tracked by a 'count' variable.
This is so only the modified elements from the temporary buffer are copied into
the original user buffer.
However, with the introduction of ARB_internalformat_query2, keeping track
of 'count' would complicate the code a lot, considering the high number of
queries.
So, we propose to forget about tracking count, and move all the 16 elements
in the temporary buffer, back to the user buffer (clamped to user buffer size
of course). This is basically a trade-off between performance and code clarity.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use QueryInternalFormat instead of QuerySamplesForFormat to obtain the
highest supported sample. QuerySamplesForFormat is to be removed.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This effectively disables old QuerySamplesForFormat driver hook, since it is
never called by Mesa anymore.
v2: Call brw_query_samples_for_format() with a dummy buffer to calculate num
samples, to avoid modifying the original buffer.
Reviewed-by: Dave Airlie <airlied@redhat.com>
By default, we call back the driver's hook fallback function that has generic
implementations for the all the queries.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This new function queries different driver parameters for a particular target
and texture format. It is basically a driver hook to support
ARB_internalformat_query2.
Since ARB_internalformat_query2 introduced several new query parameters
over ARB_internalformat_query, having one driver hook for each parameter
is no longer feasible. So this is the generic entry-point for calls
to glGetInternalFormativ and glGetInternalFormati64v.
Reviewed-by: Dave Airlie <airlied@redhat.com>
When we find indirect indexing into an array, the current implementation
of the array spliiting optimization pass does not look further into the
expression tree. However, if the variable expression involves variable
indexing into other arrays, we can miss that these other arrays also have
variable indexing. If that happens, the pass will crash later on after
hitting an assertion put there to ensure that split arrays are in fact
always indexed via constants:
shader_runner: opt_array_splitting.cpp:296:
void ir_array_splitting_visitor::split_deref(ir_dereference**): Assertion `constant' failed.
This patch fixes the problem by letting the pass step into the variable
index expression to identify these cases properly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89607
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
There is an old if statement (dated to 2011) that prevented doing
endian swap for colorformat, in case the buffer is marked as
PIPE_USAGE_STAGING.
This is now wrong because st_ReadPixels() reads into a destination
texture that is marked with PIPE_USAGE_STAGING. Therefore, even if
the texture is rendered correctly to the monitor, when reading it
back we get unswapped/wrong values.
This patch makes the check_rgba() function in gl-1.0-readpixsanity
piglit test pass in big-endian.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is an old if statement (dated to 2011) that prevented doing
endian swap for colorformat, in case the buffer is marked
as PIPE_USAGE_STAGING.
This is now wrong because st_ReadPixels() reads into a destination
texture that is marked with PIPE_USAGE_STAGING. Therefore, even if
the texture is rendered correctly to the monitor, when reading it
back we get unswapped/wrong values.
This patch makes the check_rgba() function in gl-1.0-readpixsanity
piglit test pass in big-endian.
v2: removed duplicate call to r600_colorformat_endian_swap() inside
evergreen_init_color_surface_rat()
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
swr driver which is written in C++ needs access to some more
gallium utility functions than are currently exposed.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
OpenSWR is a new software rasterizer for x86 processors designed
for high performance and high scalablility on visualization workloads.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Texture is already allocated before calling this meta function. So,
the value of 'allocate_storage' passed to the function is always false.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
OpenGL ES 1.0 doesn't support using GL_STREAM_DRAW and both
ES 1.0 and 2.0 don't support GL_STREAM_READ in glBufferData().
So, handle it correctly by calling the _mesa_meta_begin()
before create_texture_for_pbo().
V2: Remove the changes related to allocate_storage. (Ian)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For OpenGL, see commit 9a939ebb47.
Fixes:
* dEQP-VK.compute.indirect_dispatch.upload_buffer.empty_command
* dEQP-VK.compute.indirect_dispatch.gen_in_compute.empty_command
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Only one of these were recently introduced. However, since
we keep copy/pasting the same wrong indentation we should
probably just fix it.
Reviewed-by: Brian Paul <brianp@vmware.com>
We should not dereference shader before we have done the
null check.
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From ARB_viewport_array spec:
" * On GL3-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least
[-16384, 16383].
* On GL4-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least
[-32768, 32767]."
This range is set using ctx->Const.MaxViewportWidth value, so just bump
those constants to 32k for gen7+ which can support OpenGL 4.0.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Don't use hardcoded ones because the driver can set different ones.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While Broadwell is very good about UINT formats, HSW is more restrictive.
Neither R8G8B8_UINT nor R16G16B16_UINT really exist on HSW. It should be
safe to just use the unorm formats.
Now that we are no longer using the pctx reference in the shader, drop
it and turn on shareable shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
And use this for allocating bo's to hold the shader binary, rather than
accessing the dev via ctx ptr. One step towards making shaders sharable
across contexts.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Commit 65dfb30 added exec_list EmptyUniformLocations, but only
initialized the list if ARB_explicit_uniform_location was enabled,
leading to crashes if the extension was not available.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Fixes piglit tests:
- object-namespace-pollution glGetTexImage-compressed framebuffer
- object-namespace-pollution glGenerateMipmap framebuffer
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This enables later patches that will stop calling _mesa_GenFramebuffers
or _mesa_CreateFramebuffers which pollute the framebuffer namespace.
For framebuffers, the Bind call is still necessary.
sed -i -e 's/_mesa_GenFramebuffers/_mesa_CreateFramebuffers/' \
src/mesa/drivers/common/*.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This enables later patches that will stop calling _mesa_GenFramebuffers
or _mesa_CreateFramebuffers which pollute the framebuffer namespace.
For framebuffers, the Bind call is still necessary.
sed -i -e 's/_mesa_GenFramebuffers/_mesa_CreateFramebuffers/' \
src/mesa/drivers/dri/i965/*.c
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Some meta operations can be called recursively. Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used. If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.
This also fixes another latent bug in meta. In a multithreaded,
multicontext application, one thread can delete an object that is bound
in another thread. That object continues to exist until it is unbound
(i.e., its refcount drops to zero). Meta unbinds objects all over the
place. As a result, the rebind in _mesa_meta_end could fail because the
object vanished!
See https://bugs.freedesktop.org/show_bug.cgi?id=92363#c8.
Using _mesa_reference_<object type> to save and restore the objects
prevents the refcount from going to zero.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixing dd_function_table::BindFramebuffer will come later because that
change is probably not suitable for stable.
v2: Fix whitespace issue noticed by Topi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Also change the name of the function to
_mesa_meta_framebuffer_texture_image. The function is basically a
wrapper around _mesa_framebuffer_texture (which is used to implement
glFramebufferTexture1D and friends), so it makes sense for it's name to
be similar to that.
The next patch will clean _mesa_meta_framebuffer_texture_image up
considerably.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(This is commit 4a1c8a3037 for vec4 mode.)
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache. Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.
However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs. An arbitrary cut-off of
24 vec4 slots (12 registers) should suffice. (I chose this instead of
the 32 vec4 slots used in the scalar backend to avoid regressing a few
Piglit tests due to the vec4 register allocator being too stupid to
figure out what to do. We probably ought to fix that, but it's a
separate issue.)
Improves performance in GPUTest's tessmark_x64 microbenchmark by
41.5394% +/- 0.288519% (n = 115) at 1024x768 on my Clevo W740SU
(with Iris Pro 5200).
Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark by
38.3576% +/- 0.759748% (n = 42).
v2: Simplify abs/negate handling, as requested by Matt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
in order to make some winsys interface changes easier
This distros should use new DRM if they want to use new Mesa:
Distro kernel mesa eol
SLES 10 2.6.16 6.4.2 2016-07
SLED 11 3.0 9.0.3 2022-03
RHEL 5 2.6.18 6.5.1 2017-03
RHEL 6 2.6.32 10.4.3 2020-11
Debian 6 2.6.32 7.7.1 2016-02
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This can increase perf for shaders that kill pixels (kill, alpha-test,
alpha-to-coverage).
v2: add comments
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When reusing a depth attachment as a stencil, we need to propogate
the layered bit, otherwise we fail to complete the framebuffer.
discovered running ./bin/fbo-depth-array depth-layered-clear
on virgl on haswell.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Here is another threading issue with MANAGED buffers:
Thread 1: buffer creation
Thread 1: buffer lock
Thread 2: Draw call
Thread 1: writes data
Thread 1: Unlock
Without this patch, the buffer is initially dirty
and in the list of things to upload after its creation.
The draw call will then upload the data and unset the dirty flag,
and the Unlock won't trigger a second upload.
Fixes regression introduced by cc0114f30b:
"st/nine: Implement Managed vertex/index buffers"
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
d3d calls are protected by mutexes, however if app is doing in
two threads:
Thread 1: buffer Lock
Thread 2: Draw call
Thread 1: writes data
Thread 1: Unlock
Then before this patch, the Draw call would begin to upload
the buffer.
Solves this by moving the moment we add the buffer to the queue
of things to upload (We move it from Lock time to Unlock time).
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When the semantic is Position (which can happen with index 0 only),
use the helper to get Position input.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Add the virtio-gpu PCI ID for virtio 1.0 (according to the
specification, "the PCI Device ID is calculated by adding 0x1040 to the
Virtio Device ID")
Support for virtio 1.0 was added in qemu 2.4 (same time virtio-gpu
landed).
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is a very rudimentary script that checks if any of the applied
cherry-picks have been referenced (fixed?) by another patch. With the
latter either missing the stable tag or hasn't yet been picked.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Pretty much all of these are enabled by default. Considering the recent
updates (see previous commits) one might as well list most/all of these
here.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Namely - opencl, osmesa (only the gallium flavour as it conflicts with
the classic one), surfaceless egl platform and a couple gallium drivers
(virgl and vc4).
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Namely:
b662d5282f mesa: Add clean-local rule to remove .lib links.
5c1aac17ad install-lib-links: don't depend on .libs directory
fece147be5 install-lib-links: remove the .install-lib-links file
With these in place, make distcheck now passes and a race condition has
been avoided.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
In builds with clang, there are several errors related to the enum
alu_op_flags like this:
src/gallium/drivers/r600/sb/sb_expr.cpp:887:8:
error: case value evaluates to -1610612736, which cannot be narrowed to
type 'unsigned int' [-Wc++11-narrowing]
These are due to the MSB being set in the enum. Fix these errors by
making the enum values unsigned as needed. The flags field that stores
this enum also needs to be unsigned.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fix compiles with clang that have this C++11 error:
src/gallium/drivers/radeon/r600_pipe_common.h:662:34:
error: invalid suffix on literal; C++11 requires a space between literal
and identifier [-Wreserved-user-defined-literal]
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
MESA_DRI_MODULE_PATH is only getting set for classic DRI drivers and may or
may not be set correctly for gallium_dri.so depending on the makefile
include ordering. For Android 6 and earlier it is fine, but with build
system changes in AOSP master, it is not.
Move the path variables to a single place at the top level and introduce
MESA_DRI_MODULE_REL_PATH for Android 5 and later which require relative
paths. With this, there is a single variable to change.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
clang complains about date/time macros:
src/mesa/main/context.c:403:25: error: expansion of date or time macro is not reproducible [-Werror,-Wdate-time]
Disable this warning.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The makefile was implicitly picking up YACC_HEADER_SUFFIX from the Android
build system, but this variable is now gone. Add it locally to fix the
build with AOSP master.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
With the Android build system changes to ninja/kati, the use of
.SECONDEXPANSION is no longer supported. Fix this by avoiding rule specific
variables and using $(transform-generated-source).
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Commits a39a8fbbaa ("nir: move to compiler/") and eb63640c1d
("glsl: move to compiler/") broke Android builds. Fix them.
There is also a missing dependency between generated NIR headers and
several libraries. This isn't a new issue, but seems to have been
exposed by the NIR move.
Built with i915, i965, freedreno, r300g, r600g, vc4, and virgl enabled.
Cc: "11.2" <mesa-stable@lists.freedesktop.org>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This function is currently broken for BE. I assume it's because of
util_pack_color(). Until I fix this path, I prefer to disable it so users
would be able to see correct colors on their desktop and applications.
Together with the two following patches:
- gallium/r600: Don't let h/w do endian swap for colorformat
- gallium/radeon: remove separate BE path in r600_translate_colorswap
it fixes BZ#72877 and BZ#92039
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since the rework on gallium pipe formats, there is no more need to do
endian swap of the colorformat in the h/w, because the conversion between
mesa format and gallium (pipe) format takes endianess into account (see
the big #if in p_format.h).
v2: return ENDIAN_NONE only for four 8-bits components
(V_0280A0_COLOR_8_8_8_8)
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After further testing, it appears there is no need for
separate BE path in r600_translate_colorswap()
The only fix remaining is the change of the last if statement, in the 4
channels case. Originally, it contained an invalid swizzle configuration
that never got hit, in LE or BE. So the fix is relevant for both systems.
This patch adds an additional 120 available visuals for LE and BE,
as seen in glxinfo
v2:
Tested for regressions by running piglit gpu.py with CAICOS (r600g) on
x86-64 machine. No regressions found.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After 3ecd357d81, it may be possible for
the VS to get assigned all of the URB space.
On Ivy Bridge, this will cause the offset for the other stages to be
16, which cannot be packed into the ConstantBufferOffset field of
3DSTATE_PUSH_CONSTANT_ALLOC_*.
Instead we can set the offset to zero if the stage size is zero.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
It appears that it actually needs to be aligned to the datum size, so it
was 1 when testing with R8, but it can be as high as 16 with RGBA32.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
This allows us to avoid doing some unneeded work on the meta paths where we
know that the image view will be used for exactly one thing. The meta
paths also sometimes do things that aren't quite valid like setting the
array slice on a 3-D texture and we want to limit the number of paths that
need to be able to sensibly handle the lies.
If you have an out-of-tree build, gen8_pack.h and friends will not be in
the same folder as genX_pack.h so this will be a problem. We fixed
out-of-tree earlier by adding the genxml folder to the includes for the
vulkan driver. However, this is not a good long-term solution because we
want to use it in ISL as well.
The two extensions are identical, and are largely taking bits of already
existing desktop functionality. We continue to do a poor job of
supporting the 'precise' keyword, just like we do on desktop.
This passes the relevant dEQP tests that I could find.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Could be exposed on earlier GLES versions if we supported EXT_sRGB, but
we don't, for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Consecutive tiles are separated by the size of the tile, not by the
logical tile width.
v2: Remove extra subtraction (Ville)
Add parenthesis (Jason)
v3: Update the unit tests for the function
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will now never occur. The empty if-else part would have already
been removed leaving an empty if-endif part.
No shader-db changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On BDW,
total instructions in shared programs: 8448571 -> 8448367 (-0.00%)
instructions in affected programs: 21000 -> 20796 (-0.97%)
helped: 116
HURT: 0
v2: Remove spurious attempt to combine the if_block with the (removed!)
else_block. Suggested by Matt and Curro. Correct the comment
describing what the new pass does. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This provides a trivial simplification now, and it makes some future
changes more straight forward.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
'git diff -w' is a bit more illustrative. A couple declarations were
moved, the continue was removed, and the code was reindented. This will
simplify future changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Both logic and indentation suggests that the ; were not intended here.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The same code appeared in both branches; pull it above the if statement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The caller already computes it. Now that we have stage specific
functions, it's really easy to pass this in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The caller already computes it. Now that we have stage specific
functions, it's really easy to pass this in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Now that each stage is directly calling brw_nir_lower_io(), and we have
per-stage helper functions, it makes sense to just call the relevant one
directly, rather than going through multiple switch statements.
This also eliminates stupid function parameters, such as the two that
only apply to vertex attributes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These functions are both giant switch statements where most cases don't
overlap at all. Let's put the bulk of the work in per-stage helpers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Most cases already call nir_lower_io explicitly for input and output
lowering. This catch all isn't very useful anymore - we can just add it
to the remaining cases.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This simplifies things. Every caller of brw_nir_lower_io() immediately
calls brw_postprocess_nir(). The only real change this will have is
that we get an extra brw_nir_optimize() call when compiling compute
shaders, but that seems fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We've now hit literally every case other than geometry shaders (and
compute shaders, but those are a no-op). So, let's just move geometry
shaders over too and be done with it.
The only advantage to doing this at link time was to save the expense
of running the pass on recompiles. But we're already running a lot of
passes, and the extra code complexity isn't worth it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Shorter than compiler->scalar_stage[MESA_SHADER_GEOMETRY], which can
help with line-wrapping.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The Vulkan driver wants to be able to delete fragment outputs that are
beyond key.nr_color_regions; this is a lot easier if we lower outputs at
specialization time rather than link time.
(Rationale added to commit message by Ken)
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Without this, on SIMD 16 the send instruction destination will appear
to write more than one destination register, causing the simulator to
report an error.
Of course, the send instruction can actually write more than one
destination register regardless of the type set for the destination,
so this is a bit strange.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reduce the amount of duplicated code by re-using
nvc0_program_validate(). While we are at it, change the prototype
to return void and remove nvc0_compute.h which is now useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
No reason to not validate those global buffers and this might avoid
fails if someone try to use the global memory from compute programs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since commit d1314de293 we ignore
damage passed to SwapBuffersWithDamage.
Wayland 1.10 now has functionality that allows us to properly
process those damage rectangles, and a way to query if it's
available.
Now we can use wl_surface.damage_buffer and interpret the incoming
damage as being in buffer co-ordinates.
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Allows us to transform
add res src0 src1
mov.sat dst -res
into
add.sat dst -src0 -src1
No shader-db changes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Allows us to transform
mul res src0 src1
mov.sat dst -res
into
mul.sat dst src0 -src1
instructions in affected programs: 45246 -> 45054 (-0.42%)
helped: 162
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not correct to CSE these multiplies
mul.sat dst1, -a, b
mul.sat dst2, a, b
by emitting a negated MOV from dst1 to dst2:
mul.sat dst1, -a, b
mov dst2, -dst1
Take 2.0*2.0 for example. The first multiply would produce 0.0 and the
second would produce 1.0.
Fixes bad generated code in 18 to 22 shaders:
instructions in affected programs: 432 -> 464 (7.41%)
helped: 4
HURT: 18
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Because I changed the swizzle check, I also need to adapt the return
values for each check.
It's basically almost the same as before, we just cross between STD and
STD_REV, and cross between ALT and ALT_REV
This fixes the rgba test in gl-1.0-readpixsanity (piglit) and also
fixes tri-flat (mesa demos).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously loops like
do {
// ...
} while (false);
that did not have any other loop-branch instructions would not be
unrolled. This is commonly used to wrap multiline preprocessor macros.
This produces IR like
(loop (
...
break
))
Since limiting_terminator was NULL, the loop unroller would
throw up its hands and say, "I don't know how many iterations. How
can I unroll this?"
We can detect this another way. If there is no limiting_terminator
and the only loop-branch is a break as the last IR, there's only one
iteration.
On my very old checkout of shader-db, this removes a loop from Orbital
Explorer, but it does not otherwise affect the shader. The loop removed
is the one the compiler inserts surrounding the switch statement.
This change does prevent some seriously bad code generation in some
patches to meta shaders that I recently sent out for review.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
RGBA8 and BGRA8 unorm formats are compatible with the various
mem_copy functions. Their sRGB counterparts are also compatible
because they're also color-renderable (of importance when the
specified resource is a readbuffer) and they share the same
physical layout.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously we allocated 4kB of push constant space for VS, GS, and PS
(for a total of 12kB) no matter what. This works, but doesn't fully
utilize the space - we have 16kB or 32kB of space.
This makes anv use the same method as brw - divide up the space evenly
among all active shader stages. This means HS and DS would get space,
if those shader stages existed.
In the future, we can probably do better by inspecting how many push
constants each shader stage uses, and weight things accordingly. But
this is strictly better than the old code, and ideally we'd justify
a fancier solution with actual performance data.
Rather than keeping separate {vs,hs,ds,gs}_start fields, we now store an
array indexed by the shader stage (MESA_SHADER_*). The 3DSTATE_URB_*
commands are also sequentially numbered. This makes it easy to just
emit them in a loop.
This simplifies the code a little, and also will make it easier to add
more credible HS and DS code later.
Not called from any other file. Remove _mesa_ prefix and update comments.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To match the convention of other device driver functions.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The change "mesa/readpix: Don't clip in _mesa_readpixels()" caused a
few piglit regressions. The failing tests use glReadPixels to read
from the front color buffer. The problem is we were trying to read
from a non-existant front color buffer. The front color buffer is
created on demand in st/mesa. Since the missing buffer bounds were
effectively 0 x 0 the glReadPixels was totally clipped and returned
early.
The fix involves creating the real front color buffer when we're about
to try reading from it.
Tested with llvmpipe and VMware driver on Linux, Windows.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94253
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94254
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94257
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Unfortunately, glslang gives us cull/clip distance and GS streams even if
the shader doesn't use it whenever a shader is declared as version 450.
This is a glslang bug, but we can easily enough ignore it for now.
The descriptor sizes array gives the total number of each type of
descriptor that will ever be allocated from the pool, not the total amount
that may be in any particular set. In our case, this simply means that we
have to sum a bunch of things up and there we go.
We can't use a global descriptor pool like we were because it's not
thread-safe. For now, we'll allocate them on-the-fly and that should work
fine. At some point in the future, we could do something where we
stack-allocate them or allocate them out of one of the state streams.
The current code in r600_translate_colorswap uses the swizzle information
to determine which colorswap to use.
This works for BE & LE when the nr_channels is <4, but when nr_channels==4
(e.g. PIPE_FORMAT_A8R8G8B8_UNORM), this method can not be used for both BE
and LE, because the swizzle info is the same for both of them.
As a result, r600g doesn't support 24bit color formats, only 16bit, which
forces the user to choose 16bit color in X server.
This patch fixes this bug by separating the checks for LE and BE and
adapting the swizzle conditions in the BE part of the checks.
Tested on an Evergreen GPU (Cedar GL FirePro 2270) running inside POWER7
Big-Endian Machine.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "11.2" "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Before the luminance stride was based on the size of GL_FLOAT
which is just the type constant (0x1406). Change it to use the
size of GLfloat.
Reviewed-by: Brian Paul <brianp@vmware.com>
"st/mesa: overhaul vertex setup for clearing, glDrawPixels, glBitmap"
added a vertex shader declaring IN[0] and IN[2], but not IN[1].
Drivers relying on tgsi_shader_info can't handle holes in declarations,
because tgsi_shader_info doesn't track that.
This is just a quick workaround meant for stable that will work for vertex
shaders.
This fixes radeonsi DrawPixels and CopyPixels crashes.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
So the result is of float type if we're implementing the float
overload of imageAtomicExchange. This is the only back-end change
required to support OES_shader_image_atomic AFAICT.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is basically just the same atomic functions exposed by
ARB_shader_image_load_store, with one exception:
"highp float imageAtomicExchange(
coherent IMAGE_PARAMS,
float data);"
There's no float atomic exchange overload in the original
ARB_shader_image_load_store or GL 4.2, so this seems like new
functionality that requires specific back-end support and a separate
availability condition in the built-in function generator.
v2: Move image availability predicate logic into a separate static
function for clarity. Had to pull out the image_function_flags
enum from the builtin_builder class for that to be possible.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Descriptor pools are an optimization that lets applications allocate
descriptor sets through an externally synchronized object (that is,
unlocked). In our case it's also plugging a memory leak, since we
didn't track all allocated sets and failed to free them in
vkResetDescriptorPool() and vkDestroyDescriptorPool().
Actually OP_SELP doesn't need to be a compare instruction. Instead we
just need to set the NOT modifier when building the instruction.
While we are at it, fix the dst register type and use a GPR.
Suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Silences a warning reported by the svga3d device.
v2: also null-out the index buffer pointer
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
I've had people ask about the design of the pack functions, for example,
why aren't we using bitfields. I wrote up a bit of background on why and
how we ended up with the current design and we might as well keep that
with the code.
As with anv_CmdCopyBufferToImage, compressed textures require special
handling during copies.
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This extension is identical to GL_OES_texture_border_clamp. But dEQP has
tests that want the EXT variant.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Only minor differences to the existing ARB_texture_border_clamp support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This gives us the chance to pack the binding table down to just what the
shaders actually need. Some applications use very large descriptor sets
and only ever use a handful of entries. Compacted binding tables should be
much more efficient in this case. It comes at the down-side of having to
re-emit binding tables every time we switch pipelines, but that's
considered an acceptable cost.
We ignore unused dimensions in the isl surface; do the same for the
resulting anv_image.
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
"Result is the same as computing the sum of the absolute values of
OpDPdx and OpDPdy on P."
We were doing sum of absolute values of OpDPdx of P and OpDPdx of NULL.
We don't need to look at the stage flags, as we don't really support any
fine-grained, stage-level synchronization. We have to do two
PIPE_CONTROLs in case we're both flushing and
invalidating.
Additionally, if we do end up doing two PIPE_CONTROLs, the first,
flusing one also has to stall and wait for the flushing to finish, so we
don't re-dirty the caches with in-flight rendering after the second
PIPE_CONTROL invalidates.
We would assert on unused dimensions (eg extent.depth for
VK_IMAGE_TYPE_2D) not being 1, but the specification doesn't put any
constraints on those. For example, for VK_IMAGE_TYPE_1D:
"If imageType is VK_IMAGE_TYPE_1D, the value of extent.width must be
less than or equal to the value of
VkPhysicalDeviceLimits::maxImageDimension1D, or the value of
VkImageFormatProperties::maxExtent.width (as returned by
vkGetPhysicalDeviceImageFormatProperties with values of format,
type, tiling, usage and flags equal to those in this structure) -
whichever is higher"
We'll fix up the arguments to isl to keep isl strict in what it expects.
We need to make sure GEM understands that we're writing to the BO, in
case it needs to synchronize with other rings (blitter use in display
server, for example).
This reverts commit c136672c59.
We still have the intermittent missing flush for VkEvent in certain
vulkancts cases:
piglit.deqp-vk.api.command_buffers.execute_large_primary
piglit.deqp-vk.api.command_buffers.submit_count_non_zero,
Let's reenable the snooping until we figure out the root cause.
Change the name of the .so to libvulkan_intel.so and add an installable
icd with the installed paths. Keep the icd file with build-tree paths,
but rename to dev_icd.json to make it clear that it's for development
purposes.
These were originally added to reduce compiler warnings but aren't really
needed. Getting rid of them reduces the diff between the Vulkan branch and
master, so we might as well.
The race we were seeing on cherryview was caused by the multi-submit
problem with fences. We can now turn snooping off again an rely on
clflush and we intended.
We hash the input SPIR-V, specialization constants, entrypoint and the
shader key using SHA1 to determine a unique identifier for the
combination. A VkPipelineCache is then a hash table mapping these
identifiers to the corresponding prog_data and kernel data.
Long ago, the blit code used to handle clearing and blitting.
- Fix any comments that refer to clearing.
- Rename shader var 'attr' to 'tex_pos'. The name 'attr' is an artifact
of the time when the shader was used for blitting as well as clearing.
The clear code lived in anv_meta_clear.c. The resolve code in
anv_meta_resolve.c. Only the blit code lived in anv_meta.c, alongside
the shareed meta code.
This is just a copy-paste patch. No change in behavior.
Aparently there are some issues in symbol resolution if an application
packages its own loader and you have a system-installed one. I don't
really understand the details, but it's not onorous to add.
The immediate write from PIPE_CONTROL is 64-bits at least on BDW. This
used to work on 64-bit archs because the compiler would align the following
anv_state struct up for us. However, in 32-bit builds, they overlap and it
causes problems.
A few packets have dwords in them that are all MBZ and we failed to
write those. This change makes sure we iterate through all dwords and
write them all.
The sub-determinate implementation pattern fixed by
6a7e2904e0 has a second instance in the
same file.
With the previous algorithm, when row and j are both 3, the index
overruns the array. This only impacts the stack on 32 bit builds.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This huge commit switches us over to using a simple xml format (genxml)
for defining our command streamer commands and a python script for
generating the pack headers we use in the driver.
The tests assertion-failed in vkCmdClearDepthStencilImage because the
isl surface lacked ISL_SURF_USAGE_DEPTH_BIT.
Fixes: https://gitlab.khronos.org/vulkan/mesa/issues/26
Fixes: dEQP-VK.pipeline.timestamp.transfer_tests.host_stage_with_clear_depth_stencil_image_method
Fixes: dEQP-VK.pipeline.timestamp.transfer_tests.transfer_stage_with_clear_depth_stencil_image_method
Don't translate VkImageCreateInfo::usage into an isl_surf_usage bitmask.
Instead, translate anv_image::usage, which is a superset of
VkImageCreateInfo::usage.
For-Issue: https://gitlab.khronos.org/vulkan/mesa/issues/26
Previously, in order to get things working, we just always shadowed
variables. Now, we rewrite derefs whenever it's safe to do so and only
shadow if we have an in or out variable that we write or read to
respectively.
Parameterize build_asin() on the fit coefficients so the
implementation can be shared while still using different polynomials
for asin and acos. Also switch back to implementing acos in terms of
asin -- The improvement obtained from cancelling out the pi/2 terms
was negligible compared to the approximation error.
vtn_handle_type() creates a signed type regardless of the value of the
signedness flag, which usually doesn't make much of a difference
except when the type is used as base sampled type of an image type,
what will cause the base type of the NIR image variable to be
inconsistent with its format and cause an assertion failure in the
back-end (most likely only reproducible on Gen7), and may change the
semantics of the image intrinsic subtly (e.g. UMIN may become IMIN).
We use the simple batch helper to submit a batch at driver startup time
which holds all the state that never changes. We don't have a whole lot
and once we enable tesselation there'll be even less. Even so, it's a
simple mechanism and reduces our steady state batch sizes a bit.
The gen7 pipeline has a useful helper function for this, let's use it in
gen8_pipeline.c too. The gen7 function has an off-by-one bug though: we
have to compute log2(size / 1024) - 1, but we divide by 2048 instead so
as to avoid the case where size is less than 1024 and we'd return -1.
anv_image::needs_sampler_surface_state was a redundant member,
identical to (usage & VK_IMAGE_USAGE_SAMPLED_BIT). Likewise for the
other needs_* members.
When generating a sub-determinate matrix, a 3-element swizzle array was
indexed with clever inline boolean logic. Unfortunately, when i and j
are both 3, the index overruns the array, smashing the next variable on
the stack.
For 64 bit builds, the alignment of the 3-element unsigned array leaves
32 bits of spacing before the next local variable, hiding this bug. On
i386, a subcolumn pointer was smashed then dereferenced.
anv_descriptor_set_destroy uses the descriptor sets's set_layout member
to iterate the set's buffer views. However, the set_layout reference
may have previously been freed.
On 64 bit builds, this bug generated valgrind errors but did not affect
CTS test results. On 32 bit builds, it reliably produces assertions and
memory corruption.
This is kind-of silly. We *really* need to do a better job of making sure
all objects have all their default values set. We probably also want to,
eventually, put everything into the BO (to save memory) and, more
specifically, make the GPU write the "ready" flag. That way GetFenceStatus
won't ever have to call into the kernel.
After a long discussion with Eric Anholt and Owen Taylor, I learned that
X11 is basically always sRGB as that's what the scanout hardware does and X
doesn't modify anything. Therefore, we should just always expose sRGB
formats.
The GPU does most of this for us as long as we set up tight bounds for
the buffers, which we do. Additionally, we range check dynamically
buffers in the shader. With that it's safe to turn on robustBufferAccess.
We don't actually support that format yet because ISL doesn't have an enum
for it. We need to beef up the formats table to allow for tiled-only
formats.
There's an intermittent flushing problem with VkEvent that we need to
root cause. For now, using the snooping feature keeps the memory pools
up to date with GPU writes and fixes the problem.
We can't use the more fine-grained load and store fence commands (lfence
and mfence), since clflush is only guaranteed to be ordered with respect
to mfence.
The attachments should be const, but the driver's function signatures
are generally not const-friendly.
Drop the const because it conflicts with upcoming
anv_cmd_buffer_resolve_subpass().
The adjusted polynomial coefficients come from the numerical
minimization of the L2 norm of the relative error. The old
coefficients would give a maximum relative error of about 15000 ULP in
the neighborhood around acos(x) = 0, the new ones give a relative
error bounded by less than 2000 ULP in the same neighborhood.
SKL has a workaround which requires either some weird programming of buffer 3,
OR, just never using buffer 0. Since we don't actually use multiple constant
buffers, it's easier to just not use 0.
Only SKL requires this workaround, but there is no harm in applying it to all
platforms. The big change here is that buffer #0 is relative to dynamic state
base normally (depending upon ISTPM), where buffer 1-3 is a GPU virtual address.
Remove all the fine-grained cleanup in
anv_device_init_meta_clear_state(). Instead, if anything fails during
initialization, simply call anv_device_finish_meta_clear_state() and let
it clean up the partially initialized anv_meta_state::clear.
This handles multisample color images that have a floating-point or
normalized format and have a single array layer.
This does not yet handle integer formats nor multisample array images.
As far as I can tell, this patch sets all pipeline multisample state
except:
- alpha to coverage
- alpha to one
- the dispatch count for per-sample dispatch
The dEQP "precision" test tries to verify that the reference functions
float sinh(float a) { return ((exp(a) - exp(-a)) / 2); }
float cosh(float a) { return ((exp(a) + exp(-a)) / 2); }
float tanh(float a) { return (sinh(a) / cosh(a)); }
produce the same values as the built-ins. We simplified away the
multiplication by 0.5 in the numerator and denominator, and apparently
this causes them not to match for exactly 1 out of 13,632 values.
So, put it back in, fixing the test, but making our code generation
(and precision?) worse.
The extent previously was supposed to match the mip at a given level
under the assumption that the base address would be that of the mip
as well.
Now however, the base address only matches the offset of the
containing tile. Therefore, enlarge the extent to match that of
phys_slice0, so that we don't draw/fetch in out of bounds territory.
This solution isn't perfect because the base adress isn't always at
the first tile, therefore the assumed valid memory region by the HW
contains some number of invalid tiles on two edges.
Use a custom VkBufferImageCopy with the user-provided struct as
the base. A few fields are modified when the iview is uncompressed
and the underlying image is compressed.
When creating an uncompressed ImageView on an compressed Image, the
SurfaceFormat is updated to match the ImageView's. The surface
dimensions must also change so that the HW sees the same size image
instead of a 4x larger one.
Fixes the following error which results from running many VulkanCTS
compressed tests in one shot:
ResourceError (vk.queueSubmit(queue, 1, &submitInfo, *m_fence):
VK_ERROR_OUT_OF_DEVICE_MEMORY at
vktPipelineImageSamplingInstance.cpp:921)
Makes all compressed format tests with a height > 1 pass.
Aligns with formula's presented in Vulkan spec concerning CopyBufferToImage.
18.4 Copying Data Between Buffers and Images
This won't conflict with valid API usage, because:
1) Users are not allowed to create an uncompressed ImageView with a
compressed Image.
see: VkSpec - 11.5 Image Views - VkImageViewCreateInfo's Valid Usage box
2) If users create a differently formatted compressed ImageView with a
compressed Image, the block dimensions will still match.
see: VkSpec - 28.3.1.5 Format Compatibility Classes - Table 28.5
For an uncompressed ImageView of a compressed Image, the
dimensions and offsets are all divided by the appropriate
block dimensions.
We are not yet using an uncompressed ImageView for a
compressed Image, but will do so in a future commit.
Enable ETC support for BDW+. In Vulkan, an array lookup on
surface_format[] is used to determine HW support for certain
formats. In contrast, Mesa dynamically populates an array
which reports this information.
3D surfaces in Skylake are stored with ISL_DIM_LAYOUT_GEN4_2D. Any
delta in the logical z offset causes an equivalent delta in the
surface's array layer.
Test isl_surf_get_image_intratile_offset_el() in the tests:
test_bdw_2d_r8g8b8a8_unorm_512x512_array01_samples01_noaux_tiley0
test_bdw_2d_r8g8b8a8_unorm_1024x1024_array06_samples01_noaux_tiley0
When calculating row pitch, the row's width in samples must be divided
by the format's block width. The commit below accidentally removed the
division.
commit eea2d4d059
Author: Chad Versace <chad.versace@intel.com>
Date: Tue Jan 5 14:28:28 2016 -0800
Subject: isl: Don't align phys_slice0_sa.width twice
The if/then/else block was bogus, as it can only take a scalar
condition, and we need to select component-wise. The GLSL IR
implementation of atan2 handles this by looping over components,
but I decided to try and do it vector-wise, and messed up.
For now, just bcsel. It means that we do the atan1 math even if
all components hit the quick case, but it works, and presumably
at least one component will hit the expensive path anyway.
We were botching this for negative numbers - floor of a negative rounds
the wrong way. Additionally, both results are supposed to retain the
sign of the original.
To fix this, just take the abs of both values, then put the sign back.
There's probably a better way to do this, but this works for now.
The table has this marked as unsupported on all gens, but I don't really
believe that given how early it is in the table. I've tested and it seems
to work on Broadwell. The Bspec says that it sould be renderable on SKL+
but alpha blending is questionable.
Side note: We really need to audit the format table again.
The X component of the offset is set to the layer index times layer
height which is obviously bogus, return the vertical offset of the
slice as Y component instead. Fixes a few image load/store tests that
use 1D arrays on SKL when forcing it to fall back to untyped reads and
writes.
It's mostly the same and contains some non-trivial logic, so it really
should be shared. Also, we're about to make some modifications here that
we would really like to share.
For unspills (scratch reads), we can just set WE_all all the time because
we always unspill into a new GRF. For spills, we have two options: If the
instruction has a 32-bit-per-channel destination and "normal" regioning,
then we just do a regular write and it will interleave channels from
different control-flow paths properly. If, on the other hand, the the
regioning is non-normal, then we have to unspill, run the instruction, and
spill afterwards. In this second case, we need to do the spill with
we_ALL.
Perform a copy when the copy_size matches the HW limit (max_copy_size).
Otherwise the current behavior is that we fail the following assertion:
assert(height < max_surface_dim);
because the values are equal.
MinimumArrayElement carries the same meaning for BDW and SKL.
Suggested by Jason.
No regressions in dEQP-VK.pipeline.image.view_type.cube_array.*
Fixes a number of cube tests, including cube_array_base_slice
and cube_base_slice tests.
This makes it act like the address mode is set to TEXCOORDMODE_CUBE
whenever this sampler is combined with a cube surface. This *should* be
what we need for Vulkan. Interestingly, the PRM contains a programming
note for this field that says simply, "This field must be set to
CUBECTRLMODE_PROGRAMMED". However, emprical evidence suggests that it does
what the PRM says it does and OVERRIDE is just fine.
These fields are ignored for non-cube surfaces. For cube surfaces
these fields should be enabled when using TEXCOORDMODE_CLAMP and
TEXCOORDMODE_CUBE.
TODO: Determine if these are the only two modes used in Vulkan.
Some fields of the surface state template were dependent on the
surface type, which is dependent on the usage of the image view, which
wasn't known until the bottom of the function after the template had
been constructed. This caused failures in all image load/store CTS
tests using cubemaps. Refactor the surface state calculation into a
function that is called once for each required usage.
We don't want to do this in the long-run but it's needed for passing the
NoContraction tests at the moment. Eventually, we want to plumb this
through NIR properly.
Relocations are 64 bits on Gen8+. Most CTS tests that send
non-trivial work to the GPU would fail when run from a single deqp-vk
invocation because they were effectively relying on reloc presumed
offsets to be wrong so the kernel would come and apply relocations
correctly.
Before we were asuming that a deref would either be something in a block or
something that we could pass off to NIR directly. However, it is possible
that someone would choose to load/store/copy a split structure all in one
go. We need to be able to handle that.
Previously, we were creating nir_deref's immediately. Now, instead, we
have an intermediate vtn_access_chain structure. While a little more
awkward initially, this will allow us to more easily do structure splitting
on-the-fly.
If we're continuing a render pass, make sure we don't emit the depth and
stencil buffer addresses before we set the state base addresses.
Fixes crucible func.cmd-buffer.small-secondaries
This currently sets the base and size of all push constants to the
entire push constant block. The idea is that we'll use the base and size
to eventually optimize the amount we actually push, but for now we don't
do that.
This allows us to first generate atomic operations for shared
variables using these opcodes, and then later we can lower those to
the shared atomics intrinsics with nir_lower_io.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we were receiving shared variable accesses via a lowered
intrinsic function from glsl. This change allows us to send in
variables instead. For example, when converting from SPIR-V.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Phi handling is somewhat intrinsically tied to the CFG. Moving it here
makes it a bit easier to handle that. In particular, we can now do SSA
repair after we've done the phi node second-pass. This fixes 6 CTS tests.
This is a port of Matt's GLSL IR lowering pass to NIR. It's required
because we translate SPIR-V directly to NIR, bypassing GLSL IR.
I haven't introduced a lower_ldexp flag, as I believe all current NIR
consumers would set the flag. i965 wants this, vc4 doesn't implement
this feature, and st_glsl_to_tgsi currently lowers ldexp
unconditionally anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
struct.pack('i', val) interprets `val` as a signed integer, and dies
if `val` > INT_MAX. For larger constants, we need to use 'I' which
interprets it as an unsigned value.
This patch makes us use 'I' for all values >= 0, and 'i' for negative
values. This should work in all cases.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is untested and probably broken.
We already passed the atan2 CTS tests before implementing this opcode.
Presumably, glslang or something was giving us a plain Atan opcode
instead of Atan2. I don't know why.
We were diligently setting Wayland buffers as non-busy, but nowhere in
the code did we set them to busy when submitted to the server. This
meant that acquire_next_image would only ever find the same buffer in
a loop, over and over.
Signed-off-by: Daniel Stone <daniels@collabora.com>
In acquire_next_image, we are waiting for a wl_buffer::release to arrive
and release one of the buffers in our swapchain. Most compositors don't
explicitly flush release events, so we may need to perform a roundtrip
instead, to ensure the event arrives.
Signed-off-by: Daniel Stone <daniels@collabora.com>
SPIR-V only has one ImageQuerySize opcode that has to work for both
textures and storage images. Therefore, we have to special-case that one a
bit and look at the type of the incoming image handle.
Image intrinsics always take a vec4 coordinate and always return a vec4.
This simplifies the intrinsics a but but also means that they don't
actually match the incomming SPIR-V. In order to compensate for this, we
add swizzling movs for both source and destination to get the right number
of components.
The Vulkan spec requires all allocations that happen for device creation to
happen with SCOPE_DEVICE. Since meta calls into other things that allocate
memory, the easiest way to do this is with an allocator.
The border color packet is specified as a 64-byte aligned address relative
to dynamic state base address. The way the packing functions are currently
set up, we need to provide it with (offset >> 6) because it just shoves the
bits in where the PRM says they go and isn't really aware that it's an
address.
The __gen_fixed helper properly clamps the value and also handles negative
values correctly. Eventually, we need to make the scripts generate this
and use it for more things.
Instead of emitting the continue before the loop body we emit it
afterwards. Then, once we've finished with the entire function, we run
nir_repair_ssa to add whatever phi nodes are needed.
The efficiency should be approximately the same. We do a little more work
per phi node because we have to sort the predecessors. However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis. Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.
genX_image_view_init allocates up to 3 separate SURFACE_STATE structures,
and populates each from a single template. Stop mutating the template
between each final SURFACE_STATE.
SF_CLIP_VIEWPORT does not clamp Z values. It only scales and shifts
them. Clamping to VkViewport::minDepth,maxDepth is instead handled by
CC_VIEWPORT.
Fixes dEQP-VK.renderpass.simple.depth on Broadwell.
vkCmdBeginRenderPass, vkCmdNextSubpass, and vkBeginCommandBuffer with
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT, all *setup* the
command buffer for recording commands for some subpass. But only the
first two, vkCmdBeginRenderPass and vkCmdNextSubpass, can *start*
a subpass.
Therefore, calling anv_cmd_buffer_begin_subpass() inside
vkCmdBeginCommandBuffer is misleading. Clarify its purpose by renaming
it to anv_cmd_buffer_set_subpass() and adding comments.
This should improve cache residency for render targets.
Pre-patch, vkCmdBeginRenderPass emitted all the meta clears for
VK_ATTACHMENT_LOAD_OP_CLEAR before any subpass began. Post-patch,
vCmdBeginRenderPass and vkCmdNextSubpass emit only the clears needed for
that current subpass.
This prepares for moving the clear ops from the start of the render pass
into each subpass.
Pipeline N will be used to clear color attachment N of the current
subpass. Currently meta color clears still create a throwaway subpass
with exactly one attachment, so currently only pipeline 0 is used.
This is an ugly hack to workaround the compiler's current inability to
dynamically set the render target index in the render target write
message.
Add anv_graphics_pipeline_create_info::color_attachment_count. If
non-negative, then it overrides the color attachment count in the
pipeline's subpass. Useful for meta. (All the hacks for meta!)
The functions will soon handle clears unrelated to
VK_ATTACHMENT_LOAD_OP_CLEAR, namely vkCmdClearAttachments. So remove
"load" from their name:
emit_load_color_clear -> emit_color_clear
emit_load_depthstencil_clear -> emit_depthstencil_clear
Rewrite anv_cmd_buffer_clear_attachments, which emits the top-of-pass
clears, to use the data provided in anv_cmd_state::attachments. This
prepares for deferring each attachment clear to the first subpass that
uses the attachment.
This array contains attachment state when recording a renderpass instance.
It's populated on each call to anv_cmd_buffer_set_pass.
The data is currently set but unused. We'll use it later to defer each
attachment clear to the subpass that first uses the attachment.
We were just hard-coding everything to a vec4. This meant we weren't
handling shadow samplers at all and integer things were getting the wrong
return type.
If its the command buffer's first call to vkBeginCommandBuffer, we must
*initialize* the command buffer's state. Otherwise, we must *reset* its
state. In both cases, let's use anv_ResetCommandBuffer.
From the Vulkan 1.0 spec:
If a command buffer is in the executable state and the command buffer
was allocated from a command pool with the
VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT flag set, then
vkBeginCommandBuffer implicitly resets the command buffer, behaving
as if vkResetCommandBuffer had been called with
VK_COMMAND_BUFFER_RESET_RELEASE_RESOURCES_BIT not set. It then puts
the command buffer in the recording state.
vkResetCommandPool currently destroys its command buffers. The Vulkan
1.0 spec requires that it only reset them:
Resetting a command pool recycles all of the resources from all of
the command buffers allocated from the command pool back to the
command pool. All command buffers that have been allocated from the
command pool are put in the initial state.
SPIR-V makes a distinction between "modulus" and "remainder" for both
floating-point and signed integer variants. The difference is primarily
one of which source they take their sign from. The "remainder" opcode for
integers is equivalent to the C/C++ "%" operation while the "modulus"
opcode is more mathematically correct (at least for an unsigned divisor).
This commit adds corresponding opcodes to NIR.
We have an issue with occlusion queries (PIPE_CONTROL depth writes)
after using the pipeline with the VS disabled. We work around it by
using a depth cache flush PIPE_CONTROL before doing a depth write.
Fixes dEQP-VK.query_pool.*
The original objective was to disallow UBO and SSBO variables from the
variable lists. This was accidentally broken in b208620fd when fixing some
other interface issues.
Connor's original shallow-copy plan works great except that a couple of the
decorations apply to a matrix which may be some levels down in an array.
We weren't properly unpacking that. This fixes most of the remaining SSBO
and UBO layout tests.
It is possible to end up with unreachable blocks if, for instance, you have
an "if (...) { break; } else { continue; } unreachable()". In this case,
the unreachable block does not show up in the dominance tree so it never
gets visited. Instead, we go and visit all of those in follow-on pass.
The helper didn't help much. It looks like a leftover from past
code-reuse. Now it's called from exactly one location,
gen7_CmdBeginRenderPass(). So fold it into its caller.
This exposes a case where we want to anv_CmdCopyBufferToImage() on an
image that wasn't created with VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT and
end up using uninitialized color_rt_surface_state from the meta image
view.
If VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT is not set in
pBeginInfo->flags, we don't have a render pass or framebuffer. Change
the condition that guard looking up render pass and framebuffer to test
for VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT instead of
VK_COMMAND_BUFFER_LEVEL_SECONDARY.
Fixes all remaining crashes in dEQP-VK.api.command_buffers.*.
These are working as well as Broadwell and Cherryiew. The recent merge
from mesa master brings in Kabylake device info and that should be all
we need to enable that.
This is needed because compute push constant data is replicated per
invocation. For gen7, this can be up to 64. With a push constant data
max of 128 bytes, this is 8k of data. We need additional space for
local-id payloads, so we are going with 16k for now.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
For gen9 SURFTYPE_CUBE, the RENDER_SURFACE_STATE's Depth,
MinimumArrayElement, and RenderTargetViewExtent is in units of full
cubes and so must be divided by 6.
Fixes 'dEQP-VK.pipeline.image.view_type.cube_array.cube_array.*'.
Now all of 'dEQP-VK.pipeline.image.*' passes.
Drop the temporary variables for RENDER_SURFACE_STATE's Depth and
RenderTargetViewExtent. Instead, assign them in-place.
This simplifies the next commit, which fixes gen9 cube surfaces.
SKL needs this to make sure we flush the push constants. It gets a
little tricky, since we also need to emit binding tables before push
constants, since that may affect the push constants (dynamic buffer
offsets and storage image parameters). This patch splits emitting
binding tables from emitting the pointers so that we can emit push
constants after binding tables but before emitting binding table
pointers.
When looping through VkBufferImageCopy regions, for each region we
incremented the offset into the VkBuffer assuming the format size was 4.
Fixes CTS tests dEQP-VK.pipeline.image.view_type.cube_array.3d.* on
Skylake.
For tiled 3D surfaces, the array pitch must aligned to the tile height.
From the Skylake BSpec >> RENDER_SURFACE_STATE >> Surface QPitch:
Tile Mode != Linear: This field must be set to an integer multiple of
the tile height
Fixes CTS tests 'dEQP-VK.pipeline.image.view_type.3d.format.r8g8b8a8_unorm.*'.
Fixes Crucible tests 'func.miptree.r8g8b8a8-unorm.aspect-color.view-3d.*'.
Previously, we were setting it to true at the top of the switch statement.
However, this causes all of the cases to get executed until you hit a
break. Instead, you want to be not executing at the start, start executing
when you hit your case, and end at a break.
QPitch is usually expressed as rows of surface elements (where a surface
element is an compression block or a single surface sample. Skylake 1D
is an outlier; there QPitch is expressed as individual surface
elements.
For 1D surfaces and for surfaces with Yf or Ys tiling, the hardware
ignores SurfaceVerticalAlignment and SurfaceHorizontalAlignment.
Moreover, the anv_halign[] and anv_valign[] lookup tables may not even
contain the surface's actual alignment values. So don't do the lookup
for those surfaces.
Instead of trying to crawl through predecessor chains and build phi nodes,
we just do a poor-man's out-of-ssa on the spot. The into-SSA pass will
deal with putting the actual phi nodes in for us.
This is not really a cache yet, but it allows us to share one state
stream for all pipelines, which means we can bump the block size without
wasting a lot of memory.
gen7_filter_tiling() should filter out only tiling flags that are
incompatible with the surface. It shouldn't make performance decisions,
such as forcing linear for 1D; that's the role of the caller.
struct isl_format_layout contained two near-redundant members: bpb (bits
per block) and bs (block size). There do exist some hardware formats for
which bpb != 8 * bs, but Vulkan does not use them. Therefore we don't
need bpb.
Previously, anv_image_view had a anv_format pointer that we used for
everything. This commit replaces that pointer with a VkFormat enum copied
from the API and an isl_format. In order to implement RGB formats, we have
to use a different isl_format for the actual surface state than the obvious
one from the VkFormat. Separating the two helps us keep things streight.
It now calls get_isl_format to get both linear and tiled views of the
format and determines linear/tiled properties from that. Buffer properties
are determined from the linear format.
This commit does two things. First, it introduces choose_* functions for
chosing formats and aspects. Second, it changes the copy (not blit) code
to use appropreately sized UINT formats for everything except depth. There
are two main reasons for this: First, it means that compressed and other
non-renderable texture upload should "just work" because it won't be
tripping over non-renderable formats. Second, it allows us to easly copy
an RGB buffer to and from an RGBX image because the formats will get
switched over to their UINT variants and the shader will deal with the
extra channel for us.
We've been trying to move away from anv_format for a while and this should
help with the transition. There are cases (mostly in meta) where we need
the original format for the image and not the isl_format. These will be
moved over to the new vk_format and everythign else will use the isl_format
from the particular anv_surface.
Many formats are not power-of-two bytes per pixels and we need the
non-power-of-two align macro here.
This reverts the revert from 4f9a211b, but keeps the change from a827b553
that fixed the yuv if-else mix-up.
This avoids making a lot of small allocations and handles allocation
failure correctly.
Fixes dEQP-VK.api.object_management.alloc_callback_fail.* failures.
Failure during instance creation will leave instance->wayland_wsi
undefined. When we then try to clean that up we crash. Set
instance->wayland_wsi to NULL on failure and only clean it up if it's
non-NULL.
Fixes part of dEQP-VK.api.object_management.alloc_callback_fail.*
isl always aligned the row pitch to the surface's image alignment. This
was sometimes wrong when the surface backed a VkBuffer. For a VkBuffer,
the surface's row pitch is set by VkBufferImageCopy::bufferRowLength,
whose required alignment is only that of the VkFormat.
In particular, VkBuffer rows are packed in many dEQP and Crucible tests.
And packed rows are rarely aligned to the surface's image alignment.
Fixes: dEQP-VK.pipeline.image.view_type.2d.format.r8g8b8a8_unorm.size.13x13
They were completely bogus before. For one thing, OpDecorationGroup
created a value of type undef rather than decoration_group. Also
OpGroupMemberDecorate didn't properly apply the decoration to the different
members of the different groups. It *should* be correct now but there's no
good way to test it yet.
The kernel is going to give us whole pages anyway, so allocating part of a
page doesn't help. And this ensures that we can always work with whole
pages.
As per the spec:
minMemoryMapAlignment is the minimum required alignment, in bytes, of
host-visible memory allocations within the host address space. When
mapping a memory allocation with vkMapMemory, subtracting offset bytes
from the returned pointer will always produce a multiple of the value of
this limit.
First off, it now uses isl formats instead of anv_format. Also, it
properly handles integer vs. floating-point default channels and can
properly handle alpha-only channels. (Not sure if those are allowed).
Thanks to the addition of nir_clone, we now have a num_elements field in
nir_constant which we weren't setting. Also, constants have to be parented
to the variable they initialize, so we have to make a copy.
This is done by passing the entrypoint name into spirv_to_nir. It will
then process the shader as if that were the only entrypoint we care about.
Instead of returning a nir_shader, it now returns a nir_function.
When aligning to isl_format_layout::bs (which is the number of bytes in
the pixel), use isl_align_npot() instead of isl_align(), because
isl_align() works only for power-of-2 alignment.
Fixes assertion in
dEQP-VK.pipeline.image.view_type.1d.format.r16g16b16_sfloat.size.512x1.
This reverts commit b33f5d3889,
and also removes the (empty) case statements for the new built-ins.
It doesn't look like glslang has updated yet, so updating the header
just breaks everything, as we no longer agree on opcode numbers.
If we're going to hav valgrind verify state streams then we need to ensure
that once we choose a pointer into a block we always use that pointer until
the block is freed. I was trying to do this with the "current_map" thing.
However, that breaks down because you have to use the map from the block
pool to get to the stream_block to get at current_map. Instead, this
commit changes things to track the stream_block by pointer instead of by
offset into the block pool.
When I first did the valgrindifying for stream allocators, I misunderstood
some things about valgrind's expectations for NOACCESS and UNDEFINED.
First off, valgrind expects things to be marked NOACCESS before you
allocate out of them. Since our blocks came from a pool backed by a
mmapped memfd, they came in as UNDEFINED; we needed to mark them as
NOACCESS. Also, I didn't realize that VALGRIND_MEMPOOL_CHANGE only updated
the mempool allocation state and didn't actually change definedness; we had
to add a VALGRIND_MAKE_MEM_UNDEFINED to get rid of the NOACCESS on the
newly allocated portion.
You can't just add a new source to a phi because use/def information won't
get updated properly. Instead, you have to use one of the core helpers.
Some day, we may want to add a nir_phi_instr_add_src helper.
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:
loop {
if (...) {
break;
} else {
break;
}
}
In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this. First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable. Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.
The current data structure doesn't handle much that we couldn't handle
before. However, this will be absolutely crucial for doing swith
statements. Also, this should fix structured continues.
anv_block_pool_init calls anv_block_pool_grow which checks
device->info.has_llc to see if it needs to set caching parameters.
If we don't set device->info early enough, this reads an undefined value
which is probably 0 and not what we want on llc platforms.
Found with valgrind.
This is kind of a hack, but it makes vtn_ssa_value insert a load if the
value requested is actually a deref. This shouldn't happen normally but,
thanks to the impedence mismatch of the NIR function parameter model vs.
the SPIR-V model, this can happen for function arguments.
Otherwise, we have a problem when we go to print functions with arguments
because their names get added to the hash table during declaration which
happens after we print the prototype.
Previously this wasn't a problem. However, with the new API update,
descriptor sets can now be sparse so the client doesn't have to provide an
entry for every binding. This means that it's possible for a binding to be
uninitialized other than the memset. In that case, we want to have a null
array of immutable samplers.
Now that we have a helper in the builder for system values and a helper in
core NIR to get the intrinsic opcode, there's really no point in having
things split out into a helper function. This commit "modernizes" this
pass to use helpers better and look more like newer passes.
The function calculates the offset to a subimage within the surface, in
units of surface samples.
All unit tests pass with `make check`. (Admittedly, though, there are
too few unit tests).
The plan all along was to eventualyl move isl out of the Vulkan
directory, because I intended i965 and anvil to share it.
A small problem I encountered when attempting to write unit tests for
isl precipitated the move. I discovered that it's easier to get isl
unit tests to build if I remove the extra, unneeded dependencies
injected by src/vulkan/Makefile.am. And the easiest way to remove those
unneeded dependencies is to move isl out of src/vulkan. (Unit tests come
in subsequent commits).
This is an artifact of the way the separate samplers/textures series ended
up getting sent out and rebased. This should fix a number of CTS tests
involving geometry shaders.
The code generated may be vec4 or simd8 depending on how we start the
compiler.
To run the GS in SIMD8, set the INTEL_SCALAR_GS environment variable.
This was added in:
commit 36fd653817
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Wed Mar 11 23:14:31 2015 -0700
i965: Add scalar geometry shader support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Unfortunately, this also means that we need to use a slightly different
algorithm for assign_constant_locations. The old algorithm worked based on
the assumption that each read of a uniform value read exactly one float.
If it encountered a MOV_INDIRECT, it would immediately bail and push the
whole thing. Since we can now read ranges using MOV_INDIRECT, we need to
be able to push a series of floats without breaking them up. To do this,
we use an algorithm similar to the on in split_virtual_grfs.
This commit moves us to an instruction based model rather than a
register-based model for indirects. This is more accurate anyway as we
have to emit instructions to resolve the reladdr. It's also a lot simpler
because it gets rid of the recursive reladdr problem by design.
One side-effect of this is that we need a whole new algorithm in
move_uniform_array_access_to_pull_constants. This new algorithm is much
more straightforward than the old one and is fairly similar to what we're
already doing in the FS backend.
Now that we have MOV_INDIRECT opcodes, we have all of the size information
we need directly in the opcode. With a little restructuring of the
algorithm used in assign_constant_locations we don't need param_size
anymore. The big thing to watch out for now, however, is that you can have
two ranges overlap where neither contains the other. In order to deal with
this, we make the first pass just flag what needs pulling and handle
assigning pull constant locations until later.
Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load. We also have to
rework some of the other bits of the backend to handle this new form of
uniform load. The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.
While we're at it, we also add support for the possibility that the
indirect is, in fact, a constant. This shouldn't happen in the common case
(if it does, that means NIR failed to constant-fold something), but it's
possible so we should handle it.
This is what image_view does. Also, we really need to do this so that we
can properly handle the combined offsets from the buffer and from
pCreateInfo.
This fixes some of the nonzero offset buffer view CTS tests.
When building RENDER_SURFACE_STATE, the driver set
SurfaceType = anv_image::surface_type, which was calculated during
anv_image_init(). This was bad because the value of
anv_image::surface_type was taken from a gen-specific header,
gen8_pack.h, even though the anv_image structure is used for all gens.
Replace anv_image::surface_type with a gen-specific lookup function,
anv_surftype(), defined in gen${x}_state.c.
The lookup function contains some useful asserts that caught some nasty
bugs in anv meta, which were fixed in the previous commit.
Meta unconditionally used VK_IMAGE_VIEW_TYPE_2D in the functions below.
This caused some out-of-bound memory accesses.
anv_CmdCopyImage
anv_CmdBlitImage
anv_CmdCopyBufferToImage
anv_CmdClearColorImage
Fix it by adding a new function, anv_meta_get_view_type().
Regarding the subimages within a surface, sometimes isl called them
"images" and sometimes "LODs". This patch make isl consistently refer to
them as "images". I choose the term "image" over "LOD" because LOD is
an misnomer when applied to 3D surfaces. The alignment applies to each
individual 2D subimage, not to the LOD as a whole.
This patch changes no behavior. It's just a manually performed,
case-insensitive, replacement s/lod/image/ that maintains correct
indentation. any behavior.
Our hardware requires an LOD for all texelFetch commands even if they are
on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD
is really rather meaningless. This commit allows other NIR producers to be
more lazy and not provide one at all.
Previously, meta would pass null shaders in for the VS when it intended to
disable the VS. However, this meant that we didn't know what inputs we had
and would dead-code things in the FS. In order to solve this, we
hard-coded a number. Now meta passes in a VS even if it plans to disable
the stage so this is no longer needed.
Don't validate the baseArrayLayer and layerCount of cube images. This
allows us to remove a bloated lookup table and an unneeded struct
definition (anv_image_view_info).
Similar to anv_cmd_buffer_push_constants, but handles the compute
pipeline, which requires different setup from the other stages.
This also handles initializing the compute shader local IDs.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Replace
anv_image_create_info::force_tiling
anv_image_create_info::tiling
with the bitmask
anv_image_create_info::isl_tiling_flags
This allows us to drop the function
anv_image.c:choose_isl_tiling_flags().
Fixes assertion in vkCreateImage when VkFormat is combined depthstencil.
Fixed many vulkancts tests that use combined depthstencil. For example,
fixes dEQP-VK.pipeline.depth.format.d16_unorm_s8_uint.compare_ops.\
not_equal_less_or_equal_not_equal_greater.
We're required to expose a host-visible, coherent memory type. On big
core GPUs that share, LLC, we can expose one such memory type that's
also cached. However, on non-LLC GPUs we can't both be cached and
coherent. Thus, we expose both the required coherent type and the cached
but non-coherent combination.
Regular objects are created I915_CACHING_CACHED on LLC platforms and
I915_CACHING_NONE on non-LLC platforms. However, userptr objects are
always created as I915_CACHING_CACHED, which on non-LLC means
snooped. That can be useful but comes with a bit of overheard. Since
we're eplicitly clflushing and don't want the overhead we need to turn
it off.
Pre-Skylake, RENDER_SUFFACE_STATE.SurfaceVerticalAlignment is in units
of surface samples. A surface sample is equivalent to a pixel in all
surfaces except interleaved multisample surfaces.
In Skylake, it is in units of surface elements. A surface element is
equivalent to a surface sample except for compressed formats, in which
case the element is a compression block.
In anv_image_create(), stop asserting that VkImageCreateInfo::extent
does not exceed the hardware limits for the given SURFTYPE. The
assertions were incorrect because they did not take into account the
hardware gen. Anyways, these types of assertions belong in isl, not
anvil.
Remove the surface layout calculations in anv_image_make_surface(). Let
isl_surf_init() do the heavy lifting.
Fixes 8 Crucible tests and regresses none. (hw=Broadwell and
crucible@33d91ec).
This is a big code push. The patch is about 3000 lines.
Function isl_surf_init() calculates the physical layout of a surface.
The implementation is "complete" (but untested) for all 1D, 2D, 3D, and
cube surfaces for gen4 through gen9, except:
* gen9 1D surfaces
* gen9 Ys multisampled surfaces
* auxiliary surfaces (such as hiz, mcs, ccs)
Rename legacy Y tiling from ISL_TILING_Y to ISL_TILING_Y0 in order to
clearly distinguish it from Yf and Ys. Using ISL_TILING_Y to denote
legacy Y tiling would lead to confusion with i965, because i965 uses
I195_TILE_Y to denote *any* Y tiling.
This allows us to filter based on preprocessor directives. We could build
a partial preprocessor into the generator, but we would likely get it
wrong. This allows us to filter out, for instance, windows-specific WSI
stuff.
By and large, this is just moving enum values around. However, it also
removed VK_UNSUPPORTED which we were returning a number of places. Those
places now return VK_ERROR_INCOMPATABLE_DRIVER.
This made for an unfortunately large amount of work since we were using it
fairly heavily internally. However, gl_shader_stage does basically the
same things, so it's not too bad.
This packet is a different size on gen8 and we hit an assertion when we
try to merge a gen9 size dword array from the pipeline with the gen8
sized array we create from dynamic state.
Use a static assert in the merge macro and fix this issue by using different
wm_depth_stencil arrays on gen8 and gen9.
The old code didn't work correctly if you had member decorations after
non-member decorations. Since glslang never gave us any of those, it
wasn't properly tested.
The stuff to take descriptor sets and turn them into binding tables and
sampler tables is still in anv_cmd_buffer.c. We may want to consider
putting it in anv_descriptor_set.c eventually.
This file contains code that can be shared across gens modulo recompiling.
In particular, we can share STATE_BASE_ADDRESS setup and handling of the
vkPipelineBarrier call. Not sharing STATE_BASE_ADDRESS setup has already
been a source of bugs and the gen7 and gen8 implementations of
PipelineBarrier were line-for-line identical.
Incidentally, this should fix MOCS settings for dynamic and surface state
on Haswell.
There are various bits which move around between Haswell and Ivy Bridge
that we weren't taking into account. This also makes us actually set the
StencilWriteEnable in a sane way.
Previously, we were including gen7_pack.h, gen75_pack.h, and gen8_pack.h
in anv_private.h. As we add more gens, this is going to become untenable.
This commit moves things around so that we only use the pack headers when
and if we need them.
This gets tricky in a few places because we have to pass vtn_sampled_image
values through OpAccessChain, but it works ok. At some point, it probably
needs to be cleaned up but it doesn't occur to me exactly how to do that at
the moment. We'll see how this approach goes.
This commit adds the capability to NIR to support separate textures and
samplers. As it currently stands, glsl_to_nir only sets the sampler and
leaves the texture alone as it did before and nir_lower_samplers assumes
this. However, backends can, if they wish, assume that they are separate
because nir_lower_samplers sets both texture and sampler index (they are
the same in this case).
In anv_surface and anv_image_create_info, replace member
'uint8_t tile_mode' with 'enum isl_tiling'.
As a nice side-effect, this patch also reduces bug potential because the
hardware enum values for tile modes are unstable across hardware
generations.
There is a bug in some versions of the i915 kernel driver where it will
return immediately if the timeout is negative (it's supposed to wait
indefinitely). We've worked around this in mesa for a few months but never
implemented the work-around in the Vulkan driver.
I rediscovered this bug again while working on Ivy Bridge becasuse the
drive in my Ivy Bridge currently has Fedora 21 installed which has one of
the offending kernels.
Alignment units, i and j, match the compressed format block
width and height respectively.
v2: Don't assert against HALIGN* and VALIGN* enums (Chad)
Reviewed-by: Chad Versace <chad.versace@intel.com>
A non-compressed texture is a 1x1x1 block. Compressed
textures could have values which vary in different
dimensions WxHxD.
Reviewed-by: Chad Versace <chad.versace@intel.com>
cpp (chars-per-pixel) is an integer that fails to give useful data
about most compressed formats. Instead, rename it to "bs" which
stands for block size (in bytes).
v2: Rename vk_format_for_bs to vk_format_for_size (Chad)
Use "block size" instead of "bs" in error message (Chad)
Reviewed-by: Chad Versace <chad.versace@intel.com>
The new mechanism should be able to handle SSBOs as well as properly handle
emitting surface state on gen7 where we need different strides depending on
shader stage.
We never *actually* supported them, we just used them for binding UBOs.
Now that we have BufferInfo and we aren't supporting texture buffers yet,
we should get rid of them until we can do them properly.
Tested by Crucible "func.depthstencil.stencil_triangles.*" in
commit c194292d5eadb84e9d7489fc01ce0b653cdd4ca5 (HEAD -> master)
Author: Chad Versace <chad.versace@intel.com>
Date: Wed Nov 4 16:19:24 2015 -0800
Subject: func.depthstencil: Remove stencil clear workaround for Mesa
The anv_state is supposed to be a flyweight so we're not really saving
anything by using a pointer. Also, we were creating one, setting a pointer
to it, and then having it go out-of-scope which is bad.
Remove members
num_color_clear_attachments
has_depth_clear_attachment
has_stencil_clear_attachment
The new clear code in anv_meta_clear.c does not use them.
Fixes Crucible test "func.clear.load-clear.attachments-8".
The old clear code, when clearing attachments for
VK_ATTACHMENT_LOAD_OP_CLEAR, suffered from some fundamental bugs. The
bugs were not fixable with the old code's approach.
- It assumed that a VkRenderPass contained at most one depthstencil
attachment.
- It tried to clear all attachments (color and the sole
depthstencil) with a single instanced draw call, using the VUE
header's RenderTargetArrayIndex to specify the instance's target
color attachment. But the RenderTargetArrayIndex does not select
entries in the binding table; it only selects an array index of
a singled layered surface.
- If at least one attachment of VkRenderPass had
VK_ATTACHMENT_LOAD_OP_CLEAR,
then the old code cleared *all* attachments. This was
a consequence of using a single draw call and single pipeline for
the clear.
The new clear code fixes those bugs by making a separate draw call for
each attachment, and using one pipeline when clearing color attachments
and a different pipeline for depth attachments.
The new code, like the old code, does not clear stencil attachments. It
is left as a FINISHME.
Consistently rename bitmasks of Vulkan dynamic state to 'dynamic_mask'.
anv_meta_saved_state::dynamic_flags -> dynamic_mask
anv_meta_save(dynamic_state) -> dynamic_mask
As the functions are now exposed in anv_meta.h, let's rename them
to clarify that they are meta functions.
anv_cmd_buffer_save -> anv_meta_save
anv_cmd_buffer_restore -> anv_meta_restore
When emitting the binding table for the fragment shader stage, we no
longer "walk all of the attachments, [inserting only] the color
attachments into the binding table". Instead, we iterate only over the
subpass's color attachments, which is the minimal possible iteration.
While killing the comment, also rename the variable 'attachments' to
'color_count', as it's no longer a count of all framebuffer attachments
but only the subpass's color attachment count.
Right now, Broadweel and Ivy Bridge are the only supported platforms.
Hopefully, this reduces the chances that someone will try the driver on
unsupported hardware and be confused that it doesn't work.
What we had before was kind of a hack where we made certain untrue
assumptions about the incoming data. This new support, while it still
doesn't support indirects properly (that will come), at least pulls the
offsets and strides from SPIR-V like it's supposed to.
My original understanding of VkAttachmentDescription::loadOp,
stencilLoadOp was incorrect. Below are all possible combinations:
VkFormat | loadOp=clear stencilLoadOp=clear
---------------+---------------------------
color | clear-color ignored
depth-only | clear-depth ignored
stencil-only | ignored clear-stencil
depth-stencil | clear-depth clear-stencil
This reverts commit 24bcc89c8f.
Now that we have the new vulkan_resource_index intrinsic, these variants of
the classic UBO/SSBO instrinsics aren't needed.
Instead of just stomping on the mode, it now validates asserts that the
previously set mode is correct and only changes it if needed. We need to
do this because, in geometry shaders, there are some builtins that can be
either an input or an output depending on context. We can get that
information from the SPIR-V source but we can't throw it away.
Now that we've done the refactoring upstream, it's much easier to to get
hooked up. We haven't tested things well enough to know that we're setting
up the GPU state correctly for them yet but at least we can compile them now.
If the user called vkDestroyDevice but never called
vkEnumeratePhysicalDevices, then the driver tried to ralloc_free() an
unitialized anv_physical_device.
Fixes test 'dEQP-VK.api.device_init.create_instance_name_version'.
Now that we have a decent interface in upstream mesa, we can get rid of all
our hacks. As of this commit, we no longer use any fake GL state objects
and all of shader compilation is moved into anv_pipeline.c. This should
make way for actually implementing a shader cache one of these days.
As a nice side-benifit, this commit also gains us an extra 300 passing CTS
tests because we're actually filling out the texture swizzle information
for vertex shaders.
The geometry shader support is currently completely untested. As I go
through and re-factor the compiler, I'd rather not refactor dead code that
I don't have a way to know if I broke. Let's just remove it for now. We
can put it back in easily enough later and then we'll do it properly.
Prefix all anv_cmd_state dirty bit tokens with ANV_CMD_DIRTY. For
example:
old -> new
ANV_DYNAMIC_VIEWPORT_DIRTY -> ANV_CMD_DIRTY_DYNAMIC_VIEWPORT
ANV_CMD_BUFFER_PIPELINE_DIRTY -> ANV_CMD_DIRTY_PIPELINE
Change type of anv_cmd_state::dirty and ::compute_dirty from uint32_t to
the self-documenting type anv_cmd_dirty_mask_t.
The Vulkan spec allows VkGraphicsPipelineCreateInfo::pDepthStencilState
to be NULL when the pipeline's subpass contains no depthstencil
attachment (see spec quote below). anv_pipeline_init_dynamic_state()
required it unconditionally.
This path fixes anv_pipeline_init_dynamic_state() to access
pDepthStencilState only when there is a depthstencil attachment.
From the Vulkan spec (20 Oct 2015, git-aa308cb)
pDepthStencilState [...] may only be NULL if renderPass and subpass
specify a subpass that has no depth/stencil attachment.
The Vulkan spec (20 Oct 2015, git-aa308cb) states that some fields of
VkGraphicsPipelineCreateInfo are required under certain conditions.
Add a new function, anv_pipeline_validate_create_info() that asserts the
requirements hold.
The assertions helped me discover bugs in Crucible and anv_meta.c.
The Vulkan spec (20 Oct 2015, git-aa308cb) requires that
VkGraphicsPipelineCreateInfo::renderPass be a valid handle. To satisfy
that, define a static dummy render pass used for all meta operations.
Most of the shader key setup we did was for pre-Sandybridge and the stuff
for SNB+ wasn't in the key setup. That stuff still isn't there but at
least we've left ourselves notes for now.
Previously, we were trying to handle them later when loading. However, at
that point, you've already lost information and it's harder to handle
certain corner-cases. In particular, if you have a shader that does
gl_PerVertex.gl_Position.x = foo
we have trouble because we see the .x and we don't know that we're in
gl_Position. If we, instead, handle it in OpAccessChain, we have all the
information we need and we can silently re-direct it to the appropreate
variable. This also lets us delete some code which is a nice side-effect.
The ability to dump an arbitrary miplevel or array slice of an anv_image to
a file is very useful for debugging. Nothing inside of the driver calls
this right now, but it's very useful to call from GDB.
Aparently, we had the dynamic state array in the pipeline backwards.
Instead of enabling the bits in the pipeline, it disables them and marks
them as "dynamic".
When we get to the end of the _vtn_load/store_varaible recursion, we may
have one link left in the deref chain if there is a vector component select
on the end. In this case, we need to truncate the deref chain early so
that, when we make the copy for the load, we don't get the extra deref.
The final deref will be handled by the vector extract/insert that comes
later.
Now that we have anv_nir_apply_pipeline_layout, we can hand the backend
compiler intrinsics and texture instructions that use a flat buffer index
just like it wants. There's no longer any reason for any of these hacks.
This patch reworks a bunch of stuff in the way we do descriptor set
layouts. Our previous approach had a couple of problems. First, it was
based on a misunderstanding of arrays in descriptor sets. Second, it
didn't properly handle descriptor sets where some bindings were missing
stages. The new apporach should be correct and also makes some operations,
particularly those on the hot-path, a bit easier.
We use the descriptor set layout for four things:
1) To determine the map from bindings to the actual flattened descriptor
set in vkUpdateDescriptorSets().
2) To determine the descriptor <-> binding table entry mapping to use in
anv_cmd_buffer_flush_descriptor_sets().
3) To determine the mappings of dynamic indices.
4) To determine the (set, binding, array index) -> binding table entry
mapping inside of shaders.
The new approach is directly taylored towards these operations.
Requested by developers outside Intel.
During the driver's pre-release development, let's make the README easy
to find for external experimenters. Keep it at the top of the source
tree.
The surface_format_info struct changed in mesa but the copied-and-pasted
version didn't get updated on the last mesa master merge. This both fixes
the bug and should prevent this in the future.
This was a remnant of the object tagging implementation we had at one
point. We haven't used it for a long time so there's no good reason to
keep it around.
Version 0.170.2 removes most of the error enums. In many cases, I had to
replace an error with a less accurate (or even incorrect) one.
In other cases, the error path is replaced with an assertion.
In C, functions with no arguments require a void argument.
build_nir_clear_fragment_shader() lacked that.
Fixes:
anv_meta.c:70:1: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
If anv_image_get_surface_for_aspect_mask() is given a combined
depthstencil aspect mask, and the image has a stencil surface but no
depth surface, then return the stencil surface.
Hacks on hacks.
Eliminates lots of warnings due to anv_meta.c's inclusion of nir.h.
I like the extra warnings, and they should probably get fixed. However,
git-grep reveals that no other Mesa directory uses -Wextra. Building
Vulkan produces a lot of compiler warnings from core Mesa headers that
no other Mesa developer sees, and hence no other Mesa developer will
fix.
This prepares for merging VkAttachmentView into VkImageView. The two
surface states are:
anv_image_view::color_rt_surface_state:
RENDER_SURFACE_STATE when using image as a color render target.
anv_image_view::nonrt_surface_state;
RENDER_SURFACE_STATE when using image as a non render target.
No Crucible regressions.
In make_image_for_buffer(), use VK_IMAGE_USAGE_SAMPLED_BIT when
transferring from the buffer and use VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT
when transferring to the buffer.
Remove anv_color_attachment_view and anv_depth_stencil_view, merging
them into anv_attachment_view. This prepares for merging
VkAttachmentView into VkImageView.
Push the members of struct anv_surface_view into anv_image_view and
anv_buffer_view, then remove struct anv_surface_view. Observe that
anv_surface_view::range is not needed for anv_image_view, and so was
dropped there.
This prepares for the merge of VkAttachmentView into VkImageView. Remove
the common parent of anv_buffer_view and anv_image_view (that is,
anv_surface_view) will make the merge easier.
Rename all anv_*_view variables to follow this convention:
- sview -> anv_surface_view
- bview -> anv_buffer_view
- iview -> anv_image_view
- aview -> anv_attachment_view
- cview -> anv_color_attachment_view
- ds_view -> anv_depth_stencil_attachment_view
This clarifies existing code. And it will reduce noise in the upcoming
commits that merge VkAttachmentView into VkImageView.
For a given struct anv_descriptor, all members are NULL (in which case
the descriptor is empty) or exactly one member is non-NULL.
To make struct anv_descriptor better reflect its set of valid states,
convert the struct into a tagged union.
anv_meta no longer uses GLSL shaders, and the build system no longer
converts them to SPIR-V. So remove anv_meta_spirv_autogen.h from
Makefile.am.
(cherry picked from commit 2fc8122f66)
While upgrading Mesa to the new 0.170.2 API, it's convenient to have all
three headers available in the tree:
- vulkan-0.138.2.h, the old one
- vulkan-0.170.2.h, the new one
- vulkan.h, the one in transition
From the LunarG SDK at tag sdk-0.9.1, import vulkan.h as
vulkan-0.170.2.h. This header is the first provisional header with the
addition of minor fixes.
We assert that the block offset we got while walking the list of blocks is
actually a multiple of the block size. If something goes wrong and the GPU
decides to stomp on the surface state buffer we can end up getting
corruptions in our list of blocks. This assertion makes such corruptions a
crash with a meaningful message rather than an infinite loop.
This was very useful to get us up-and-going. However, now that we can use
NIR directly for meta shaders, we don't need this anymore and we might as
well drop the glslc dependency.
These should all eventually be up-streamed. However, since they currently
have no upstream users, they would just bitrot there. We'll keep them
local for the time being.
We want to start using the surface state block pool for binding tables and
binding tables. In order to do this, we need to be able to set surface
state base address to the address of a block and surface state base address
has a 4K alignment requriement.
Binding tables will have a negative offset and we need a way to express
that. Besides, the chances of a state offset being larger than 2 GB is so
remote it's not worth thinking about.
Previously, there were a number of things we were doing wrong:
1) We weren't flushing the wl_display so dead-looping clients weren't
guaranteed to work.
2) We were sending the frame event after calling wl_surface.commit() so it
wasn't getting assigned to the correct frame
3) We weren't actually setting fifo_ready to false.
Unfortunately, we never noticed because (3) was hiding the other two. This
commit fixes all three and clients that use FIFO mode are now properly
refresh-rate limited.
Move the bulk of the function body to a new function
anv_physical_device_get_format_properties(). This allows us to reuse the
function when implementing anv_GetPhysicalDeviceImageFormatProperties()
without calling into the public entry point.
It's no longer used outside anv_batch_chain so we certainly don't need to
be exporting. Inside anv_batch_chain, it's only used twice and it can be
replaced by a single line so there's really no point.
Previously, our location handling was focussed on either no location
(usually implicit 0) or a builting. Unfortunately, if you gave it a
location, it would blow it away and just not care. This worked fine with
crucible and our meta shaders but didn't work with the CTS. The new code
uses the "data.explicit_location" field to denote that it has a "final"
location (usually from a builtin) and, otherwise, the location is
considered to be relative to the base for that shader stage.
This allows us to allocate from either side of the block pool in a
consistent way. If you use the previous block_pool_alloc function, you
will get offsets from the start of the pool as normal. If you use the new
block_pool_alloc_back function, you will get a negative index that
corresponds to something in the "back" of the pool.
This has the unfortunate side-effect of making it so that we can't have a
block pool bigger than 1GB. However, that's unlikely to happen and, for
the sake of bi-directional block pools, we need to negative offsets.
We don't have any locking issues yet because we use the pool size itself as
a mutex in block_pool_alloc to guarantee that only one thread is resizing
at a time. However, we are about to add support for growing the block pool
at both ends. This introduces two potential races:
1) You could have two block_pool_alloc() calls that both try to grow the
block pool, one from each end.
2) The relocation handling code will now have to think about not only the
bo that we use for the block pool but also the offset from the start of
that bo to the center of the block pool. It's possible that the block
pool growing code could race with the relocation handling code and get
a bo and offset out of sync.
Grabbing the device mutex solves both of these problems. Thanks to (2), we
can't really do anything more granular.
Partially implement the below functions for 3D images:
vkCmdCopyBufferToImage
vkCmdCopyImageToBuffer
vkCmdCopyImage
vkCmdBlitImage
Not all features work, and there is much for performance improvement.
Beware that vkCmdCopyImage and vkCmdBlitImage are untested. Crucible
proves that vkCmdCopyBufferToImage and vkCmdCopyImageToBuffer works,
though.
Supported:
- copy regions with z offset
Unsupported:
- copy regions with extent.depth > 1
Crucible test results on master@d452d2b are:
pass: func.miptree.r8g8b8a8-unorm.*.view-3d.*
pass: func.miptree.d32-sfloat.*.view-3d.*
fail: func.miptree.s8-uint.*.view-3d.*
The field's meaning depends on SURFACE_STATE::SurfaceType.
Make that correlation explicit by switching on VkImageType.
For good measure, add some PRM quotes too.
Calling vkCreateImage() with VK_IMAGE_TYPE_3D now succeeds and computes
the surface layout correctly. However, 3D images do not yet work for
many other Vulkan entrypoints.
Previously, we simply had a big blob of stuff for "driver constants". Now,
we have a very specific data structure that contains the driver constants
that we care about.
There were merge conflicts in spirv.h that got missed because they were in
a comment and so it still compiled. This gets rid of them and we should be
on-par with upstream spirv->nir.
Pulling in libwayland causes undefined symbols in applications that are
linked against vulkan alone. Ideally, we would like to dlopen a platform
support library or something like that. For now, this works and should get
crucible running again.
We do this for two reasons: First, because it allows us to simplify WSI and
compiling in/out support for a particular platform is as simple as calling
or not calling the platform-specific init function. Second, the
implementation gives us a place for a given chunk of the WSI to stash
stuff in the instance.
Unfortunately, this is a very large commit and removes the old LunarG WSI
extension. This is because there are a couple of entrypoints that have the
same name between the two extensions so implementing them both is
impractiacl.
Support is still incomplete, but this is enough to get vkcube up and going
again.
Now that we don't compile GLSL, we can roll back a few more hacks and
unexport some things from the backend compiler.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Since we switched away from calling brwCreateContext() there's a bit of
hacky support we can now delete. This reduces our diff to upstream master.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We used to always just do a one-level fallback from genX_* to anv_*
entry points. That worked for gen7 and gen8 where all entry points were
either different or could be made anv_* entry points (eg
anv_CreateDynamicViewportState). We're about to add gen9 and now need to
be able to fall back to gen8 entry points for most things.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We'll change the dispatch mechanism again in a later commit. Stop using
the driver_layer function pointers and just use the public entry points.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Print out file and line number and translate the error code to the
symbolic name.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Create R8_UINT VkAttachmentView and VkImageView for the stencil data.
This fixes a crash, but the pixels in the destination image are still
incorrect. They are not properly tiled.
Fixes crashes in Crucible tests func.miptree.s8-uint.aspect-stencil.* as
of crucible-7471449. Test results improve 'lost' -> 'fail'.
In Vulkan, VertexId and InstanceId will be zero-based and new intrinsics,
VertexIndex and InstanceIndex, will be added for non-zer-based. See also,
Khronos bug #14255
From now on, the majority of SPIR-V improvements should happen on the spirv
branch which will also be public. It will be frequently merged into the
vulkan driver.
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node. Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node. By just resetting to after_instr, we
avoid all these problems.
I've been promissed in a bug that this will be fixed in a future version of
the header. However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
We don't actually use it to create all the instructions but we do use it
for insertion always. This should make things far more consistent for
implementing extended instructions.
We do control-flow handling as a two-step process. The first step is to
walk the instructions list and record various information about blocks and
functions. This is where the acutal nir_function_overload objects get
created. We also record the start/stop instruction for each block. Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.
Previously, they were hidden behind a #ifdef __cplusplus so C wouldn't find
them. This commit simpliy moves the #ifdef and adds #ifdef's around
constructors.
Instead of having functions to add values and set various things, we just
have a function that does a few asserts and then returns the value. The
caller is then responsible for setting the various fields.
Stencil buffers have strange pitch. The PRM says:
The pitch must be set to 2x the value computed based on width,
as the stencil buffer is stored with two rows interleaved.
anv_cmd_buffer_clear_attachments() skipped the clear renderpass if no
color attachments needed to be cleared, even if a depth attachment
needed to be cleared.
In anv_depth_stencil_view, replace the members
bo
depth_offset
depth_stride
depth_format
depth_qpitch
stencil_offset
stencil_stride
stencil_qpitch
with the single member
const struct anv_image *image
The removed members duplicated data in anv_image::depth_surface and
anv_image::stencil_surface.
The format of the view itself and of the view's image may differ.
Moreover, if the view's format has no depth aspect but the image's
format does, we must not program the depth buffer. Ditto for stencil.
gen8_graphics_pipeline_create had a bunch of stuff in it that's already set
up by anv_pipeline_init. The duplication was causing double-initialization
of a state stream and made valgrind very angry.
This field in surface state is a bool, WriteOnlyCache is an enum from
GEN8.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
You wouldn't think the TileWalk mode matters when TiledSurface is
false. However, it has to be TILEWALK_XMAJOR. Make it so.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
With all the previous commits in place, we can now drop in support for
multiple platforms. First up is gen7 (Ivybridge).
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This is a more logical place for it, between geometry front end state
and pixel backend state.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This commit moves all occurances of gen8 specific state into a gen8
substruct. This clearly identifies the state as gen8 specific and
prepares for adding gen7 state structs. In the process we also rename
the field names to exactly match the command or state packet name,
without the 3DSTATE prefix, eg:
3DSTATE_VF -> gen8.vf
3DSTATE_WM_DEPTH_STENCIL -> gen8.wm_depth_stencil
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The clear pipeline didn't have a vertex shader and relied on the clear
shader being hardcoded by the compiler to accept one attribute. This
necessitated a few special cases in the 3DSTATE_VS setup. Instead,
always provide a vertex shader, even if we disable VS dispatch.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Many of of these fields aren't used for buffer surfaces, so leave them
out for brevity.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
This adds VALIGN_2 and VALIGN_4 defines for IVB and HSW
RENDER_SURFACE_STATE.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
I'd prefer to move anv_CreateAttachmentView() as well, but it's a little
too much generic code to just duplicate for each gen. For now, we'll
add a anv_color_attachment_view_init() to dispatch to the gen specific
implementation, which we then call from anv_CreateAttachmentView().
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We'll probably want to move some code back into a shared init function,
but this gets one GEN8 surface state initialization out of anv_image.c.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We need this for generating surface state on the fly for dynamic buffer
views.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We're going to have to do this differently for earlier gens, so lets do
it in place only.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Since the extra dword in MI_BATCH_BUFFER_START added in gen8 is at the
end of the struct, we can emit the gen8 packet on all gens as long as we
set the instruction length correctly.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We used to use a manual GEN8_MI_BATCH_BUFFER_START_pack() call, but this
refactors the code to use anv_batch_emit();
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We'll organize gen specific code in three files per gen: pipeline,
cmd_buffer and state, eg:
gen8_cmd_buffer.c
gen8_pipeline.c
gen8_state.c
where gen8_cmd_buffer.c holds all vkCmd* entry points, gne8_pipeline.c
all gen specific code related to pipeline building and remaining state
code (sampler, surface state, dynamic state) in gen8_state.c.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
During vkCreateRenderPass, count the number of clear ops and store them
in new members of anv_render_pass:
uint32_t num_color_clear_attachments
bool has_depth_clear_attachment
bool has_stencil_clear_attachment
Cacheing these 8 bytes (including padding) reduces the number of times
that anv_cmd_buffer_clear_attachments needs to loop over the pass's
attachments.
In struct anv_saved_state, each member's type was a pointer to an Anvil
struct and each member's name was prefixed with "old" except cb_state,
which was a Vulkan handle whose name lacked "old".
The source image's format was incorrectly used for both the source view
and destination view. For vkCmdCopyImage to correctly translate formats,
the destination view's format must be that of the destination image's.
Change type of anv_render_pass_attachment::format from VkFormat to const
struct anv_format*. This elimiates the repetitive lookups into the
VkFormat -> anv_format table when looping over attachments during
anv_cmd_buffer_clear_attachments().
Stop creating a temporary VkImageCreateInfo with overriden
format=VK_FORMAT_S8_UINT. Instead, just pass the format override
directly to anv_image_make_surface().
Stencil formats are often a special case. To reduce the number of lookups
into the VkFormat-to-anv_format translation table when working with
stencil, expose the table's entry for VK_FORMAT_S8_UINT as global
variable anv_format_s8_uint.
Change type of anv_surface_view::format from VkFormat to const struct
anv_format*. This reduces the number of lookups in the VkFormat ->
anv_format table.
This moves the translation of VkFormat to anv_format from
anv_fill_buffer_surface_state() to its caller.
A prep commit to reduce more VkFormat -> anv_format translations.
Store the original VkFormat as anv_format::vk_format. This will be used
to reduce format indirection, such as lookups into the VkFormat ->
anv_format translation table.
We need to make sure we use the VkImage infrastructure for creating
dmabuf images.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
We already query the device in various ways here and we can just also
get the aperture size. This avoids keeping an extra drm fd open
during the life time of the driver.
Also, we need to use explicit 64 bit types for the aperture size, not
size_t.
This lets us hit an assert if we exceed the block pool size instead of
GPU hanging.
Signed-off-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
The anv_block_pool data structure suffered from the exact same race as the
state pool. Namely, that the uniqueness of the blocks handed out depends
on the next_block value increasing monotonically. However, this invariant
did not hold thanks to our block "return" concept.
The previous algorithm had a race because of the way we were using
__sync_fetch_and_add for everything. In particular, the concept of
"returning" over-allocated states in the "next > end" case was completely
bogus. If too many threads were hitting the state pool at the same time,
it was possible to have the following sequence:
A: Get an offset (next == end)
B: Get an offset (next > end)
A: Resize the pool (now next < end by a lot)
C: Get an offset (next < end)
B: Return the over-allocated offset
D: Get an offset
in which case D will get the same offset as C. The solution to this race
is to get rid of the concept of "returning" over-allocated states.
Instead, the thread that gets a new block simply sets the next and end
offsets directly and threads that over-allocate don't return anything and
just futex-wait. Since you can only ever hit the over-allocate case if
someone else hit the "next == end" case and hasn't resized yet, you're
guaranteed that the end value will get updated and the futex won't block
forever.
We have pools, so we should be using them. Also, I think this will help
keep valgrind from getting confused when we have to end up fighting with
system allocations such as those from malloc/free and mmap/munmap.
Figuring out whether or not to do a copy requires knowing the length of the
final batch_bo. This gets set by anv_batch_bo_finish so we have to do it
afterwards. Not sure how this was even working before.
Previously, the command buffer implementation was split between
anv_cmd_buffer.c and anv_cmd_emit.c. However, this naming convention was
confusing because none of the Vulkan entrypoints for anv_cmd_buffer were
actually in anv_cmd_buffer.c. This changes it so that anv_cmd_buffer.c is
what you think it is and the internals are in anv_batch_chain.c.
Previously, the caller of emit_state_base_address was doing this. However,
putting it directly in emit_state_base_address means that we'll never
forget the flush at the cost of one PIPE_CONTROL at the top every batch
(that should do nothing since the kernel just flushed for us).
This is more generic and doesn't imply that it emits MI_BATCH_BUFFER_END.
While we're at it, we'll move NOOP adding from bo_finish to
end_batch_buffer.
Instead of walking the list of batch and surface buffers, we simply keep
track of all known batch and surface buffers as we build the command
buffer. Then we use this new list to construct the validate list.
The algorighm we used previously required us to call add_bo in a particular
order in order to guarantee that we get the initial batch buffer as the
last element in the validate list. The new algorighm does a recursive walk
over the buffers and then re-orders the list. This should be much more
robust as we start to add circular dependancies in the relocations.
Before, we were doing this thing where we had one big relocation list for
the whole command buffer and each subbuffer took a chunk out of it. Now,
we store the actual relocation list in the anv_batch_bo. This comes at the
cost of more small allocations but makes a lot of things simpler.
Previously anv_batch.relocs was an actual relocation list. However, this
is limiting if the implementation of the batch wants to change the
relocation list as the batch progresses.
This update fixes cases where a 48-bit address field was split into
two parts:
__gen_address_type MemoryAddress;
uint32_t MemoryAddressHigh;
which cases this pack code to be generated:
dw[1] =
__gen_combine_address(data, &dw[1], values->MemoryAddress, dw1);
dw[2] =
__gen_field(values->MemoryAddressHigh, 0, 15) |
0;
which breaks for addresses above 4G.
This update also fixes arrays of structs in commands and structs, for
example, we now have:
struct GEN8_BLEND_STATE_ENTRY Entry[8];
and the pack functions now write all dwords in the packet, making
valgrind happy.
Finally, we would try to pack 64 bits of blend state into a uint32_t -
that's also fixed now.
Previously, we were crawling through the anv_cmd_buffer datastructure to
pull out batch buffers and things. This meant that every time something in
anv_cmd_buffer changed, we broke aub dumping. However, aub dumping should
just dump the stuff the kernel knows about so we really don't need to be
crawling driver internals.
This used to happen magically in cmd_buffer_new_surface_state_bo. However,
according to Ken, STATE_BASE_ADDRESS is very gen-specific so we really
shouldn't have it in the generic data-structure code.
The CTS passes in NULL names right now. It's not too hard to support that
as just "main". With this, and a patch to vulkancts, we now pass all 6
tests.
Jason started the task by creating anv_cmd_buffer.c and anv_cmd_emit.c.
This patch finishes the task by renaming all other files except
gen*_pack.h and glsl_scraper.py.
This completes the FINISHME to trim unneeded data from anv_buffer_view.
A VkExtent3D doesn't make sense for a VkBufferView.
So remove the member anv_surface_view::extent, and push it up to the two
objects that actually need it, anv_image_view and anv_attachment_view.
This removes nearly all the remaining raw Anvil<->Vulkan casts from the
C source files. (File compiler.cpp still contains many raw casts, and
I plan on ignoring that).
As far as I can tell, the only remaining raw casts are:
anv_attachment_view -> anv_depth_stencil_view
anv_attachment_view -> anv_color_attachment_view
The GLSL layer above is still hacky, so we're really just moving the
hack into GLSL-to-NIR. I'd rather not go all the way and make GLSL
support the Vulkan binding model too, since presumably we'll be
switching to SPIR-V exclusively, and so working on proper GLSL support
will be a waste of time. For now, doing this keeps it working as we add
SPIR-V->NIR support though.
The Vulkan spec does not specify that the free function provided to
CreateInstance must handle NULL properly so we do it in the wrapper. If
this ever changes in the spec, we can delete the extra 2 lines.
While updating vkDestroyObject, I discovered that vkDestroyPass reliably
crashes. That hasn't been an issue yet, though, because it is never
called.
In vkCreateRenderPass:
- Don't allocate empty attachment arrays.
- Ensure that pointers to empty attachment arrays are NULL.
- Store VkRenderPassCreateInfo::subpassCount as
anv_render_pass::subpass_count.
In vkDestroyRenderPass:
- Fix loop bounds: s/attachment_count/subpass_count/
- Don't call anv_device_free on null pointers.
vkDestroyColorAttachmentView
vkDestroyDepthStencilView
These functions are not in the 0.132 header, but adding them will help
us attain the type-safety API updates more quickly.
Oops. My recent commits added new destructors, but forgot to teach
vkDestroyObject about them. They are:
vkDestroyFence
vkDestroyEvent
vkDestroySemaphore
vkDestroyQueryPool
vkDestroyBuffer
This means that now the internal version of glslangValidator is
required. This includes some changes due to the sampler/texture rework,
but doesn't actually enable anything more yet. We also don't yet handle
UBO's correctly, and don't handle matrix stride and row major/column
major yet.
During anv_physical_device_init(), we opend the DRM device to do some
queries, then promptly closed it. Now we keep it open for the lifetime
of the anv_physical_device so that we can query it some more during
vkGetPhysicalDevice*Properties() [which will happen in follow-up
commits].
Because in a follow-up patch I need to do some non-trival teardown on
anv_physical_device. Currently, however, anv_physical_device_finish() is
currently a no-op that's just called in the right place.
Also, rename function fill_physical_device -> anv_physical_device_init
for symmetry.
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].
[1] A "gitlink" is a text file that specifies the real location of
the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.
Cc: "10.6, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
(cherry picked from commit 75784243df)
The Vulkan spec says that pPhysicalDeviceCount is an out parameter if
pPhysicalDevices is NULL; otherwise it's an inout parameter.
Mesa incorrectly treated it unconditionally as an inout parameter, which
could have lead to reading unitialized data.
Don't enumerate devices in vkCreateInstance(). That's where global,
device-independent initialization should happen. Move device enumeration
to the more logical location, vkEnumeratePhysicalDevices().
Function fill_physical_device() has a 'path' parameter, and struct
anv_physical_device has a 'path' member. Sometimes these are used;
sometimes hardcoded "/dev/dri/renderD128" is used instead.
Be consistent. Hardcode "/dev/dri/renderD128" in exactly one location,
during initialization of the physical device.
Before values are pushed or annotated with a name, decoration, etc.,
they need to have an invalid type, NULL name, NULL decoration, etc.
ralloc zero's everything by accident, so this wasn't an issue in
practice, but we should be explicitly zero'ing it.
Unfortunately, this requires some non-trivial changes to the driver. Now
that the primitive restart index isn't given explicitly by the client, we
always use ~0 for everything like D3D does. Unfortunately, our hardware is
awesome and a 32-bit version of ~0 doesn't match any 16-bit values. This
means, we have to set it to either UINT16_MAX or UINT32_MAX depending on
the size of the index type. Since we get the index type from
CmdBindIndexBuffer and the rest of the VF packet from the pipeline, we need
to lazy-emit the VF packet.
This was missing and was causing the driver to not work with
execlists. Presumably we get a different initial hw context with
execlists enabled, that has sample mask 0 initially.
Set this to 0xffff for now. When we add MS support, we need to take the
value from VkPipelineMsStateCreateInfo::sampleMask.
The new headers use stdbool for enable/disable fields which
implicitly converts expressions like (flags & 8) to 0 or 1.
Also handles MBO (must-be-one) fields by setting them to one,
corrects a bspec typo (_3DPRIM_LISTSTRIP_ADJ -> LINESTRIP) and
makes a few enum values less clashy.
Convert the table from the direct mapping
VkImageViewType -> SurfaceType
into a mapping to an info struct
VkImageViewType -> struct anv_image_view_info
This accounts for a number differences between the generated headers and
the hand-written header. Not all reformatting is done in this commit but
it does make the headers much more diffable.
In theory, no functional change.
Structs in the old version were specified as
typedef struct VkSomeThing_
{
type field; // comment
} VkSomeThing;
However, in the generated headers, you have
typedef struct {
type field;
} VkSomeThing;
This commit also removes some unneeded whitespaces.
We should be asserting that the parent decoration didn't hand us
a member if the child decoration did, but different child decorations
may obviously have different members.
Since SSA values now have their own types, it's more convenient to make
'type' only used when we want to look up an actual SPIR-V type, since
we're going to change its type soon to support various decorations that
are handled at the SPIR-V -> NIR level.
Previously, the caller of vtn_handle_type had to handle actually inserting
the type. However, this didn't really work if the type was decorated in
any way.
The way the code currently works is that anv_format::cpp is the cpp of
anv_format::surface_format.
Me and Kristian disagree about how the code *should* work. Despite that,
I think it's in our discussion's best interest to document how the code
*currently* works. That should eliminate confusion.
If and when the code begins to work differently, then we'll update the
anv_format comments.
This prepares for upcoming miptree support.
anv_surface is a proxy for color surfaces, depth surfaces, and stencil
surfaces. Embed two instances of anv_surface into anv_image: the
primary surface (color or depth), and an optional stencil surface.
All function signatures that matched this pattern,
old: f(const VkImageCreateInfo *, const struct anv_image_create_info *)
were rewritten as
new: f(const struct anv_image_create_info *)
The format table defined cpp = 0 for stencil-only formats. The real cpp
is 1.
When code begins to lie, especially about stencil buffers, code becomes
increasingly fragile as time progresses, and the damage becomes
increasingly hard to undo. (For precedent, see the painful history of
stencil buffer cpp in the git log for gen6 and gen7 in the i965 driver).
Let's undo the stencil buffer cpp lie now to avoid future pain.
In the format table, set cpp = 1 for VK_FORMAT_S8; replace checks for
cpp == 0; and delete all comments about the hack.
From my experience with intel_mipmap_tree.c, I learned that for struct's
like anv_image and intel_mipmap_tree, which have sprawling
multi-function construction codepaths, it's easy to mistakenly use
unitialized struct members during construction.
Let's eliminate the risk of using unitialized anv_image members during
construction. Fill the struct at the function bottom instead of
piecemeal throughout the constructor.
anv_format::surface_format was incorrect for Vulkan depth formats.
For example, the format table mapped
VK_FORMAT_D24_UNORM -> .surface_format = D24_UNORM_X8_UINT
VK_FORMAT_D32_FLOAT -> .surface_format = D32_FLOAT
but should have mapped
VK_FORMAT_D24_UNORM -> .surface_format = R24_UNORM_X8_TYPELESS
VK_FORMAT_D32_FLOAT -> .surface_format = R32_FLOAT
The Crucible test func.depthstencil.basic passed despite the bug, but
only because it did not attempt to texture from the depth surface.
The core problem is that RENDER_SURFACE_STATE.SurfaceFormat and
3DSTATE_DEPTH_BUFFER.SurfaceFormat are distinct types. Considering them
as enum spaces, the two enum spaces have incompatible collisions.
Fix this by adding a new field 'depth_format' to struct anv_format.
Refer to brw_surface_formats.c:translate_tex_format() for precedent.
I misinterpreted anv_format::format as a VkFormat. Instead, it is
a hardware surface format (RENDER_SURFACE_STATE.SurfaceFormat). Rename
the field to 'surface_format' to make it unambiguous.
The brw_create_nir function takes a GLSL or ARB shader and turns it into a
NIR shader. The guts of the optimization and lowering code is now split
into a new brw_process_shader function.
When we're done emitting the code for a loop, we need to visit the new
break block, which is the merge block of the current loop, rather than
the old merge block, which is the merge block of the loop containing the
one we just emitted code for.
We compute the right mask and thread width max parameters as part of
pipeline creation and set them accordingly at vkCmdDispatch() and
vkCmdDispatchIndirect() time. These parameters depend only on the local
group size and the dispatch width of the program so we can figure this
out at pipeline create time.
This reverts commits
e17ed04 * vk/image: Don't double-allocate stencil buffers
1ee2d1c * vk/image: Teach anv_image_choose_tile_mode about WMAJOR
and instead adds a comment to describe the subtlety of how we create
images for stencil only formats.
The check in batch_bo_finish should catch any undefined values in the batch
but isn't that great for debugging. The checks in the various emit
functions will help get better granularity.
- Reduce the number of table lookups in anv_image_create from 4 to 1.
- Add field for surface alignment.
- Shorten field names tile_width, tile_height -> width, height.
This avoids the full brw context initialization and just sets up context
constants, initializes extensions and sets a few driver vfuncs for the
front-end GLSL compiler.
Dynamic color/blend state can be NULL in case we're not rendering to
color targets (only output to depth and/or stencil). Initialize
cmd_buffer->cb_state to NULL so we can reliably detect whether it's been
set or not.
GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
total instructions in shared programs: 2742062 -> 2681339 (-2.21%)
instructions in affected programs: 1514770 -> 1454047 (-4.01%)
helped: 5813
HURT: 1120
The gained programs are ARB vertext programs that were previously going
through the vec4 backend. Now that we have prog_to_nir, ARB vertex
programs can go through the scalar backend so they show up as "gained" in
the shader-db results.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This supports the three Vulkan border color types for float color
formats. The support for integer formats is a little trickier, as we
don't know the format of the texture at this time.
According to the bspec, you're supposed to emit a PIPE_CONTROL with a CS
stall and a render target flush prior to chainging STATE_BASE_ADDRESS. A
little experimentation, however, shows that this is not enough. It also
appears as if you have to flush the texture cache after chainging base
address or things won't propagate properly.
This commit allows for us to create a whole new surface state buffer when
the old one runs out of room. We simply re-emit the state base address for
the new state, re-emit binding tables, and keep going.
Before, we were emitting surface states up-front when binding tables were
updated. Now, we wait to emit the surface states until we emit the binding
table. This makes meta simpler and should make it easier to deal with
swapping out the surface state buffer.
We need to make sure we use the right index into dynamic offset
array. Dynamic descriptors can be present or not in different stages and
to get the right offset, we need to compute the index at
vkCreateDescriptorSetLayout time.
We do this by creating a surface state on the fly that incorporates the
dynamic offset. This patch also refactor the descriptor set layout
constructor a bit to be less clever with switch statement fall
through. Instead of duplicating the subtle code to update the sampler
and surface slot map, we just use two switch statements.
We rely on these being initialized to NULL so meta can reliably detect
whether or not they've been set. ds_state is also allowed to not be
present so we need a well-defined value for that.
This mega-commit primarily does two things. First, is to turn anv_batch
into a better abstraction of a batch. Instead of actually having a BO, it
now has a few pointers to some piece of memory that are used to add data to
the "batch". If it gets to the end, there is a function pointer that it
can call to attempt to grow the batch.
The second change is to start using chained batch buffers. When the end of
the current batch BO is reached, it automatically creates a new one and
ineserts an MI_BATCH_BUFFER_START command to chain to it. In this way, our
batch buffers are effectively infinite in length.
This also drops the share create_surface_state helper and moves filling
out SURFACE_STATE directly into anv_image_view_init() and
anv_color_attachment_view_init().
If a meta operation is called before the pipeline is set, this can cause
uses of undefined values. They *should* be harmless, but we might as well
shut up valgrind on this one too.
Previously, we just blasted out whatever VB's we had marked as "dirty"
regardless of which ones were used by the pipeline. Given that the stride
of the VB is embedded in the pipeline this can cause problems. One problem
is if the pipeline doesn't use the given VB binding we emit a bogus stride.
Another problem is that we weren't properly resetting the dirty bits when
the pipeline changed.
Previously, we waited until later and did a pass through the used surfaces
and did the relocations then. This lead to doing double-relocations which
was causing us to get bogus surface offsets.
Since the binding table pointer is only 16 bits, we can only have 64kb
of binding table state allocated at any given time. With a block size of
1kb, that amounts to just 64 command buffers, which is not enough.
This commit adds support for the LunarG GLSL back-door as well as detecting
regular GLSL and SPIR-V. The SPIR-V path doesn't exist yet, so that will
cause an assert-fail.
The *_init functions work basically the same as the Vulkan entrypoints
except that they act on an already-created view and take an optional
command buffer option. If a command buffer is given, the surface state is
allocated out of the command buffer's state stream.
This reverts commit b6ab076d6b.
It turns out setting USE_MEMFD to 0 is really bad because it means we can't
resize the pool. Besides, valgrind SVN handles memfd so we really don't
need this fallback for valgrind anymore.
The binding table pointers packet only allows for a 16-bit binding table
address so all binding tables have to be in the first 64 KB of the surface
state BO. We solve this by adding a slave block pool that pulls off the
first 64 KB worth of blocks and reserves them for binding tables.
Clear inherits the render targets from the current render pass. This
means we need to fill out the binding table after switching to meta
bindings. However, meta copies etc happen outside a render pass and
break when we try to fill in the render targets. This change fills the
render targets only for meta clear.
We leave the block pool untracked so that reads/writes to freed blocks
will get caught and do the tracking at the state pool/stream level. We
have to do a few extra gymnastics for streams because valgrind works in
terms of poitners and we work in terms of separate map and offset.
Fortunately, the users of the state pool and stream should always be using
the map pointer provided in the anv_state structure. We just have to
track, per block, the map that was used when we initially got the block.
Then we can make sure we always use that map and valgrind should stay
happy.
We now always copy the entire struct unless pData is NULL and
unconditionally write back the struct size. It's not clear this is
useful if the structs may grow over time, but it seems to be the
expected behaviour for now.
I've been promissed in a bug that this will be fixed in a future version of
the header. However, in the interest of my branch building, I'm adding
these changes in myself for the moment.
We don't actually use it to create all the instructions but we do use it
for insertion always. This should make things far more consistent for
implementing extended instructions.
We do control-flow handling as a two-step process. The first step is to
walk the instructions list and record various information about blocks and
functions. This is where the acutal nir_function_overload objects get
created. We also record the start/stop instruction for each block. Then
a second pass walks over each of the functions and over the blocks in each
function in a way that's NIR-friendly and actually parses the instructions.
Instead of having functions to add values and set various things, we just
have a function that does a few asserts and then returns the value. The
caller is then responsible for setting the various fields.
If we don't do this then recursive meta is completely broken. What happens
is that the outer meta call may change the bindings pointer and the inner
meta call will change it again and, when it exits set it back to the
default. However, the outer meta call may be relying on it being left
alone so it uses the non-meta descriptor sets instead of its own.
Previously, the GLSL_VK_SHADER macro didn't work if the shader contained
commas outside of parentheses due to the way the C preprocessor works.
This commit fixes this by making it variadic again and doing it correctly
this time.
This changes the way descriptor sets and layouts work so that we fill
out binding table contents at the time we bind descriptor sets. We
manipulate the binding table contents and sampler state in a shadow-copy
in anv_cmd_buffer. At draw time, we allocate the actual binding table
and sampler state and flush the anv_cmd_buffer copies.
This new utility, glsl_scraper.py scrapes C files for instances of the
GLSL_VK_SHADER macro, pulls out the shader source, and compiles it to
SPIR-V. The compilation is done using glslValidator. The result is then
placed into another C file as arrays of dwords that can be easiliy handed
to a Vulkan driver.
This way we can pass in a vertex shader and yet have the pipeline emit an
empty 3DSTATE_VS packet. We need this for meta because we need to trick
the compiler into not deleting our inputs but at the same time disable the
VS so that we can use a rectlist. This should go away once we actually get
SPIR-V.
This is rather crude but it at least makes sure that all the render targets
get flushed at the end of the pass. We probably actually want to do
somthing based on image layout traansitions, but this will work for now.
This is by no means a complete solution to the binding table problems.
However, it does make texturing actually work. Before, we were texturing
from the render target since they were both starting at 0.
We don't need to point back to the memory object the bo came from.
Pointing directly to a bo lets us bind images and buffers to other
bos - like our allocator bos.
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_boundingbox not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_sample_variables not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_shader_image_atomic not started (based on parts of GL_ARB_shader_image_load_store, which is done for some drivers)
GL_OES_shader_io_blocks not started (based on parts of GLSL 1.50, which is done)
GL_OES_shader_multisample_interpolation not started (based on parts of GL_ARB_gpu_shader5, which is done)
GL_OES_tessellation_shader not started (based on GL_ARB_tessellation_shader, which is done for some drivers)
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 DONE (all drivers that support GL_ARB_texture_stencil8)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94073">Bug 94073</a> - Miscompilation of abs_vec3_vert_xvary_ref.vert in WebGL conformance</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94193">Bug 94193</a> - [llvmpipe] Line antialiasing looks different when GL_LINE_STIPPLE is enabled with pattern 0xffff</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94195">Bug 94195</a> - [llvmpipe] Does not build with LLVM 3.7.x on Windows</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94481">Bug 94481</a> - softpipe - access violation in img_filter_2d_nearest</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94595">Bug 94595</a> - [Mesa AMD&swrast] Texture views attached as framebuffers return their viewed tecture's color encoding and render incorrectly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94954">Bug 94954</a> - test_vec4_copy_propagation fails in `make check`</li>
</ul>
<h2>Changes</h2>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Fix assert conditions for src/dst x/y offsets</li>
</ul>
<p>Ben Widawsky (2):</p>
<ul>
<li>i965: Make sure we blit a full compressed block</li>
<li>i965/skl: Add two missing device IDs</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix incorrect viewport position when GL_CLIP_ORIGIN = GL_LOWER_LEFT</li>
</ul>
<p>Chris Forbes (1):</p>
<ul>
<li>i965/blorp: Fix hiz ops on MSAA surfaces</li>
</ul>
<p>Christian König (1):</p>
<ul>
<li>radeon/uvd: disable MPEG1</li>
</ul>
<p>Christian Schmidbauer (1):</p>
<ul>
<li>st/nine: specify WINAPI only for i386 and amd64</li>
</ul>
<p>Daniel Czarnowski (3):</p>
<ul>
<li>egl_dri2: NULL check for xcb_dri2_get_buffers_reply()</li>
<li>egl_dri2: set correct error code if swapbuffers fails</li>
<li>egl: support EGL_LARGEST_PBUFFER in eglCreatePbufferSurface(...)</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>mesa/fbobject: propogate Layered when reusing attachments.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/wayland: Try to use wl_surface.damage_buffer for SwapBuffersWithDamage</li>
</ul>
<p>Dongwon Kim (1):</p>
<ul>
<li>egl: move Null check to eglGetSyncAttribKHR to prevent Segfault</li>
</ul>
<p>Emil Velikov (10):</p>
<ul>
<li>docs: add sha256 checksums for 11.1.2</li>
<li>get-pick-list.sh: Require explicit "11.1" for nominating stable patches</li>
<li>cherry-ignore: do not pick nv50/ir commit</li>
<li>automake: add nine to make distcheck</li>
<li>install-gallium-links: port changes from install-lib-links</li>
<li>automake: add more missing options for make distcheck</li>
<li>mesa; add get-extra-pick-list.sh script into bin/</li>
<li>egl/x11: check the return value of xcb_dri2_get_buffers_reply()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94955">Bug 94955</a> - Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94994">Bug 94994</a> - OSMesaGetProcAdress always fails on mangled OSMesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95026">Bug 95026</a> - Alien Isolation segfault after initial loading screen/video</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95133">Bug 95133</a> - X-COM Enemy Within crashes when entering tactical mission with Bonaire</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>gallium/util: initialize pipe_framebuffer_state to zeros</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>dri: Fix robust context creation via EGL attribute</li>
</ul>
<p>Egbert Eich (1):</p>
<ul>
<li>dri2: Check for dummyContext to see if the glx_context is valid</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 11.1.3</li>
<li>cherry-ignore: add non-applicable "fix of a fix"</li>
<li>cherry-ignore: ignore st_DrawAtlasBitmaps mem leak fix</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=75165">Bug 75165</a> - compute.c:464:49: error: function definition is not allowed here</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors "work"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89969">Bug 89969</a> - nouveau: add support for chunk decoding in order to support vaapi (st/va)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91806">Bug 91806</a> - configure does not test whether assembler supports sse4.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92595">Bug 92595</a> - [HSW,BDW,SKL][GLES 3.1 CTS] Big difference in the results for the ES31-CTS.shader_bitfield_operation.* tests</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92687">Bug 92687</a> - Add support for ARB_internalformat_query2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92706">Bug 92706</a> - glBlitFramebuffer refuses to blit RGBA to RGB with MSAA</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - "LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main" when starting race in stuntrally</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92743">Bug 92743</a> - Centroid shouldn't have to match between the FS and the VS</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92759">Bug 92759</a> - [Regression, bisected] Visuals without alpha bits are not sRGB-capable</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93264">Bug 93264</a> - Tonga VM Faults since llvm ScheduleDAGInstrs: Rework schedule graph builder.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93300">Bug 93300</a> - Two Worlds 2 renders water incorrectly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93339">Bug 93339</a> - glLinkProgram() should fail when a varying is never written to in a previous stage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93599">Bug 93599</a> - Strange green flashes with "Metro: Last Light Redux" + "Metro 2033 Redux" with Intel Mesa driver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93648">Bug 93648</a> - Random lines being rendered when playing Dolphin (geometry shaders related, w/ apitrace)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93650">Bug 93650</a> - GL_ARB_separate_shader_objects is buggy (PCSX2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93667">Bug 93667</a> - Crash in eglCreateImageKHR with huge texture size</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93717">Bug 93717</a> - Meta mipmap generation can corrupt texture state</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93722">Bug 93722</a> - Segfault when compiling shader with a subroutine that takes a parameter</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93731">Bug 93731</a> - glUniformSubroutinesuiv segfaults when subroutine uniform is bound to a specific location</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93761">Bug 93761</a> - A conditional discard in a fragment shader causes no depth writing at all</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93790">Bug 93790</a> - [HSW] Use after free with compute programs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93957">Bug 93957</a> - [HSW] Mishandling of sample count when using an attachment-less framebuffer (assertion error)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93961">Bug 93961</a> - virgl build failure after 2016-02-01 changes - no previous prototype for 'virgl_drm_winsys_create'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93989">Bug 93989</a> - build: flex-2.5.39 seems to be failing for glsl_lexer.ll</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94016">Bug 94016</a> - make check MesaExtensionsTest.AlphabeticallySorted regression</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94019">Bug 94019</a> - [bisected] 3D acceleration broken with gallium/radeon: just get num_tile_pipes from the winsys</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94073">Bug 94073</a> - Miscompilation of abs_vec3_vert_xvary_ref.vert in WebGL conformance</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94081">Bug 94081</a> - [HSW] compute shader shared var + atomic op = fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94091">Bug 94091</a> - Tonga unreal elemental segfault since radeonsi: put image, fmask, and sampler descriptors into one array</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94100">Bug 94100</a> - [HSW] compute indirect dispatch with 0 work groups causes gpu hang</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94186">Bug 94186</a> - Crash when launching glxinfo and World of Warcraft with RV790</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94188">Bug 94188</a> - define (or undef) defined behaves stupidly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94193">Bug 94193</a> - [llvmpipe] Line antialiasing looks different when GL_LINE_STIPPLE is enabled with pattern 0xffff</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94481">Bug 94481</a> - softpipe - access violation in img_filter_2d_nearest</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94524">Bug 94524</a> - Wrong gl_TessLevelOuter interpretation for isolines</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94595">Bug 94595</a> - [Mesa AMD&swrast] Texture views attached as framebuffers return their viewed tecture's color encoding and render incorrectly</li>
</ul>
<h2>Changes</h2>
@@ -78,7 +289,7 @@ Microsoft Visual Studio 2013 or later is now required for building
on Windows.
Previously, Visual Studio 2008 and later were supported.
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92850">Bug 92850</a> - Segfault loading War Thunder</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93767">Bug 93767</a> - Glitches with soft shadows and MSAA in Knights of the Old Republic 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94955">Bug 94955</a> - Uninitialized variables leads to random segfaults (valgrind log, apitrace attached)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94994">Bug 94994</a> - OSMesaGetProcAdress always fails on mangled OSMesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95026">Bug 95026</a> - Alien Isolation segfault after initial loading screen/video</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95133">Bug 95133</a> - X-COM Enemy Within crashes when entering tactical mission with Bonaire</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95164">Bug 95164</a> - GLSL compiler (linker I think) emits assertion upon call to glAttachShader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=95251">Bug 95251</a> - vdpau decoder capabilities: not supported</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>radeon/uvd: alignment fix for decode message buffer</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>st/mesa: fix sampler view leak in st_DrawAtlasBitmaps()</li>
<li>gallium/util: initialize pipe_framebuffer_state to zeros</li>
</ul>
<p>Chad Versace (1):</p>
<ul>
<li>dri: Fix robust context creation via EGL attribute</li>
</ul>
<p>Egbert Eich (1):</p>
<ul>
<li>dri2: Check for dummyContext to see if the glx_context is valid</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 11.2.1</li>
<li>docs: update the sha256 checksums for 11.2.1</li>
<li>cherry-ignore: remove duplicate commit</li>
<li>cherry-ignore: ignore the GetSamplerParameterIuiv{EXT,OES} fixups</li>
<li>Update version to 11.2.2</li>
</ul>
<p>Eric Anholt (4):</p>
<ul>
<li>vc4: Fix subimage accesses to LT textures.</li>
<li>vc4: Add support for rendering to cube map surfaces.</li>
<li>vc4: Fix tests for format supported with nr_samples == 1.</li>
<li>vc4: Make sure we recompile when sample_mask changes.</li>
</ul>
<p>Frederic Devernay (1):</p>
<ul>
<li>glapi: fix _glapi_get_proc_address() for mangled function names</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nvc0: fix retrieving query results into buffer for timestamps</li>
<li>nouveau/video: properly detect the decoder class for availability checks</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965/fs: Properly report regs_written from SAMPLEINFO</li>
</ul>
<p>Jonathan Gray (1):</p>
<ul>
<li>egl/x11: authenticate before doing chipset id ioctls</li>
</ul>
<p>Jose Fonseca (1):</p>
<ul>
<li>winsys/sw/xlib: use correct free function for xlib_dt->data</li>
</ul>
<p>Kenneth Graunke (3):</p>
<ul>
<li>i965: Fix clear code for ignoring colormask for XRGB formats on Gen9+.</li>
<li>glsl: Convert lower_vec_index_to_swizzle to a rvalue visitor.</li>
<li>glsl: Lower vector_extracts to swizzles after lower_vector_derefs.</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.