Basically, when the conditions of a csel diverge, we scalarize to avoid
going into weird code paths during emit. We could be doing better, but
this case can't occur organically from GLSL as far as I can, though it
does fix lowered atan2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
A previous commit by Tomeu aborted RA early, which solves the memory
corruption issue, but then generates an incorrect compile. This fixes
that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This handles the usual case. 8-bit register access parallels 16-bit
access, but with one major caveat: in 8-bit mode, only half of the
register file is actually (directly) accessible as sources. In
particular, for each 16-bit integer register (hrN), we can only index a
*single* 8-bit integer (qrN), corresponding to the lower 8-bits. To get
the upper 8-bits, it is required to do an explicit shift. For example,
to add the bytes of a 16-bit integer hr0.x and get the result as an
8-bit qr0, you'd need to do something like:
ilsr hr1.x, hr0.x, #8
iadd qr0.x, qr0.x, qr1.x
This scheme diverges from 32-bit registers, in that both the upper and
lower halves of a 32-bit register are individually accessible as a pair
of half registers. For contrast, to add the lower and upper 16-bits of a
32-bit integer r0.x, you can just:
iadd hr0.x, hr0.x, hr1.x
Since hr1.x = upper 16-bit of r0.x.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Meanwhile, we're forced to disable dest_override, since it's not yet
clear how this interacts with other bitnesses (it'll likely need to be
overhauled in any case).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We silently ignored certain bits of the mask, which causes issues when
disassembly 8/64-bit ops.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
In preparation for 8-bit and 64-bit operands, let's not reinforce the
32-bit-centric biases in the ISA.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Since it is dependent on the tile mode (ie. disabled for smaller mipmap
levels), we should handle it a similar way to fd_resource_level_linear().
The code previously mostly did the right thing because the old helper
took the tile mode.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Best to keep it encapsulated in the helper which returns layer/level
offset (and actually use that helper everywhere) rather than spreading
the logic around the code.
Also add a helper to find UBWC offset, to complete the encapsulation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
If someone is importing a buffer, we can't really know the state of it's
contents, so assume it is valid.
Signed-off-by: Rob Clark <robdclark@chromium.org>
There are still some fallbacks we'll need to handle before we can enable
UBWC by default. I think we may need to fallback to uncompressed if
image atomic operations are used. And we still need to sort out how to
handle image and sampler views of compressed resources if the image/
sampler view is using a format that does not support compression. (I
think the latter should hopefully be uncommon outside of deqp/piglit.)
But at least this gets us to the point where supertuxkart works properly
with UBWC enabled ;-)
Signed-off-by: Rob Clark <robdclark@chromium.org>
A few fixes that get UBWC working for the games/benchmarks where I
noticed problems before (in particular and manhattan, and stk (modulo
image support for UBWC when compute shaders are used for post-process
effects):
+ fix the size of the UBWC meta buffer (ie, the offset to color
pixel data) that is returned by ->fill_ubwc_buffer_sizes()
+ correct size/layout for 8 and 16 byte per pixel formats
+ limit the supported formats.. Note all formats that can be
tiled can be compressed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20
ca3eb5db66 went from silently truncating
the constant state, which was also the wrong thing to do, to an assert.
Which then showed up in a couple of dEQPs. Actually there is nothing
wrong with larger constant file so just drop the assert.
Signed-off-by: Rob Clark <robdclark@chromium.org>
with that we can simplify code where nir vectors are created
v2: merge both lines in nir_vec
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that dlist compilation again knows if it is inside glBegin/glEnd,
we can leave the decision if aliasing should occur to the vertex attribute
setter functions instead of doing that at glArrayElement time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
We have to use _mesa_inside_dlist_begin_end instead of
_mesa_inside_begin_end to see if we are inside a glBegin/glEnd block in
case of display lists.
So split the is_vertex_position function used in vertex attribute processing
into a imm and dlist variant and use the appropriate _mesa_inside_begin_end
variant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
That seems to be lost somewhere. Is needed for correct outside begin/end
detection in display list compilation. And is needed for correct aliasing
in dlists restablished in the next changes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Is no longer used, so we have less occasions where NewState is non zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Now this part of gl_context state is unused and can be removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In glArrayElement, use the bitmask trick to just walk the enabled
vao arrays. This should be about equivalent in execution time to
walk the prepare aelt_context list. Finally this will allow us to
reduce the _mesa_update_state calls in a few patches.
v2: Add comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In the glArrayElement implementation, use glVertexAttrib*NV type
functions for fixed function attributes. We do the same in display
execution when the list is replayed using immediate mode attribute
functions. Using a single set of function pointers enables to
use a unified loop to walk the vertex array attributes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
For access to glArrayElement methods factor out a function to
get the table lookup index for normalized/integer/double access.
The function will be used in the next patch at least twice.
v2: Use vertex_format_to_index instead of NORM_IDX.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This new pass (which isn't even compile-tested) attempts to determine
the ALU type of all the SSA values in a function impl. It takes a
greedy approach and assigns intness or floatness to everything it thinks
can possibly contain an int or a float. Some values will be labled as
both int and float and some will be labled as neither and it is up to
the caller to decide what to do with this information. However, for a
"nice" shader where the original source contained no bit-casts and no
implicit bit-casts were introduced by optimizations, there shouldn't be
any overlap in the two sets save for the odd CSEd zero constant.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
These add a lot of complexity, and I currently can't measure any
performance benefit from having them. In the past, I seem to recall
seeing a benefit in drawoverhead scores, but currently it looks like
dropping them is either a wash or 1-2% faster.
Drop them to simplify allocations.
The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages,
and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB.
So we can't use the top page.
Both drivers are feature-complete and should be running more-or-less at
perf at this point. Drop the warning.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Just don't emit the transform array at all if there are no transforms
v2:
- Don't use len(array) > 0 (Dylan)
- Keep using ARRAY_SIZE to make the generated C code easier to read
(Jason).
st/mesa's PBO upload path binds a vertex shader that doesn't use any
textures, but leaves the existing sampler views bound in place. This
was tricking us into thinking the PBO destination might be bound for
texturing in some cases. In Civilization VI, this fixes a false self-
dependency issue that was preventing CCS_E compression on upload.
Fixing this slightly improves frame times.
This makes nm not required, but used if found. In general I imagine that
this means that on windows nm wont be found, and on other platforms it
will.
v2: - fix gbm and egl symbols check tests to only be run if nm is found
- reword commit message to reflect the code change
Reviewed-by: Eric Anholt <eric@anholt.net>
So that it can be implicitly disabled on windows, where it doesn't
compile.
v2: - Use an auto-option rather than automagic.
- fix shader_cache check (== -> !=)
v4: - Use new with_shader_cache instead of get_option('shader-cache')
elsewhere in the meson build
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows them to default to false on windows, but default to true
elsewhere. As a side effect turning off shared-glapi now automatically
turns off gles. Shared glapi remains a boolean defaulting to true.
v5: - new in this version
Reviewed-by: Eric Anholt <eric@anholt.net>
Somewhere down in the depths of the mingw headers 'interface' is
defined, change it to iface like a similar patch did.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows the identifier to be used even if shared-glapi isn't build,
which simplifies a bunch of things.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Because the new raw/struct intrinsics are buggy with LLVM 8
(they weren't marked as source of divergence), we fallback to the
old instrinsics for atomic buffer operations only. This means we need
to apply the indexing workaround for GFX9. The load/store
operations still use the new LLVM 8 intrinsics.
The fact that we need another workaround is painful but we should
be able to clean up that a bit once LLVM 7 support will be dropped.
This fixes a GPU hang with AC Odyssey and some rendering problems
with Nioh.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110573
Fixes: 31164cf5f7 ("ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Not all package managers / users will install perl into /usr/bin,
but /usr/bin/env /should/ always be present.
Using /usr/bin/env means that we can't give the -w argument to Perl,
so I added `use warnings' in the script.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Found while running Talos Principle.
As far as I can tell running a draw call with a pipeline having push
constants without the application having called vkCmdPushConstants
gives undefined push constant values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Since 09f1de97a7 "anv,i965: Lower away image derefs in the driver"
the backend compiler is not expected to handle any derefs, so let's
assert on it.
This helps identifying problems when a deref is not lowered and
"leaks" into the backend compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The MASK macro is used in the RANGE macro, and it should
return the pre-bitset word mask for the (b) value.
i.e.
BITSET_MASK(0) should be undefined since it's meaningless.
BITSET_MASK(31) should give 0x7fffffff
BITSET_MASK(32) should give 0xffffffff
BITSET_MASK(33) should give 0x00000001
BITSET_MASK(64) should give 0xffffffff
However then BITSET_RANGE ends up broken for cases where
it's (b) value is the 0,32,64 value as in that case the lower
mask would be 0 not 0xffffffff.
This fixes the unit tests that I've added, and my code that
uses bitsets.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: bb38cadb1c "More GLSL code"
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The last test here currently fails as there is a bug in bitset.h
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This has a couple of hardcoded vec4 limits in it, change them
to the proper sizing to avoid future issues.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is a port of Danylo's eca4a6548d
which fixed the hang on i965. It fixes GPU hangs in his new Piglit
test, arb_blend_func_extended-dual-src-blending-discard-without-src1.
I avoided my own review feedback here, and decided to simply adjust
3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been
clear to me which the hardware uses in every case. However, whacking
the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang,
and that packet is already dynamic, so it's easy to handle. I'd rather
avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.
It is an input but it comes in as part of the shader payload and doesn't
count towards the limits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
During a rebase, it seems I accidentally broke the contents-menu,
leading to a duplicate link to freedesktop.org. This was obviously not
intended. Let's fix this.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 7eee13c467 ("docs: use dl/dd instead of blockquote for
freedesktop link")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Support nir_op_ftrunc by turning it into a mov with a round to integer
output modifier.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Merge the following into `meson-main`/`meson-loader-classic-dri`/
`meson-gallium-swr`:
- meson-vulkan
- meson-gallium-drivers-other
- meson-gallium-st-other
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
[ Michel Dänzer ]
* Rebase and fix up commit log.
* Don't set VULKAN_DRIVERS in meson-loader-classic-dri.
* Remove extraneous whitespace.
* Squash in follow-up fixes.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
[ anholt]
* Add a note why nine and swrast landed where they did.
* Switch from s/meson-vulkan/meson-main/ to
s/meson-loader-classic-dri/meson-main/ which I think was the original
intent
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (anholt changes)
Acked-by: Dylan Baker <dylan@pnwbakers.com>
- Add GBM_BO_IMPORT_FD_MODIFIER to documentation of supported foreign
object types
- Add newline before documentation block
- Improve language
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
- make sure compute shader derivatives are exposed for all extensions
- unify duplicated code
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
(instead of wait-then-flush, do flush-then-wait)
The improvement is apparent with Unigine Heaven.
Previously:
draw frame 2
wait frame 0
flush frame 2
present frame 2
The input lag is 2 frames.
Now:
draw frame 2
flush frame 2
wait frame 1
present frame 2
The input lag is 1 frame. Flushing is done before waiting, because
otherwise the device would be idle after waiting.
Nine is affected because it also uses the pipe cap.
This roughly mirrors what we get from autotools. There's a few
differences, though:
1. The "exec_prefix" output has been dropped. Meson doesn't support
this, so it makes no sense here.
2. The "llvm-config" output has been dropped. Meson abstracts dependency
discovery a bit more than our autotools build-system does, so it's
not easy to get this information as-is.
3. HUD extra stats, SWR archs, Shared/Static libs and CFLAGS / CXXFLAGS /
LDFLAGS has been dropped. These can be inspected by "meson configure".
4. How we set defines works quite differently in our Meson build-system,
and the result isn't quite the same. In particular, the DEFINES output
has been dropped, to avoid having to refactor the code too much.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109326
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Variables are cheap, and there's little reason for the dri and gallium
drivers to work on the same variable for the driver list. So let's split
these in two separate lists instead.
This makes it easier to inspect these after-the fact, for instance
for generating a summary of build-settings.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
This way we can mark the dri_drivers and dri_link arrays as temporary,
as all knowledge about them are contained in a single build-file with
clearly visible limited life-span.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
We just need to do a sequence of commands to flush the cache.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Wire up support to sample from the fb (and force GMEM rendering when we
have fb reads). The existing GLSL IR lowering for blend_equation_advanced
does the rest.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Lower load_output to txf_ms_fb and add support for the new texture fetch
instruction.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Apparently we never hit this path. Or at least haven't for a rather
long time. But in either case (load_deref or load_frag_coord), we can
just directly use the intrinsic's ssa dest. So stop passing the
nir_variable (which would be NULL in the load_frag_coord case) around
and instead just use &intr->dest.ssa.
(This ofc means we need to setup the cursor to insert *after* the
instruction, which seems to be another bug of the original
implementation.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
And a comment.. since we are mixing units of bytes/dwords/vec4,
hopefully this will avoid some unit confusion.
Signed-off-by: Rob Clark <robdclark@chromium.org>
It isn't quite as simple as not running the pass, since with packed
varyings we get load_ubo for block==0 (ie. the "real" uniforms). So
instead run the pass normally but decline to lower anything in
block > 0
Signed-off-by: Rob Clark <robdclark@chromium.org>
Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but
emit by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned. Which the
code already mostly did, except for aligning the first UBO region itself
(ie. the one after block==0 which is the "real" uniforms).
Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
Otherwise we zero out the state again, but all the UBO loads that we
could lower are already lowered. End result is that we didn't emit the
uniforms for lowered UBO access in any case where multiple shader
variants are used.
Fixes: 893425a607 freedreno/ir3: Push UBOs to constant file
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@chromium.org>
This is useful to normalize the numbers written into the output file
as those number are accumulated over a period of time and number of
frames.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: switch to VkBase{In,Out}Structure
v3: Add timestamps at begin/end of primary command buffers to estimate
gpu time spent per submission (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
This significantly reworks how numbers displayed are computed. We
accumulate operations written into command buffers and add those to
the device when submitted to a queue. These collected values are then
used to compute per frame overlay data.
We also accumulate the data over the sampling fps period to produce
numbers for that period of time.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will be used to copy chains of structures so that we can alterate
some of them.
v2: Drop vk_util.h include (Eric)
Use VkBaseInStructure directly (Eric)
v3: Drop --platforms= param to generator script, instead produce a
file with #ifdef based what platforms are compiled.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
nir_opt_algebraic is currently one of the most expensive NIR passes,
because of the many different patterns we've added over the years. Even
though patterns are already sorted by opcode, there are still way too
many patterns for common opcodes like bcsel and fadd, which means that
many patterns are tried but only a few actually match. One way to fix
this is to add a pre-pass over the code that scans it using an automaton
constructed beforehand, similar to the automatons produced by lex and
yacc for parsing source code. This automaton has to walk the SSA graph
and recognize possible pattern matches.
It turns out that the theory to do this is quite mature already, having
been developed for instruction selection as well as other non-compiler
things. I followed the presentation in the dissertation cited in the
code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep
the naming similar. To create the automaton, we have to perform
something like the classical NFA to DFA subset construction used by lex,
but it turns out that actually computing the transition table for all
possible states would be way too expensive, with the dissertation
reporting times of almost half an hour for an example of size similar to
nir_opt_algebraic. Instead, we adopt one of the "filter" approaches
explained in the dissertation, which trade much faster table generation
and table size for a few more table lookups per instruction at runtime.
I chose the filter which resulted the fastest table generation time,
with medium table size. Right now, the table generation takes around .5
seconds, despite being implemented in pure Python, which I think is good
enough. Based on the numbers in the dissertation, the other choice might
make table compilation time 25x slower to get 4x smaller table size, but
I don't think that's worth it. As of now, we get the following binary
size before and after this patch:
text data bss dec hex filename
11979455 464720 730864 13175039 c908ff before i965_dri.so
text data bss dec hex filename
12037835 616244 791792 13445871 cd2aef after i965_dri.so
There are a number of places where I've simplified the automaton by
getting rid of details in the LHS patterns rather than complicate things
to deal with them. For example, right now the automaton doesn't
distinguish between constants with different values. This means that it
isn't as precise as it could be, but the decrease in compile time is
still worth it -- these are the compilation time numbers for a shader-db
run with my (admittedly old) database on Intel skylake:
Difference at 95.0% confidence
-42.3485 +/- 1.375
-7.20383% +/- 0.229926%
(Student's t, pooled s = 1.69843)
We can always experiment with making it more precise later.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to RadeonSI, this seems to be required by the hardware
to avoid GPU hangs. I think I just forgot to set that bit when I
implemented VK_EXT_transform_feedback.
This fixes a GPU hang with Space Engineers and DXVK.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291
Fixes: b4eb029062 ("radv: implement VK_EXT_transform_feedback")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
valgrind crashes when we try to initialize host logging. This
env var can be used to disable logging.
v2: rebase onto "svga: move host logging to winsys".
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Neha Bhende <bhenden@vmware.com>
All other pages has the heading as ghe first thing in the article. Let's
clean this up for consistency.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The FAQ is the only article we have that uses a centered heading, which
makes it look odd compared to the other articles. Let's drop the
centering for consistency.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
HTML already have a way of doing automatically ordered lists, so let's
use that instead of open-coding one.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This markup seems to assume paragraphs survive across block-elements,
which isn't the case. Let's rectify that.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It's illegal to nest block-level elements such as <pre> inside <p> in
HTML. This means that when the paragraphs gets closed after a <pre>-tag,
we end up closing a non-existent tag, so the browser inserts a dummy
<p>-tag. This is entirely pointless, so let's just close these tags
before the <pre>-tag instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
paragraphs can't contain lists, and attempting to close them after
the list just cause an extra, empty paragraph to be created. We don't
want that, so let's close the paragraphs before the list intead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
A list-item must be openened before it can be closed. So let's replace
this closing tag with an opening tag.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The blockquote happens to match the indentation of the other lists for
most browsers, but this isn't a guarantee. Let's instead use a
definition-list, which is more strongly connected to a list, so it's
more likely to have the same indention.
This also makes sure that we don't have similar padding on the
right-hand side, in case we change the text-size.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
<b>-tags aren't allowed in the root of <body>, so let's replace these
with <h2>-tags with some CSS to make them appear as bold.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Even in preformatted blocks, ampersands should be escaped. Let's correct
this, in case of strict parsers.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The '>'-symbol should usually be escaped to avoid confusing strict
parsers. While it's very unlikely to cause issues as-is, let's quite it
for good measure.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It previously used var->type instead of deref_instr->type and didn't
handle 64-bit outputs.
This fixes lots of transform feedback CTS tests involving transform
feedback and geometry shaders (mostly
dEQP-VK.transform_feedback.fuzz.random_geometry.*)
v2: fix writemask widening when comp != 0
v3: fix 64-bit variables when comp != 0, again
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's generally nicer to do this in terms of em units, as that scales
better with text-sizes, if we ever decide to change them.
The result is slightly larger than before, but only by a couple of
pixels.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
With "display: flex;" we can make this a bit more automatic, not
requiring a bunch of values to be of specific values to get the right
centering.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This is a bit tidier than to set a background on the h1-text, requiring
it to be full height and all.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
While it's legal to omit the last semicolon in a CSS block, it's
generally not considered good style, as it makes it harder to add new
lines.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
These attributes has been commented out since 2005; I don't think
there's a big chance of them making a return as-is.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Tabs has been around as the indention style of this file since it was
created. Some newer CSS has added double-spaces, but let's keep it
consistent.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This error code typically indicated that a buffer object that was referenced
by the command stream was being used for CPU access by another client.
The correct action here is to retry after a while. Use usleep() until we
have proper kernel support for this wait.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The file vmwgfx_drm.h was a bit outdated. Update to a recent version,
including defines supporting coherent memory.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Some constant- and texture upload buffer data may bounce in malloced
buffers before being transferred to hardware buffers. In the case of
texture upload buffers this seems to be an oversight. In the case of
constant buffers, code comments indicate that we want to avoid mapping
hardware buffers for reading when copying out of buffers that need
modification before being passed to hardware. In this case we avoid
data bouncing for upload manager buffers but make sure buffers that
we read out from stay in malloced memory.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We didn't have the path using this command enabled as
typically we take an alternate path using DMA uploads.
Emable it so that we can exercise that code-path by turning off
the DMA path.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The vmwgfx kernel module has a compatibility mode for user-space that is
not guest-backed resource aware. Add an environment variable to facilitate
testing of this mode on guest-backed aware kernels: if the environment
variable SVGA_FORCE_HOST_BACKED is defined, the driver will use host-backed
operation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For consistency with ac_build_llvm8_buffer_{load,store}_common
helpers and that will help a bit for removing the vec3 restriction.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Per the Vulkan spec 1.1.107, the predicate is a 32-bit value. Though
the AMD hardware treats it as a 64-bit value which means it might
fail to discard.
I don't know why this extension has been drafted like that but this
definitely not fit with AMD. The hardware doesn't seem to support
a 32-bit value for the predicate, so we need to implement a workaround.
This fixes an issue when DXVK enables conditional rendering with RADV,
this also fixes the Sasha conditionalrender demo.
Fixes: e45ba51ea4 ("radv: add support for VK_EXT_conditional_rendering")
Reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The hardware actually rounds before conversion. This now matches
what values are used when performing fast clears vs slow clears.
This fixes a rendering issue with Far Cry 3&4. This also fixes
a bunch of CTS tests that use a 8-bit UNORM format (only when
the 512*512 image size hint is manually disabled).
Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I want to be able to do BITSET_TEST() != BITSET_TEST() and this isn't
currently possible because BITSET_TEST() returns a random bit. Compare
to zero to get an actual Boolean.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I'm not sure what triggered this, but building with
scons platform=windows toolchain=crossmingw machine=x86 build=profile
with MinGW g++ 7.3 or 7.4 causes an internal compiler error.
We can work around it by forcing -O1 optimization.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
It's needed by the next pbuffer fix, which changes the behavior of
draw_buffer_enum_to_bitmask, so it can't be used to help with error
checking.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It has been noted that the lima GP has a limit of 512 instructions,
after which the shaders don't work and fail silently.
This commit adds a check to make the shader compilation abort when the
shader exceeds this limit, so that we get a clear reason for why the
program will not work.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Small buffer subdatas which are essentially doing a memcpy were getting
bogged down by all the overhead of creating new transfers.
Signed-off-by: David Riley <davidriley@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Intersecting transfer queue entries allow for the possibility of
extending an existing transfer instead of creating a new one (and all
the associated mappign/unmapping).
Signed-off-by: David Riley <davidriley@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Recently we added checks to try and deny multisampled shader images.
Unfortunately, this messed up imageBuffers, which have sample_count = 0,
which are also used in PBO download, causing us hit CPU map fallbacks.
Fixes: b15f5cfd20 iris: Do not advertise multisampled image load/store.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
If no view is bound we still should reset the override to 0
and array mode.
This should fix misrendering in firefox WebRender since
the pbo sampler was removed.
Fixes: 1250383e36 (st/mesa: remove sampler associated with buffer texture in pbo logic)
This #include is needed for `NULL`, which is used on all OSes, not just Linux.
Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Fixes: 316964709e "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
If we don't update this for all primitive-types, we end up rendering
slightly offset points and lines up until the point where the first
triangle gets drawn. This is obviously not correct, and violates
OpenGL's repeatability rule.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: ca9c413647 ("softpipe: Respect gl_rasterization_rules in
primitive setup.")
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Use a different arrangement of constants to allow more ffma.
A vec4 backend will now use 3 fma for yuv_to_rgb. On freedreno/ir3, it is
down from 10 to 7 alu (4 fma, 3 mul, 3 add to 7 fma). Other backends
shouldn't be hurt.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Since softpipe doesn't truely support multisample, I've not added softpipe
to the "Enhanced per-sample shading" even though with the advertised GLSL
level ARB_gpu_shader5 is advertised.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This will enable calls to the interpolateAt* functions, but also a bunch
of other features.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Like with interpolatAtSample this is also not really implementing the
according sampling and will only work correctly for pixels that are fully
covered, but since softpipe only supports one sample this is good enough
for now.
v2: Correct spelling (Roland Scheidegger)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Since for this opcode the offsets are given manually the function
should actually also work for non-zero offsets, but the related piglits
only ever test with offset 0. Accordingly the patch satisfies
"fs-interpolateatoffset-*".
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Softpipe doesn't support more than one sample, so this function
implements the interpolation at sample 0 and adds a stub to make it
possible to interpolate at other samples.
As it is this makes the piglits "fs-interpolateatsample-*" pass, but
they only ever test sample 0 anyway.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This adds entry points for correcting the interpolation values if the
interpolation is done by using one of the interpolateAt* functions.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Now that the LOD is evaluated up front the cube faces can also be
evauate on a per sample basis instead of using the quad.
This fixes a large number of deqp gles 3 and 31 cube texture tests.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This enables the use of explicit gradients.
Also remove an unused parameter when changing the interfaces.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The shadow evaluation compare parameter is stored in different locations,
depending on the texture type. Move the values to a common location free
the lod storage and to be able to reduce the number of parameters.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The value is stored in the lod components and this will be overwritten
when swithcing to the new code path.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The EGL_KHR_create_context spec says:
"If an OpenGL context is requested and the values for attributes
EGL_CONTEXT_MAJOR_VERSION_KHR and EGL_CONTEXT_MINOR_VERSION_KHR,
when considered together with the value for attribute
EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR, specify an OpenGL
version and feature set that are not defined, than an
EGL_BAD_MATCH error is generated."
This case is already correctly handled a bit below in
the same source file.
The correct handling was added by commit: 63beb3df
Reported-by: Ian Romanick <idr@freedesktop.org>
Here: https://bugzilla.freedesktop.org/show_bug.cgi?id=92552#c9
Fixes: 11cabc45b7 "egl: rework handling EGL_CONTEXT_FLAGS"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Some of the opts are not called in the general optimastion loop
in the state trackers glsl -> nir conversion. We need to call
the radeonsi specific optimisation once before scanning over
the nir otherwise we can end up gathering info on code that
is later removed.
Fixes an assert in the piglit test:
./bin/varying-struct-centroid_gles3
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Needs to update max_half_reg, or be remapped to full reg and update
max_reg accordingly, depending on generation..
Signed-off-by: Rob Clark <robdclark@chromium.org>
When discard_delayed_release is set (default), we allocate more buffers
and use a different buffer wait path.
Check if it is set, and use the old paths if not
(the alternative buffer wait path could still be used, but there is no
advantage to using it in this case).
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
thread_submit's throttling depending on the number of internal
back buffers, and wasn't affected by the driver requested
throttling value.
Now it is.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Optimize writeonly by passing PIPE_TRANSFER_WRITE
for these buffers instead of the safer
PIPE_TRANSFER_READ_WRITE.
This seems to improve the performance of d3d8 games
using d3d8to9.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
We used TGSI_SEMANTIC_FOG for fog,
however on vs/ps 3, fog is allowed to have
4 components (even on the ff pipeline according
to a wine test).
Since gallium's TGSI_SEMANTIC_FOG has only one
component, use TGSI_SEMANTIC_GENERIC instead.
Fixes:
https://github.com/iXit/Mesa-3D/issues/346
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
The shader constant buffer size with the
constant compaction code can vary depending
on the shader variant compiled (for example if
fog constants are required, etc).
Thus instead of using fixed size for the shader,
add in the variant cache the size required, pass it
to the context, and use this value.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
As with the constant compaction we map the constant
slots to new slots, we need to pass that information
to the context which is in charge of uploading
the constants.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
When indirect addressing is not used, we know exactly
which constants are accessed, and thus can
have them located in consecutive slots.
We thus parse again the shader with a slot map
for compaction.
The path contains the work inside nine_shader.c for this
path, but it needs some other commits to work, and thus
is not enabled yet by this commit.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Regroup all the param->rel assertions into one assertion for better clarity
and better covering.
param->rel on an input can only happen with float constants for vs,
or with inputs on vs/ps 3.0.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Boolean and Integer constants are used in d3d9 for flow control.
Boolean are used for if/then/else and Integer constants
for loops.
The compilers can generate better code if these values are known
at compilation.
I haven't met so far a game that would change the values of these
constants frequently (and when they do, they set to the values used
for the previous draw call, and thus the changes get filtered out).
Thus it makes sense to inline these constants and recompile the shaders.
The commit sets a bound to the number of variants for a given shader
to avoid too many shaders to be generated.
One drawback is it means more shader compilations. It would probably
make sense to compile these shaders asynchronously or let the user
control the behaviour with an env var, but this is not done here.
The games I tested hit very few shader variants, and the performance
impact was negligible, but it could help for games with uber shaders.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
dynamic textures seem to have predictable stride. This stride
should be the same as for a ram buffer.
It seems some game don't check the actual stride value, assuming
it to be the expected one.
Thus this workaround (protected by drirc option) is to use an intermediate
ram buffer.
Fixes Rayman Legends texture issues when enabled.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Use nine_context_box_upload instead of locking the pipe
for volume upload with format conversion.
nine_context_box_upload already handles format
conversion.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Use nine_context_box_upload instead of locking the pipe
for surface upload with format conversion.
nine_context_box_upload already handles format
conversion.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
SINCOS takes an input with replicated swizzle.
the swizzle can be on any component, not just x.
Enable it to read from any component, but also
use a temporary register to avoid dst/src aliasing.
No known game is fixed by this change as it seems
the input swizzle is commonly on x for this instruction,
and src and dst don't alias.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Systemmem has a specific behaviour we don't
mimick exactly.
That makes Halo feel free to use nooverwrite
with it all the time, even when reading again
at the same location.
Ignore nooverwrite to have proper synchronization.
Fixes: https://github.com/iXit/Mesa-3D/issues/348
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
For many ps 1.X instructions, we were reading the
texcoords directly, instead of through tx_src_param,
resulting in modifiers getting ignored.
Use tx_src_param for all these instructions.
Fixes: https://github.com/iXit/Mesa-3D/issues/337
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
d3d's nooverwrite and gallium's unsynchronized
have different semantics.
Indeed nooverwrite says the applications won't
write to locations needed by previous draws,
which is less strong than unsynchronized which
won't synchronize previous writes.
Thus in case app is locking without discard/nooverwrite,
then using nooverwrite, we need to add a
synchronization.
Fixes: https://github.com/iXit/wine-nine-standalone/issues/29
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Previously nine_state_clear was not using
NineBindBufferToDevice and NineBindTextureToDevice
to unbind buffers and textures (but used nine_bind)
This was resulting in an uncorrect bind count for these
resources.
Combined with
0ec4e5f630
Some buffers were scheduled to be uploaded directly
after they were locked (because the bind count incorrectly
assumed they were needed for the next draw call),
which resulted in uploads before the data was written.
To simplify a bit the code (and because I needed to
add a pointer to device),
remove the stateblock usage from nine_state_clear and
rename to nine_device_state_clear.
Fixes:
https://github.com/iXit/Mesa-3D/issues/345
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
When a draw call is emited, buffers in the
device->update_buffers list are uploaded.
This patch removes buffers from the list if they
are not bound anymore.
Behaviour found studying:
https://github.com/iXit/Mesa-3D/issues/345
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
When a draw call is emited, textures in the
device->update_textures list are uploaded.
This patch removes textures from the list if they
are not bound anymore.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
This seems to fix Rayman (which adds things
to the RCP result, and thus gets an Inf),
while not having regressions.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
No-one reported bugs for that, but is seems
c442dd7890
and previous commits used APIs not defined until
nine minor version 3.
This patch should prevent crash in this case.
Also turn off the resize feature in this case,
as we won't prevent a buffer leak anymore.
Cc: "19.0" mesa-stable@lists.freedesktop.org
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names. We may as well keep with the times.
See also: 90108deb27 "anv: Update to use the new features struct names"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names. We may as well keep with the times.
See also: 90108deb27 "anv: Update to use the new features struct names"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Newer gens (> 9) will start doing the linear -> sRGB conversion of the
clear color for us, if we use a sRGB surface format. So let's make sure
that doesn't happen and keep the same semantics as before.
Even though the hardware could convert the clear color for us during
fast clear, that converted color is only used for sampling. For resolve,
the original color would be used (without the conversion). So we convert
it ourselves and the same converted color gets used for both sampling
and resolving, simplifying the whole logic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were disabling fast clears if the view format had a different
colorspace than the resource format (sRGB vs linear or vice-versa). But
we actually support them if we use the view format to decide if we
should encode the clear color into sRGB colorspace.
Also add a missing linear -> sRGB surface format conversion (we don't
want the clear color to be encoded to sRGB again during resolve).
v2: Do not track sRGB colorspace during fast clears (Nanley).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vmw_screen.h uses dev_t which is defines in sys/types.h
this header is required to be included for getting dev_t
definition. This issue happens on musl C library, it is hidden
on glibc since sys/types.h is included through another
system headers
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This patch claimed that the autotools build generates libGLESv1_CM.so.1.0.0, but
it doesn't:
es1api_libGLESv1_CM_la_LDFLAGS = \
-no-undefined \
-version-number 1:1 \
$(GC_SECTIONS) \
$(LD_NO_UNDEFINED)
Revert commit cc15460e18 to ensure that the
autotools and meson builds produce the same libraries.
Fixes: cc15460e18 "meson: drop GLESv1 .so version back to 1.0.0"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.
Fixes: 6e230d7607 "anv: Implement VK_EXT_descriptor_indexing"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This enables the remaining capabilities in SPV_EXT_descriptor_indexing.
Fixes: 0e10790558 "radv: Enable VK_EXT_descriptor_indexing."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The code was handling the Weak variant in some cases, but missing
others, e.g. the get_deref_nir_atomic_op. Add all the missing cases
with the same behavior of the non-Weak SpvOpAtomicCompareExchange.
Note that the Weak variant is basically an alias, as SPIR-V 1.3,
Revision 7 says
"OpAtomicCompareExchangeWeak
Deprecated (use OpAtomicCompareExchange).
Has the same semantics as OpAtomicCompareExchange."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These files implement running almost all of deqp-gles2 on Chomebooks of
the rk3399-gru-kevin type in Collabora's LAVA lab.
The approach follows what is currently being used for virglrenderer,
but scheduling the actual test jobs via LAVA.
We start by building a container in Docker that contains a suitable
rootfs and kernel for the DUT, deqp and all dependencies for building
Mesa itself.
The Mesa is built and the rootfs, deqp and Mesa are combined in a cpio
ramdisk. A LAVA job is generated, submitted to LAVA and the results are
processed by simply comparing them to the expectations that are stored
in git. Any code that changes the expectations (hopefully tests are
fixed) needs to also update the expectations file.
The next step is adding support for other devices, possibly in other
LAVA labs.
In order to use this, the repository has to be configured to run the
gitlab-ci.yaml file from the panfrost/ci dir, and a LAVA token needs to
be setup.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Prep work for fb_read (blend_equation_advanced)
Switch to using 'enum pipe_shader_type' everywhere, and (optional, in
non-cache / slowpath case) pass ctx instead of image/ssbo state. In the
fb_read case we also need to access the framebuffer state, so having
the ctx simplifies things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Hardware docs say that Gen11 requires the use of two MI_ATOMICs of size
QWORD when updating the clear color. The second MI_ATOMIC also needs CS
Stall and Return Data Control set.
v2: Remove include of srgb header (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Change some of the single bit fields to booleans, and add an enum with
the definition of the ATOMIC_OPCODE.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
With python's int(), if the optional second parameter is 0, then
python will support the 0x prefix for hex numbers.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
One special case, `src/util/xmlpool/.gitignore` is not entirely deleted,
as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`).
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
I was setting it based off a pipe_rasterizer_state field that appears
to be entirely dead outside of the draw module respecting it.
I should be setting it when the primitive type reaching the SF is
neither points nor lines. This is, unfortunately, rather dirty,
as we have to look at the rasterizer state, the geometry shader state,
the tessellation evaluation shader state, and the primitive type...
https://reviews.llvm.org/rL356946 (present in LLVM 9 and later) changed
the meaning of the "system" sync scope, making it no longer restricted to
the memory operation's address space. So a single address space sync scope
is needed for shared atomic operations (such as "system-one-as" or
"workgroup-one-as") otherwise buffer_wbinvl1 and s_waitcnt instructions
can be created at each shared atomic operation.
This mostly reimplements LLVMBuildAtomicRMW and LLVMBuildAtomicCmpXchg
to allow for more sync scopes and uses the new functions in ac->nir with
the "workgroup-one-as" or "workgroup" sync scopes.
F1 2017 (4K, Ultra High settings, TAA), avg FPS : 59 -> 59.67 (+1.14%)
Strange Brigade (4K, ~highest settings), avg FPS : 51.5 -> 51.6 (+0.19%)
RotTR/mountain (4K, VeryHigh settings, FXAA), avg FPS : 57.2 -> 57.2 (+0.0%)
RotTR/tomb (4K, VeryHigh settings, FXAA), avg FPS : 42.5 -> 43.0 (+1.17%)
RotTR/valley (4K, VeryHigh settings, FXAA), avg FPS : 40.7 -> 41.6 (+2.21%)
Warhammer II/fallen, avg FPS : 31.63 -> 31.83 (+0.63%)
Warhammer II/skaven, avg FPS : 37.77 -> 38.07 (+0.79%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
To quote Uli Schlachter, who understands this stuff more than I do:
> The function __glXSendError() in mesa's src/glx/glx_error.c invents an X11
> protocol error out of thin air. For the sequence number it uses dpy->request.
> This is the sequence number of the last request that was sent. _XError() will
> then update dpy->last_request_read based on the sequence number of the error
> that just "came in".
>
> If now another something comes in with a sequence number less than
> dpy->last_request_read, since sequence numbers are monotonically increasing,
> widen() will incorrectly add 1<<32 to the sequence number and things might go
> downhill afterwards.
`__glXSendErrorForXcb` was also patched, as that's the function that
`glXCreateContextAttribsARB` actually uses.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99781
Cc: mesa-stable@lists.freedesktop.org
Fixes: ad503c41 'apple: Initial import of libGL for OSX from AppleSGLX svn repository'
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Hal Gentz <zegentzy@protonmail.com>
In commit 0d46e404 ("anv: limit URB reconfigurations when using
blorp") we tried to limit the number of URB reconfiguration by
checking if the last allocation is large enough to fit the blorp
dispatch.
We used the last bound pipeline to compare the allocation. The problem
with this is that the pipeline is bound but its commands might not
have been emitted into the command buffer yet.
Let's just revert commit 0d46e40467
since it didn't seem to yield any performance improvement.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0d46e404 ("anv: limit URB reconfigurations when using blorp")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110535
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
It's prefectly legal and well-defined to render using a non-existing
or empty buffer object. The data coming out of the buffer object isn't
well defined unless we have the robustness flag set on the context, but
that's a different matter, and up to the shader hardware; it's the same
as out-of-bounds reads.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's legal for a buffer-object to have a NULL-resource, but let's just
skip over it, as there's nothing to do.
This patch switches the order of the conditionals in swr_update_derived,
so the logic becomes a bit more straight forward:
if (is_user_buffer)
...
else if (resource)
...
else
...
...instead of this:
if (!is_user_buffer)
if (resource)
...
else
...
else
...
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
It's legal for a buffer-object to have a NULL-resource, but let's just
skip over it, as there's nothing to do.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
It's legal for a buffer-object to have a NULL-resource, but let's just
skip over it, as there's nothing to do.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
It's legal for a buffer-object to have a NULL-resource, but let's just
skip over it, as there's nothing to do.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
st_setup_current never sets this flag, and it's already checked against
right before. So let's remove this pointless check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
From page 76 (page 80 of the PDF) of the GLSL 4.60 v.5 spec:
" No aliasing in output buffers is allowed: It is a compile-time or
link-time error to specify variables with overlapping transform
feedback offsets."
Currently, this is expected to fail, but it succeeds:
"
...
layout (xfb_offset = 0) out vec2 a;
layout (xfb_offset = 0) out vec4 b;
...
"
Fixes the following piglit test:
tests/spec/arb_enhanced_layouts/compiler/transform-feedback-layout-qualifiers/xfb_offset/invalid-overlap.vert
Fixes the following test:
KHR-GL44.enhanced_layouts.xfb_output_overlapping
v2:
- Use a data structure to track the used components instead of a
nested loop (Ilia).
v3:
- Take the BITSET_WORD array out from the
gl_transform_feedback_buffer struct and make it local to the
validation process (Timothy).
- Do not use a nested scope for the validation (Timothy).
v4:
- Add reference to the fixed piglit test in the commit log.
- Add reference to the fixed VK-GL-CTS test in the commit
log (Tapani).
- Empty initialize the BITSET_WORD pointers array (Tapani).
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Before setting the physical device API version, we should check if the
MESA_VK_VERSION_OVERRIDE environment variable is set and take it into
account.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
While we can clean this up later, it's trivial to not generate the
stupid code in the first place, which saves some optimization work.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
VK_ANDROID_external_memory_android_hardware_buffer requires this
extension. It is safe to enable it since currently aux usage is
disabled for ahw buffers.
Fixes following dEQP extension dependency test on Android:
dEQP-VK.api.info.device#extensions
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Treat gl_FragCoord variable as a system value and lower the w component
with a nir pass.
Add the necessary bits for correct codegen.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
On some hardware (e.g. Mali400) the shader needs to apply some
transformations for correct gl_FragCoord handling. The lowering
actions look like the following in pseudocode:
gl_FragCoord.xyz = gl_FragCoord_orig.xyz
gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w
Add this lowering as a nir pass in preparation for using it in the driver.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I have *no* idea what's happening here, but let's not regress an app
that used to work in the mean time while we're figuring it out..
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
In a perfect world, we'd use fp16 varyings for mediump and fp32 for
highp, allowing us to get a performance win without sacrificing
conformance. Unfortunately, we're not there (yet), so it's better we
assume always fp32 than always fp16 to avoid artefacts / breaking a lot
of deqp.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Two fixes here, one is that we tried to copyprop non-strictly-SSA values
which was bound to fly in our face. The other was peeling back the imov
workaround.. Turns out we still need that. More research is needed
still, but let's not regress real apps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes: commit b53b4573c3.
Optimization gone wrong. In the future, we should try this again (it's a
net win if implemented right), but at the moment this just regresses.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Some of the dEQP.functional.transform_feedback tests end up doing
the following sequence of operations:
1. BeginTransformFeedback
2. PauseTransformFeedback
3. Draw
4. ResumeTransformFeedback
At step 1, we'd pack 3DSTATE_SO_BUFFER commands saying to zero the
SO_WRITE_OFFSET registers. At step 2, we disable streamout, so step 3
doesn't bother emitting those commands. Then, step 4 re-packs new
3DSTATE_SO_BUFFER commands with offset = 0xFFFFFFFF, saying to continue
appending at the existing offset. This loads the value from the BO as
the offsets - but we never actually zeroed it.
So, just maintain a flag saying "we actually emitted the commands",
and stomp offset back to zero until we emit some.
I have a platform with vc4 display but V3D 4.x. We can fall back on
kmsro's probing to bring up the v3d gallium driver.
Acked-by: Rob Clark <robdclark@chromium.org>
Like vc4, we expect to have SOCs with various displays that have a single
V3D instance for rendering.
v2: Add v3d to the list of drivers that make enabling kmsro valid.
Acked-by: Rob Clark <robdclark@chromium.org>
We can't use the QPU functions to detect this until register allocation is
done and we've moved inst->dst into inst->qpu.
Fixes bad TMU sequences from register spilling in
KHR-GLES31.core.compute_shader.shared-max.
Looks like I lost it in a rebase conflict resolution. We'd hit the
unknown intrinsic assertion in
KHR-GLES31.core.compute_shader.shared-struct.
Fixes: 6b1c659825 ("v3d: Add Compute Shader compilation support.")
There are two cases where v3d's sampler view's resource doesn't match the
base's: shadow textures for sampling from raster, and pointing at the
separate depth texture for z32f_s8x24. We only want to update shadow for
the first case.
Fixes
dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_draw
when run after the previous testcase.
We were emitting a dummy load for when the VS doesn't load any attributes,
but we also need to emit a dummy load for when the render VS loads
attributes but the binner VS doesn't. Fixes simulator assertion failures
and GPU hangs on KHR-GLES31.core.texture_gather.\*
We are assured that the input segment size field is ignored for
!separate_segs mode, and now the simulator wants an in-range value set
regardless of whether it's functionally ignored or not.
fixes following warning with clang:
warning: suggest braces around initialization of subobject
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Used same syntax as elsewhere with Mesa sources, verified result
against MSVC with godbolt.org.
fixes following warning with clang:
warning: suggest braces around initialization of subobject
v2: empty braces -> braces around subobject (Caio, Kristian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
- Emulation of AVX512 built into SIMDLIB
- Remove associated macros
- Remove knobs controlling AVX512 and let emulation handle it
- Refactor variable names for SIMD16
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
- Use 8x2 tiling by default
- Remove associated macros
- Use SIMDLIB emulation for SIMD16 on SIMD8 hardware
- Remove code rot in Load/StoreTile
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
If there is no last fence, due to no rendering happening yet, just
create a new signaled fence and return it, to match the expectations of
the EGL sync fence API.
Fixes random "Could not create sync fence 0x3003" assertion failures from
Skia on Android, coming from the following code:
https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
Reproducible especially with thread count >= 4.
One could make the driver always keep the reference to the last fence,
but:
- the driver seems to explicitly destroy the fence whenever a rendering
pass completes and changing that would require a significant functional
change to the code. (Specifically, in lp_scene_end_rasterization().)
- it still wouldn't solve the problem of an EGL sync fence being created
and waited on without any rendering happening at all, which is
also likely to happen with Android code pointed to in the commit.
Therefore, the simple approach of always creating a fence is taken,
similarly to other drivers, such as radeonsi.
Tested with piglit llvmpipe suite with no regressions and following
tests fixed:
egl_khr_fence_sync
conformance
eglclientwaitsynckhr_flag_sync_flush
eglclientwaitsynckhr_nonzero_timeout
eglclientwaitsynckhr_zero_timeout
eglcreatesynckhr_default_attributes
eglgetsyncattribkhr_invalid_attrib
eglgetsyncattribkhr_sync_status
v2:
- remove the useless lp_fence_reference() dance (Nicolai),
- explain why creating the dummy fence is the right approach.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Currently if the timeout differs from 0, we'll end up with infinite
wait... even if the user is perfectly clear they don't want that.
Use the new lp_fence_timedwait() helper guarding both waits in an
!lp_fence_signalled block like the rest of llvmpipe.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The function is analogous to lp_fence_wait() while taking at timeout
(ns) parameter, as needed for EGL fence/sync.
v2:
- use absolute UTC time, as per spec (Gustaw)
- bail out on cnd_timedwait() failure (Gustaw)
v3:
- check count/rank under mutex (Gustaw)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
As effectively required by the extension, we need to ensure we're master
Currently drivers employ vendor specific solutions, which check if the
device behind the fd is capable*, yet none of them do the master check.
*In the radv case, if acceleration is available.
Instead of duplicating the check in each driver, keep it where it's
needed and used.
Note this copies libdrm's drmIsMaster() to avoid depending on bleeding
edge version of the library.
v2: set the fd to -1 if not master (Bas)
Fixes: da997ebec9 ("vulkan: Add KHR_display extension using DRM [v10]")
Cc: Andres Rodriguez <andresx7@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Keith Packard <keithp@keithp.com>
Reported-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
These have been popping up more and more with the OpenCL work and other
bits causing extra conversions to/from 64-bit.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
In 105002bd2d, we fixed a memory leak bug where we weren't properly
destroying descriptor when destroying/resetting a descriptor pool.
However, the only real leak that happened was that we we take a
reference to the descriptor set layout in the descriptor set and we
weren't dropping our reference. Everything else in the descriptor set
is tied to the pool itself and doesn't need to be freed on a per-set
basis. This commit changes the destroy/reset functions to only bother
walking the list of sets to unref the layouts and otherwise we just
assume that the whole-pool destroy/reset takes care of the rest.
Now that we're doing more non-trivial things with descriptor sets such
as allocating things with util_vma_heap, per-set destruction is starting
to show up on perf traces. This takes reset back to where it's supposed
to be as a cheap whole-pool operation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In c520f4dec9, we chose to align the sizes of descriptor set buffers to
32 bytes. We have to align the descriptor set buffer to 32B so that
it's valid for using with push constants. We align the size as well so
we don't leave lots of holes with util_vma_heap_alloc. Unfortunately,
we were only aligning it for alloc and not for free so we were still
creating piles of holes when we delete descriptor sets. This causes
terrible perf for the allocator once we've deleted piles of descriptor
sets.
This commit reworks the code so that we align the descriptor set buffer
size to 32B for both alloc and free. The result is that it takes the
new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds.
Fixes: c520f4dec9 "anv: Add a concept of a descriptor buffer"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a case where we are expecting 64-bit but generate
32-bit consts and validate gets angry.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes KHR-GL45.compute_shader.resources-max on radeonsi.
Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"
v2: use is_interface_array, protect again assertion failures in u_bit_consecutive
Reviewed-by: Dave Airlie <airlied@redhat.com>
Forgot to update corresponding entries for desktop GL.. kinda wish we
didn't have to update both GLES and GL tables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
The compiler support for:
OES_sample_shading
OES_sample_variables
OES_shader_multisample_interpolation
Signed-off-by: Rob Clark <robdclark@chromium.org>
The so->inputs[] table is in units of vec4
Fixes: 7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
There are a few places that we check if a shader stage input reg is
used/valid (ie. not r63.x).. and there are about to be a bunch more.
So add some helper macros for less open-coding.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Since this is what the value actually is. Cleanup the name before
adding more different i,j related values for sample-shading.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Calculates i,j at specified offset within a pixel. A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
De-duplicate the "normal" and "flags" versions of the macros, and while
at it go ahead and add "flags" versions for all the remaining macros,
since we'll at least need INSTR1F in a following commit.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Update UABI header and add FD_PP_PGTABLE and FD_NR_FAULTS params.
Robustness can be supported by a kernel which provides the new ABI if it
also indicates that per-process pagetables are in use.
Signed-off-by: Rob Clark <robdclark@chromium.org>
We'll want to unify this with main copy prop (and extend to varyings),
but that'll take more care to handle some special cases, so leave it as
a stub pass for now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This extends copy propagation to respect output modifiers for ALU
instructions, as well as potentially fixing some bugs related to looping
(all dEQP loop tests pass).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This enabled the basic YCBCR features.
We support basic multiplane formats using 8-bit and 16-bit unorms, as
well as YUV2 formats.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
No functional changes. This temporarily uses plane 0 for
everything.
Long term plan is that only single plane images get to use
metadata like htile/dcc/cmask/fmask.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These will be lowered by nir_lower_tex() with the
lower_tex_when_implicit_lod_not_supported, so don't need the extra
handling here.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already add the LOD src, so go ahead and update the texop as well
when this option is set.
v2: Make it an option. (Rob Clark)
v3: Use a more concise name suggested by Jason.
Reviewed-by: Rob Clark <robdclark@gmail.com>
One cannot write the URB arbitrarily and therefore the message
has to be carefully constructed. The clever tricks originate
from Kenneth and Jason, I'm just writing the patch.
Fixes GPU hangs on ICL with Vulkan CTS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
OpenGL 4.6 Spec:
"5.3.3 Rules
.......
Note: “Updates” via rendering or transform feedback
are treated consistently with updates via GL commands.
Once EndTransformFeedback has been issued, any subsequent
command in the same context that uses the results of the
transform feedback operation will see the results."
v2: removed a wrong comment
( Kenneth Graunke <kenneth@whitecape.org> )
v3: - flush+dirty depends on buffers usage history
- removed an old hack
( Kenneth Graunke <kenneth@whitecape.org> )
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110404
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just enable it during init_render_context on Gen10+, and move the
Gen9 state tracking into iris_genx_state so it only exists on Gen9.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
DRI driver loadable modules are always installed with
install_megadriver.py with names ending with '.so', irrespective of
platform.
Force the name the loadable module is built with to match, so
install_megadriver.py doesn't spin trying to remove non-existent
symlinks.
Fixes: c77acc3c "meson: remove meson-created megadrivers symlinks"
Force the driver thread to sync immediately with a compiler thread (but
compilation still happens in a separate thread).
This can be useful to simplify debugging compiler issues.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Enabling this option will create ddebug-style dumps for the aux context,
except that instead of intercepting the pipe_context layer
we just dump the IB contents on flush.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Due to asynchronous execution, it's not clear which of the draws the state
may refer to.
This also works around an issue encountered with radeonsi where dumping
the driver state itself caused a hang.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move the definition of radeonsi_clear_db_cache_before_clear there,
as well as radeonsi_enable_nir.
This removes the AMD_DEBUG=nir option.
We currently still have two places for options: the driconf machinery
and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options,
then the driconf machinery should be preferred since it's more flexible.
The only downside of the driconf machinery was that adding new options
was quite inconvenient. With this change, a simple boolean option can
be added with a single line of code, same as for AMD_DEBUG.
One technical limitation of this particular implementation is that while
almost all driconf features are available, the translation machinery doesn't
pick up the description strings for options added in si_debvug_options. In
practice, translations haven't been provided anyway, and this is intended
for developer options, so I'm not too worried. It could always be added
later if anybody really cares.
v2:
- use bool instead of uint8_t for options
- si_debug_options.inc -> si_debug_options.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit 40b3abb4d1.
It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.
CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We were only setting the used mask for the first component of a
varying. Since the linking opts split vectors into scalars this
has mostly worked ok.
However this causes an issue where for example if we split a
struct on one side of the interface but not the other, then we
can possibly end up removing the first components on the side
that was split and then incorrectly remove the whole struct
on the other side of the varying.
With this change we simply mark all 4 components for each slot
used by a struct. We could possibly make this more fine gained
but that would require a more complex change.
This fixes a bug in Strange Brigade on RADV when tessellation
is enabled, all credit goes to Samuel Pitoiset for tracking down
the cause of the bug.
Fixes: f1eb5e6399 ("nir: add component level support to remove_unused_io_vars()")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was a rebase issue which lost of change to a file moved from i965
to src/intel/perf.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
need_cs_space may clear the buffer list.
Fixes: 951d60f8cd "radeonsi: delay adding BOs at the beginning of IBs until the first draw"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is needed for exposing the samplerBuffer functions under
EXT_gpu_shader4.
v2: - expose it in the compat profile only
- make it an alias of EXT_gpu_shader4
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
The CTS fails on
dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.*vertex
when they are enabled, due to the VS being run for both bin and render. I
think this behavior is expected to be valid, but I can't find text in
atomic counters or SSBO specs saying so (the closed I found was in
shader_image_load_store). Just disable it for now, since the closed
source driver doesn't expose vertex atomic counters/SSBOs either.
This is just asking for tests to get confused about the HW supporting
atomics in this shader stage or not, such as
dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_expression_vertex.
v2: Rebase on the other atomic cleanups that have happened since posting.
v3: Commit message tweak by Marek.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
this automatically enables preemption on gen10 where it is disabled by
default but still available
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We create two new helpers, iris_flush_bits_for_history, and
iris_dirty_for_history, then use them in the existing function.
The first accumulates flush bits based on res->bind_history, but doesn't
actually perform a flush. This allows us to accumulate flush bits by
looping over multiple resources, but ultimately emit a single flush for
all of them.
The latter flags dirty bits without flushing, which again allows us to
handle multiple resources, but also is more convenient when writing from
the CPU where we don't need a flush (as in commit 4d12236072).
This inserts a handle for the flink name and a handle the correct
gem handle for the bo.
v2: fix handles/names confusion (Lepton Wu)
v3: set flink name correctly (Lepton Wu)
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Instead of aligning and then taking inline uniforms into account, we
need to take inline uniforms into account and then align to a page.
Otherwise, we may not be aligned to a page and allocation may fail.
Fixes: 43f40dc7cb "anv: Implement VK_EXT_inline_uniform_block"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
anv_descriptor_pool_free_set is called on the clean-up path of
anv_descriptor_set_create and the set may not have been added to the
pool's list of sets yet. While we're here, we move adding it to that
list into set_create for symmetry.
Fixes: 105002bd2d "anv: destroy descriptor sets when pool gets..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Found by GCC warning:
src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg*, const imm*)’:
src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
return ((reg->d & 0xffffu) < 0) != (imm->w < 0);
~~~~~~~~~~~~~~~~~~~^~~
The result of the bit-and is a 32-bit value with the top bits all zero.
This will never be < 0. Instead of masking off the bits, just cast to
int16_t and let the compiler handle the actual conversion.
Fixes: e64be391dd ("intel/compiler: generalize the combine constants pass")
Cc: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This code is outdated and unused; now that the compiler is mature,
there's no point keeping it around in-tree (or at all).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This code is stable and can live upstream independently while the rest
of the Bifrost stack comes up.
v2: Added a verbose flag to hide away some of the more verbose features
that nobody really needs
[The Bifrost disassembler is written by Connor Abbott, Lyude Paul, and
Ryan Houdek.]
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We create an all-encompassing opcode table for handling name and
properties, removing a number of ad hoc opcode tables which became
brittle and quickly out of date. While we're at it, we fix some
incorrect opcodes relating to ball/bany, and move a small function out
to midgard_compile.c. Together these changes should allow compilation
without warnings, along with helping the codebase health considerably.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Most copy prop should occur at the NIR level, but we generate a fair
number of moves implicitly ourselves, etc... long story short, it's a
net win to also do simple copy prop + DCE on the MIR. As a bonus, this
fixes the weird imov precision bug once and for good, I think.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
For floating point ops, these bits determine the "negate?" and "abs?"
modifiers. For integer ops, it turns out they control how sign/zero
extension work, useful for mixing types.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
In the future, we might want to switch to a table-based approach, but
for now, at least have it current.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We reshuffle the existing "dead move elimination" pass into a generic
dead code elimination layer, fixing bugs incurred with looping in the
process.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The bug this worked around is no longer applicable, it seems -- remove
the hack that breaks more than it fixes.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The hardware needs this lowered anyway; for now, might as well use
mesa's default lowering for pure conformance reasons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This restriction makes sense logically. Not sure why it wasn't obeyed
before. In conjunction with previous commit's disclaimer, fixes
dEQP-GLES2.functional.shaders.loop.for_dynamic_iterations.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Along with a corresponding fix to the move elimination pass (not
included here yet -- I just have it disabled for now), this will fix
dEQP-GLES2.functional.shaders.loops.for_uniform_iterations.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This adds preliminary support for indirect loads of varying arrays and
uniform arrays, bringing a few new tests in shader.indexing.* to
passing, although there remains a number of cases still missing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Varying arrays sometimes are lowered to a series of directly accessed
varyings (which we handled okay), but when indirectly accessed, they
appear as a single array; we need to handle this as well.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The semantics of this field are not well understood; it is better to
print it unconditionally along with the other unknown state, rather than
silently eat the value. Without this change, some critical state was
being lost in some shaders (notably, the offset for load/store
scratchpad intructions found in shaders that spill registers.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Otherwise our textures don't get color compression. Thanks to
Eero Tamminen for noticing this was missing!
Improves performance of GLB27_FillTestC24Z16 on my Apollolake
laptop with single channel RAM by 2.3x.
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
flrp32 is also a 3-source instruction, but there is another pending
series that handles that for Gen4 and Gen5.
v2: Rebase on "intel/compiler: Don't have sepearate, per-Gen
nir_options"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Instead, just have separate scalar vs. vector nir_options and do
per-Gen "fix ups".
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every file that included glsl/ir.h had a warning like:
src/compiler/glsl/ir.h: In member function ‘virtual bool ir_rvalue::is_lvalue(const _mesa_glsl_parse_state*) const’:
src/compiler/glsl/ir.h:236:64: warning: unused parameter ‘state’ [-Wunused-parameter]
virtual bool is_lvalue(const struct _mesa_glsl_parse_state *state = NULL) const
^
Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: fa4ebf6b8d ("glsl: add _mesa_glsl_parse_state object to is_lvalue()")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 624789e370 moved the destruction of types out of atexit() and
made use of a ref count instead. This is useful for avoiding a crash
where drivers such as radeonsi are still compiling in a thread when the app
exits and has not called MakeCurrent to change from the current context.
While the above scenario is technically an app bug we shouldn't crash.
However that change caused another race condition between the shader
compilation tread in radeonsi and context teardown functions.
This patch makes two changes to fix this new problem:
First we explicitly call _mesa_destroy_shader_compiler_types() when destroying
the st context rather than calling it indirectly via _mesa_free_context_data().
We do this as we must call it after st_destroy_context_priv() so that we don't
destory the glsl types before the compilation threads finish.
Next wait for the shader threads to finish in si_destroy_context() this
also means we need to call context destroy before destroying the queues
in si_destroy_screen().
Fixes: 624789e370 ("compiler/glsl: handle case where we have multiple users for types")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The dri options are optional. When the dri options are not provided
the WSI will not use adaptive sync.
FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.
So then the remaining question is: why do this in the WSI?
It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.
The next questions is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthemore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.
Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This includes 0 options.
The cache parsing is located at a position where we can easily add
config filtering by VkApplicationInfo.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
this hooks up the iris gallium driver to existing mesa bits which handle
the implementation
resolveskwg/mesa#8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
if the driver (iris) indicates support for the inner_coverage pipe cap, this
will set the necessary states in the driver flags and rasterizer structs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
this can be used by drivers which support the extension to indicate support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to subtract the starting offset from the final offset before
dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142.
Not known to fix anything.
This operation decorate with an Id instead of a Literal or String.
It is used by HlslCounterBufferGOOGLE (provided by
SPV_GOOGLE_hlsl_functionality1). Even if we don't do anything with
that decoration, we must be able to parse SPIR-V that uses it.
Fixes: 891886da2f "spirv: Add no-op support for VK_GOOGLE_hlsl_functionality1"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Decorations (and ExecutionModes) can have not only literals, but also
Ids associated with them. So rename the field to the more general
name "Operand" used by the spec.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Several empty cmdbufs are submitted by app/xserver per frame, from
glamor_block_handler for example. Let's skip them.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Noticed while trying to decide if pipebuffer was of any use to me, and
found that nothing has used it in the last 10 years at least.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>
Set REG_A2XX_RB_COPY_DEST_OFFSET in the tile init as it won't get touched
by the draw batch. Then gmem2mem is the same for all tiles.
Similar to what is done in a6xx, but only for gmem2mem.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Allows removing the load_deref/store_deref code in the compiler.
tgsi_to_nir now uses screen instead of options so we can simplify that too.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
a2xx driver is currently broken when PIPE_CAP_PACKED_UNIFORMS is enabled,
disable it for now.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
tgsi_to_nir now requires a screen pointer and is used by fd2_prog_init.
fd2_prog_init is used before fd_context_init so set the pointer manually.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
so that bound compute shader resources won't be added when they are not
needed and same for graphics.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The assertion considers max_dw from the current IB in the chain, but
big_ib_buffer is a buffer for the next IB, which can be smaller.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data. If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.
u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization. At
the very least, it would let us measure the benefit of threading
independently from this optimization. And it's not a lot of code.
Removes most stall avoidance blits in "Total War: WARHAMMER."
On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):
----------------------------------------------
| DiRT Rally | +2% (avg) | + 2% (max) |
| Bioshock Infinite | +3% (avg) | + 9% (max) |
| Shadow of Mordor | +7% (avg) | +20% (max) |
----------------------------------------------
This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either
of these happen, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.
On my Skylake GT4e at 1920x1080, this improves performance in games:
-----------------------------------------------
| DiRT Rally | +25% (avg) | +17% (max) |
| Bioshock Infinite | +22% (avg) | +11% (max) |
| Shadow of Mordor | +27% (avg) | +83% (max) |
-----------------------------------------------
This is probably not the best place for it, but I don't feel like moving
the one out of the TGSI translator today, and we already have the other
direction here, so...*shrug*
This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.
I have various conditions in place to try and avoid unnecessary
PIPE_CONTROL flushes, especially to batches which may have never
used the buffer being mapped. But if we do a CPU map to a bound
constant buffer, we still need to mark push constants dirty, even
if there's nothing happening in batches that would warrant a flush.
Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus
(lots of rainbow colored triangles). Fixes lots of blinking elements
in "Shadow of Mordor". Fixes missing crowd rendering in "DiRT Rally".
Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will
be handled by opt_combine_constants(). Both are already allowed by
opt_copy_propagation().
Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other
stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think
those occur. This is sufficient to allow a pass added in a future commit
(fs_visitor::lower_linterp) to avoid emitting extra MOV instructions.
I removed the 'src.stride > 1' case because it seems wrong: 3-src
instructions on Gen6-9 are align16-only and can only do stride=1 or
stride=0. A run through Jenkins with an assert(src.stride <= 1) never
triggers, so it seems that it was dead code.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The two new unit tests verify that propagating a saturate between
instructions of different exec sizes does not happen.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Will allow us to test that propagation between instructions of different
exec sizes does not happen (in the next commit).
The stray-looking change in intervening_dest_write is to adjust the size
of the texture result to keep the test functioning identically when the
instructions' exec sizes are doubled. Without the change, the texture
does not overwrite the destination fully as the unit test intends.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.
v2: Reduce size of the "i" array from 4 to 2 (Matt).
Reviewed-by: Matt Turner <mattst88@gmail.com>
On gen11, instead of using a PLN instruction, we convert
FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the
fs_generator code.
This patch adds a lowering pass that does the same thing at the
fs_visitor. It also drops the usage of NF types, since we don't need the
extra precision and it lets us skip the accumulator. With all that, some
optimizations will still be run on the generated code, and we should get
better scheduling.
v2: Update comment about saturation and conditional mod (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.
v2: Better commit message (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. With that, only part of the destination
register, which is the source of the move with saturation, would have
been updated.
Reviewed-by: Matt Turner <mattst88@gmail.com>
I left code indented one level too far in the previous commit to make
the diff easier to review. Drop that extra level now.
Fixes: 6981069fc8 i965: Ignore uniform storage for samplers or images, use binding info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
gl_nir_lower_samplers_as_deref creates new top level sampler and image
uniforms which have been split from structure uniforms. i965 assumed
that it could walk through gl_uniform_storage slots by starting at
var->data.location and walking forward based on a simple slot count.
This assumed that structure types were walked in a particular order.
With samplers and images split out of structures, it becomes impossible
to assign meaningful locations. Consider:
struct S {
sampler2D a;
sampler2D b;
} s[2];
The gl_uniform_storage locations for these follow this map:
0 => a[0], 1 => b[0], 2 => a[0], 3 => b[0].
But the new split variables look like:
sampler2D lowered_a[2];
sampler2D lowered_b[2];
and there is no way to know that there's effectively a stride to get to
the location for successive elements of a[] or b[]. So, working with
location becomes effectively impossible.
Ultimately, the point of looking at uniform storage was to pull out the
bindings from the opaque index fields. gl_nir_lower_samplers_as_derefs
can obtain this information while doing the splitting, however, and sets
up var->data.binding to have the desired values.
We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so
gl_nir_lower_samplers_as_derefs has the opportunity to set proper image
bindings. Then, we make the uniform handling code skip sampler(-array)
variables, and handle image param setup based on var->data.binding.
Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct,
this time without regressing dEQP-GLES2.functional.uniform_api.random.3.
Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 9e0c744f07, which
regressed dEQP-GLES2.functional.uniform_api.random.3. It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than a number that looks usable.
Leave a tombstone^Wcomment to discourage the next person from making
the obvious looking fix.
See the next commit for a longer description of the problem.
This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which was originally fixed by the revert. The next commit
will fix it again.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only). We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
The LLVM project made some questionable decisions about defaults for
armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell
platforms).
On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emmision
of NEON instructions.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have a macro for this now; no reason to hand-roll it for derefs.
While we're here, move the NIR_DEFINE_CAST for derefs down to where all
the other ones are.
Reviewed-by: Eric Anholt <eric@anholt.net>
A lot has happened in those two drivers since the 19.0 release and we
keep forgetting to update release notes. Time to bring everything up to
date again before 19.1 gets released.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Only computeDerivativeGroupLinear is supported for now.
All crucible tests pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Commit ad98fbc217 ("intel/fs: Refactor code generation for nir_op_fsign
to its own function") criss-crossed with c2b8fb9a81 ("anv/device:
expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying
enough attention when I rebased. This adds back the float16 changes and
enables the optimization.
v2: Incorporate more changes from 19cd2f5deb and a8d8b1a139 that I
missed in the previous version.
Fixes: ad98fbc217 ("intel/fs: Refactor code generation for nir_op_fsign to its own function")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Currently only meson build supported is added for lima driver.
Add Android build support for lima.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Acked-by: Qiang Yu <yuq825@gmail.com>
For HW cursors, "cursor.pos" doesn't hold the current position of the
pointer, just the position of the last call to SetCursorPosition().
Skip the check against stale values and bump the d3dadapter9 drm version
to expose this change of behaviour.
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop. However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I missed this on the first go round. The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.
Fixes: d6c9bd6e01 "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Mali attribute buffers have to be 64-byte aligned. However, Gallium
enforces no such requirement; for unaligned buffers, we were previously
forced to create a shadow copy (slow!). To prevent this, we instead use
the offseted buffer's address with the lower bits masked off, and then
add those masked off bits to the src_offset. Proof of correctness
included, possibly for the opportunity to say "QED" unironically.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This (fairly large) patch continues work surrounding the panfrost_job
abstraction to improve job lifetime management. In particular, we add
infrastructure to track which BOs are used by a particular job
(currently limited to the vertex buffer BOs), to reference count these
BOs, and to automatically manage the BOs memory based on the reference
count. This set of changes serves as a code cleanup, as a way of future
proofing for allowing flushing BOs, and immediately as a bugfix to
workaround the missing reference counting for vertex buffer BOs.
Meanwhile, there are a few cleanups to vertex buffer handling code
itself, so in the short-term, this allows us to remove the costly VBO
staging workaround, since this patch addresses the underlying causes.
v2: Use pipe_reference for BO reference counting, rather than managing
it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer
counting.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful when we run
out of binding table space. This "spilling" of descriptors allows to to
advertise an almost unbounded number of images and samplers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of setting it manually, call the helper. When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle. This lets us avoid any extra shifts in the shader. Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address. We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed. It means we can't
tightly pack samplers but that's probably not a big deal.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When we have a bindless sampler, we need an instruction header. Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers. Instead, we have to lower TXD to TXL.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly. This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few of advantages:
1. It lets us support a virtually unbounded number of SSBO bindings.
2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
implement before because those atomic messages are only available
in the bindless A64 form.
3. It's way better than messing around with bindless handles for SSBOs
which is the only other option for VK_EXT_descriptor_indexing.
4. It's more future looking, maybe? At the least, this is what NVIDIA
does (they don't have binding based SSBOs at all). This doesn't a
priori mean it's better, it just means it's probably not terrible.
The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again have to push in dynamic
offsets.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach. When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us. It also avoids some potentially expensive
64-bit integer calculations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're about to start doing 64-bit pointer calculations in ANV. They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering. Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless. There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit just sorts the bindings by how often they're used vs the
array size of the binding. This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This also fixes a bug where we mis-calculate maximum binding table sizes
and may return true in vkGetDescriptorSetLayoutSupport even for sets too
large to fit in a binding table.
Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is really where they belong; not push constants. The one downside
here is that we can't push them anymore for compute shaders. However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders. This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We spend a lot of time in the driver adding things to hash sets to track
residency. The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them. In a typical frame, most of
if not all of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency. Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order. However, without relocations, the
overhead of that is pretty small.
If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point. We're better off swapping
out other apps and just letting ours run a whole frame.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Remove unused functions and mark unhandled default case with
unreachable.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This suppresses warning about calling a non-virtual destructor in a
non-final class with virtual functions:
src/compiler/glsl/ast.h:53:4: warning: destructor called on non-final 'ast_node' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor]
DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(ast_node);
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uin64_t:
src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
~~~ ^~~~~~~~~~
Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we want to assert on found == true when the loop exits early, we
need to initialize it to false.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the last graphics pipeline bound to the command buffer has enough
space in its VS URB entries for Blorp then avoid reconfiguring the URB
partitions.
v2: s/0/MESA_SHADER_VERTEX/ (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When looking at the dEQP nested_struct_array_dynamic_index_fragment code
after lowering, I was horrified at the amount of adding and multiplying by
0 we were doing. The builder _imm helpers handle that for you so that the
following optimization passes have less work to do. Plus, it's easier to
read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were calcuating the offset for the field within the struct, and just
dropping it on the floor. Fixes a regression in
KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment
and a few of its friends since the scratch lowering commit.
Fixes: e8e159e9df ("nir/deref: Add helpers for getting offsets")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Basically just reserve the memory in the descriptor sets.
On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.
This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.
v2: - rebased on top of master (Samuel)
- remove the loading resources rework (Samuel)
- only load UBO descriptors if it's a pointer (Samuel)
- use LLVMBuildPtrToInt to avoid IR failures (Samuel)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
This is actually fixed now.
This change requires LLVM r358579. Make sure to have it in
your tree, otherwise the following piglit will hang:
tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
They are buggy with older LLVM version, see r358579.
Fixes: 78c551aca1 ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
They are buggy with LLVM 8 because they weren't marked as source
of divergence, see r358579.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We empty the cache sets when flushing the batch, at which point we need
to add any framebuffer related BOs even though the bindings haven't
changed. So, we now do the cache set tracking unconditionally.
For now, we continue skipping resolve work based on the same conditions
in the predraw functions - the thinking is if we didn't trigger
resolves, there's nothing to update here. Time will tell if this works.
Partly reverts commit 365886ebe1, and
fixes Unigine Valley rendering on Gen9+. Drops drawoverhead scores
by about 10-12%.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353
The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes. The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling. However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.
This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.
Shader-db results on Kaby Lake:
total spills in shared programs: 23664 -> 12050 (-49.08%)
spills in affected programs: 19243 -> 7629 (-60.35%)
helped: 296
HURT: 8
total fills in shared programs: 32028 -> 25139 (-21.51%)
fills in affected programs: 20378 -> 13489 (-33.81%)
helped: 295
HURT: 16
Of course, most of that is in Deus Ex...
Shader-db results on Kaby Lake (without Deus Ex):
total spills in shared programs: 6479 -> 5834 (-9.96%)
spills in affected programs: 3231 -> 2586 (-19.96%)
helped: 40
HURT: 4
total fills in shared programs: 17165 -> 17099 (-0.38%)
fills in affected programs: 6951 -> 6885 (-0.95%)
helped: 40
HURT: 7
Even without Deus Ex, the spill help is pretty respectable. The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% fill 9% spill.
VkPipeline-db results on Kaby Lake:
total spills in shared programs: 9149 -> 8069 (-11.80%)
spills in affected programs: 5197 -> 4117 (-20.78%)
helped: 27
HURT: 16
total fills in shared programs: 26390 -> 25477 (-3.46%)
fills in affected programs: 12662 -> 11749 (-7.21%)
helped: 24
HURT: 22
The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise, there's artifacts when running Unigine Valley with
protocol version 2.
We can get away with not waiting for most buffers, but let's
be conservative.
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
The only tricky part is with protocol 0 we can either have
a display target or resource backing store. With protocol
2 we can have both. Make the map/unmap functions only deal
with the resource backing store.
v2: Handle MSAA texture case.
v3: spelling
v4: Fix dangling else (@prak)
v5: mmap --> os_mmap (@prak) + added comments (@gerddie)
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
This just moves everything to a helper function -- "flush_front_buffer"
will be used later.
virgl_vtest_resource_map / virgl_vtest_resource_unmap already take
care to map the display target.
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
We really need to wait under certain circumstances, or we can end
up writing to memory the same time the host is reading.
Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans").
Test cases:
- dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata
on vtest protocol version 2
- Flickering during Alien Isolation
Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
In what might be my first case of finding a divergence between hardware
and simpenrose for v3d 4.x, it seems that despite what the spec claims,
you actually need specific values in the TYPE field for atomic ops.
Fixes dEQP-GLES31.functional.*.compswap.*
All of the affected shaders are in Mad Max. I noticed this while
looking at some other things. I tried a couple similar patterns, but
the affect on cycles was general negative. It may be worth revisiting
this later.
v2: Rebase on 1-bit Boolean changes.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.
total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.
No changes on any Gen6 or earlier platforms.
Reviewed-by: Matt Turner <mattst88@gmail.com>
All of the affected shaders are in Mad Max. The inner part of the
pattern is itself an open-coded sign(a). I tried using that as a
pattern, but the results were not good. A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.
v2: Rebase on 1-bit Boolean changes.
v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.
total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.
No changes on any Gen6 or earlier platforms.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Normally fsign generates -1, 0, or +1. The new scale factor, S, causes
fsign to generate -S, 0, or +S.
v2: Rebase on v2 changes in previous commit.
v3: Rebase on 85c35885b3 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").
Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
We test the condition, declare a few variables, then test the exact
same condition again. Let's not do that.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Other nir_src_as_* functions just take a nir_src. It's not that much
more memory copying and the constness preserving really isn't worth the
cognitive dissonance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.
[1] : https://patchwork.freedesktop.org/series/59494/
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.
[1] : https://patchwork.freedesktop.org/series/59494/
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.
[1] : https://patchwork.freedesktop.org/series/59494/
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2 (Jason):
- Merge shaderFloat16 and shaderInt8 enablement into a single patch.
- Merge extension enable.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
v2:
- Merge Float16 and Int8 capabilities into a single patch (Jason)
- Merged patch that enabled SPIR-V front-end checks for these caps
(except for Int8, which was already merged)
v3:
- Keep capabilities sorted (Jason)
v4:
- SpvCapabilityFloat16 support already added in master (Juan)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
v2:
- Adapted unit tests to make them consistent with the changes done
to the validation of half-float conversions.
v3 (Curro):
- Check all the accummulators
- Constify declarations
- Do not check src1 type in single-source instructions.
- Check for all instructions that read accumulator (either implicitly or
explicitly)
- Check restrictions in src1 too.
- Merge conditional block
- Add invalid test case.
v4 (Curro):
- Assert on 3-src instructions, as they are not validated.
- Get rid of types_are_mixed_float(), as we know instruction is mixed
float at that point.
- Remove conditions from not verified case.
- Fix brackets on conditional.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2:
- Add some tests with UB type too (Jason)
v3:
- consider implicit conversions from 2src instructions too (Curro).
v4:
- Do not check src1 type in single-source instructions (Curro).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
v2:
- Consider implicit conversions in 2-src instructions too (Curro)
- For restrictions that involve destination stride requirements
only validate them for Align1, since Align16 always requires
packed data.
- Skip general rule for the dst/execution type size ratio for
mixed float instructions on CHV and SKL+, these have their own
set of rules that we'll be validated separately.
v3 (Curro):
- Do not check src1 type in single-source instructions.
- Check restriction on src1.
- Remove invalid test.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The section 'Execution Data Types' of 3D Media GPGPU volume, which
describes execution types, is exactly the same in BDW and SKL+.
Also, this section states that there is a single execution type, so it
makes sense that this is the wider of the two floating point types
involved in mixed float mode, which is what we do for SKL+ and CHV.
v2:
- Make sure we also account for the destination type in mixed mode (Curro).
Acked-by: Francisco Jerez <currojerez@riseup.net>
It is very likely that this optimzation is never useful and we'll probably
just end up removing it, so let's not bother adding more cases to it for
now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NIR already has these and correctly considers exact/inexact qualification,
whereas the backend doesn't and can apply the optimizations where it
shouldn't. This happened to be the case in a handful of Tomb Raider shaders,
where NIR would skip the optimizations because of a precise qualification
but the backend would then (incorrectly) apply them anyway.
Besides this, considering that we are not emitting much math in the backend
these days it is unlikely that these optimizations are useful in general. A
shader-db run confirms that MAD and LRP optimizations, for example, were only
being triggered in cases where NIR would skip them due to precise
requirements, so in the near future we might want to remove more of these,
but for now we just remove the ones that are not completely correct.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are no 8-bit immediates, so assert in that case.
16-bit immediates are replicated in each word of a 32-bit immediate, so
we only need to check the lower 16-bits.
v2:
- Fix is_zero with half-float to consider -0 as well (Jason).
- Fix is_negative_one for word type.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At the very least we need it to handle HF too, since we are doing
constant propagation for MAD and LRP, which relies on this pass
to promote the immediates to GRF in the end, but ideally
we want it to support even more types so we can take advantage
of it to improve register pressure in some scenarios.
v2 (Jason):
- Support 64-bit types too.
- Check if we need to set the half-float flag if the immediate already
existed.
- Multiply the size of the immediate by the width of the copy
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The hardware only allows a stride of 1 on a Byte destination for raw
byte MOV instructions. This is required even when the destination
is the NULL register.
Rather than making sure that we emit a proper NULL:B destination
every time we need one, just fix it at emission time.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we have the regioning lowering pass we can just put all of these
opcodes together in a single block and we can just assert on the few cases
of conversion instructions that are not supported in hardware and that should
be lowered in brw_nir_lower_conversions.
The only cases what we still handle separately are the conversions from float
to half-float since the rounding variants would need to fallthrough and we
are already doing this for boolean opcodes (since they need to negate), plus
there is also a large comment about these opcodes that we probably want to
keep so it is just easier to keep these separate.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.
One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.
The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.
For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).
This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.
This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.
Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.
Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.
Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:
Patched | Master | % |
================================================
SIMD8 | 621,900 | 706,721 | -12.00% |
================================================
SIMD16 | 93,252 | 93,252 | 0.00% |
================================================
As expected, the change only affects SIMD8 dispatches.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Empirical testing shows that gen8 has a bug where MAD instructions with
a half-float source starting at a non-zero offset fail to execute
properly.
This scenario usually happened in SIMD8 executions, where we used to
pack vector components Y and W in the second half of SIMD registers
(therefore, with a 16B offset). It looks like we are not currently doing
this any more but this would handle the situation properly if we ever
happen to produce code like this again.
v2 (Jason):
- Move this workaround to the lower_regioning pass as an additional case
to has_invalid_src_region()
- Do not apply the workaround if the stride of the source operand is 0,
testing suggests the problem doesn't exist in that case.
v3 (Jason):
- We want offset % REG_SIZE > 0, not just offset > 0
- Use a helper to compute the offset
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Broadwell has restrictions that apply to Align16 half-float that
make the Align16 implementation of this invalid for this platform.
Use the gen11 path for this instead, which uses Align1 mode.
The restriction is not present in cherryview, gen9 or gen10, where
the Align16 implementation seems to work just fine.
v2:
- Rework the comment in the code, move the PRM citation from the
commit message to the comment in the code (Matt)
- Cherryview isn't affected, only Broadwell (Matt)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were assuming 32-bit elements. Also, In SIMD8 we pack 2 vector components
in a single SIMD register, so for example, component Y of a 16-bit vec2
starts is at byte offset 16B. This means that when we compute the offset of
the elements to be differentiated we should not stomp whatever base offset we
have, but instead add to it.
v2
- Use byte_offset() helper (Jason)
- Merge the fix for SIMD8: using byte_offset() fixes that too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Source0 and Destination extract the floating-point precision automatically
from the SrcType and DstType instruction fields respectively when they are
set to types :F or :HF. For Source1 and Source2 operands, we use the new
1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
means half-precision. Since we always use the type of the destination for
all operands when we emit 3-source instructions, we only need set Src1Type
and Src2Type to 1 when we are emitting a half-precision instruction.
v2:
- Set the bit separately for each source based on its type so we can
do mixed floating-point mode in the future (Topi).
v3:
- Use regular citation style for the comment referencing the PRM (Matt).
- Decided not to add asserts in the emission code to check that only
mixed HF/F types are used since such checks would break negative tests
for brw_eu_validate.c (Matt)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are now using these bits, so don't assert that they are not set. In gen8,
if these bits are set compaction is not possible. On gen9 and CHV platforms
set_3src_control_index() checks these bits (and others) against a table to
validate if the particular bit combination is eligible for compaction or not.
v2
- Add more detail in the commit message explaining the situation for SKL+
and CHV (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is available since gen8.
v2: restore previously existing assertion.
v3: don't use separate tables for gen7 and gen8, just assert that we
don't use half-float before gen8 (Matt)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The original SrcType is a 3-bit field that takes a subset of the types
supported for the hardware for 3-source instructions. Since gen8,
when the half-float type was added, 3-source floating point operations
can use use mixed precision mode, where not all the operands have the
same floating-point precision. While the precision for the first operand
is taken from the type in SrcType, the bits in Src1Type (bit 36) and
Src2Type (bit 35) define the precision for the other operands
(0: normal precision, 1: half precision).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- make 16-bit be its own separate case (Jason)
v3:
- Drop the result_int temporary (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Extended math with half-float operands is only supported since gen9,
but it is limited to SIMD8. In gen8 we lower it to 32-bit.
v2: quashed together the following patches (Jason):
- intel/compiler: allow extended math functions with HF operands
- intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
- intel/compiler: extended Math is limited to SIMD8 on half-float
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
(allow extended math functions with HF operands,
extended Math is limited to SIMD8 on half-float)
There are some hardware restrictions that brw_nir_lower_conversions should
have taken care of before we get here.
v2:
- rebased on top of regioning lowering pass
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since we handle booleans as integers this makes more sense.
v2:
- rebased to incorporate new boolean conversion opcodes
v3:
- rebased on top regioning lowering pass
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)
Going forward having these split is a bit more convenient since these two
groups have different restrictions.
v2:
- Rebased on top of new regioning lowering pass.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.
v2:
- Consider fp16 rounding conversion opcodes
- Properly handle swizzles on conversion sources.
v3
- Run the pass earlier, right after nir_opt_algebraic_late (Jason)
- NIR alu output types already have the bit-size (Jason)
- Use 'is_conversion' to identify conversion operations (Jason)
v4:
- Be careful about the intermediate types we use so we don't lose
range and avoid incorrect rounding semantics (Jason)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use the raw version (ie. IDXEN=0) because vindex is unused.
Use the old intrinsic for compare&swap because the new one
hangs the GPU for some reasons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and
now llvm 9 removed the signed versions as well - they were proposed for
removal earlier, but the pattern to recognize those was very complex,
so it wasn't done then. However, instead of these arch-specific
intrinsics, there's now arch-independent intrinsics for saturated
add / sub, both for signed and unsigned, so use these.
They should have only advantages (work with arbitrary vector sizes,
optimal code for all archs), although I don't know how well they work
in practice for other archs (at least for x86 they do the right thing).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a race condition where anv_gen_files are executed before
genxml files, which causes a build failure
v2: add dependency on idep_genxml (Lionel)
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The Gen10+ expected format adds an additional counter which we can't
disclose yet. We can still make the size of the expected query result
match.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We would like to reuse performance query metrics in other APIs. Let's
make the query code dealing with the processing of raw counters into
human readable values API agnostic.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This blit can fail, but this is not new; in the old version we
didn't even try to blit in this case. So let's just document the
limitation for now, and leave this for another day.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
When running on OpenGL ES, we can't just map any format for reading,
because of limitations on glReadPixels. So let's fall back to the
blit code-path, and translate the pixels to the correct format in the
end.
This fixes the remaining failures of KHR-GL32.packed_pixels.* apart
from the sRGB tests.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Without this, we can't for instance convert between r8_sint and
r8g8b8a8_sint. But that's pretty useful, so let's support it as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
We currently don't support writing to resources that uses a temporary
staging-resource to resolve the pixels. If a write-bit was set, we
forgot to perform a blit back to the old resource, followed by trying to
update the wrong resource, which lacks backing-storage. The end-result
would be that nothing useful happened.
This approach also fixes a few smaller bugs, like using the wrong box
(without x y and z zeroed out), which means a partial update of a
multisampled texture could result in the wrong part of the texture being
updated.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
It's hard to read the code that decides if we want to queue up an unmap
or destroy the transfer right away. So let's make it a bit simpler, by
setting a bool in case we want to queue it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This isn't the temporary resource itself, it's the template that we'll
create the resource from. So let's name it appropriately.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This is a port of Jason's 8379bff6c4
from i965 to iris. We can't find anything relevant in the documentation
and no one we've talked to has been able to help us pin down a solution.
Unfortunately, we have to put the hack in both iris_blit() and
iris_copy_region(). st/mesa's CopyImage() implementation sometimes
chooses to use pipe->blit() instead of pipe->resource_copy_region().
For blits, we only do the hack if the blit source format doesn't match
the underlying resource (i.e. it's reinterpreting the bits). Hopefully
this should not be too common.
We were failing to set up payload[1] for use by LocalInvocationIndex/ID
and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID
wasn't used (possibly because you only have one workgroup). You're always
going to use payload[1], and payload[0] is common enough and we have DCE
in the backend to clean it up if it happens to not be used.
Fixes assertion failures in the CTS since Karol's cleanup when NIR started
noticing that we were reading an invalid component.
Fixes: 5450f1c9fb ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")
v2: When available, include the opcode name too. (Karol)
v3: Use more to_string helpers. (Karol)
Include the wrong bit_size in those failures.
Include the capability number in spv_check_supported.
Provide vtn_fail_with_* macros to avoid noise in the call sites.
v4: Provide macros only for opcode and decoration, which have enough
usages to justify them. (Jason)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Also, use a set to identify repeated values. The previous arrangement
worked when the repetitions were one after another, but in some of the
new cases they are not.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This patch changes the GL_VENDOR string from "Mesa Project" to "Intel".
This makes GLX_MESA_query_renderer report "Vendor: Intel (0x8086)"
instead of "Vendor: Mesa Project (0x8086)" which is arguably wrong.
We now also use a consistent vendor string across Windows and Linux.
It also prepends "Mesa" to the GL_RENDERER string, both to credit the
community and have a distinguishing mark between the two drivers. We
drop "DRI" compared to i965, as it's not really that important.
Improves performance in Portal by 1.8x. Iris is now 3.86% faster
than i965 at the portal-d1.dem timedemo on my Kabylake laptop. One
change is that Portal selects the MapBufferRange path based on the
vendor string, and iris's BufferSubData path is still missing the
storage invalidation optimization.
This takes the stupid simplest and most reliable approach to reducing
redundancy that I could come up with: Just use the struct declaration
as the cach key. This cuts the size of the generated C file to about
half and takes about 50 KiB off the .data section.
size before (release build):
text data bss dec hex filename
5363833 336880 13584 5714297 573179 _install/lib64/libvulkan_intel.so
size after (release build):
text data bss dec hex filename
5229017 285264 13584 5527865 545939 _install/lib64/libvulkan_intel.so
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Order of operations is important, otherwise we'll find the program we
just uploaded as the "old" compile and get confused why nothing is
different between the two keys.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I was lazy earlier and hadn't bothered typing / refactoring this.
Now I'm hitting some extra recompiles and would like to see why.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences. This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies. anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.
This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.
We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity. I also tidied the code a
bit while moving it, now that it all lives in one file.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch will add support for frame_cropping when the input size is not
matched with aligned size. Currently vaapi driver ignores frame cropping
values provided by client. This change will update SPS nalu with proper
cropping values.
Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
This patch will add support for frame_cropping when the input size is not
matched with aligned size. Currently vaapi driver ignores frame cropping
values provided by client. This change will update SPS nalu with proper
cropping values.
v2: Moving default crop setting to else when enc_frame_cropping_flag is not set.
Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
This patch adds cropping flags for H264 in pipe_h264_enc_pic_control.
Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().
This change fixes glsl_type memory leaks we have with anv driver.
v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
rename helper functions (Jason)
v3: move init, destroy to happen on GL context init and destroy
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
bash subshells don't inherit the -e option by default, so failures in
the subshell commands wouldn't cause the CI job to fail.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We either compile these locally, or they are dependencies of other
packages we install.
v2:
* Adapt to leaving self-compiled packages untouched.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We now use the C frontend of GCC 8 instead of 6 (required tweaking the
before_script for the clang job). We cannot use the C++ frontend of GCC
7 or newer yet, because upstream GCC 7 changed some C++ name mangling
stuff in backwards incompatible ways, and LLVM < 6.0 packages aren't
available in buster.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The APT archive used by the Ubuntu docker image can be slow, even timing
out sometimes, causing spurious failures of the containers-build job.
The Debian docker image uses deb.debian.org, which is backed by a
content distribution network.
One downside is that stretch only has GCC 6, whereas bionic had 7.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
virgl_drm_fence can wrap either a fence fd or a virgl_hw_res. Because a
fence fd is cheaper than a virgl_hw_res, we use it whenever it is
available.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fence fds are cheaper than resources. We want to let winsys make the
decision and use fence fds whenever they are supported. This commit
prepares the work.
For the moment, we create a resource _and_ a fence fd when
supports_fences is true. This will be fixed such that we create a
resource _or_ a fence fd. (And because of a version check bug that we
will fix later, supports_fences is actually never true).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It does not need help from the driver. This also fixes one issue where
the fence is ignored when the transfer queue is full.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
0 is a valid value as max index, and the code handles it fine. This isn't
commonly seen, as it will only happen with array declarations of size 1.
Fixes piglit tests/shaders/complex-loop-analysis-bug.shader_test
Fixes: a3c898dc97 "gallivm: fix improper clamping of vertex index when fetching gs inputs"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110441
Reviewed-by: Brian Paul <brianp@vmware.com>
Until now, we were only doing this when linking a SSO
program. However, nothing avoids linking a non SSO program which
doesn't have both a VS and FS. In those cases, we also need to report
the usual linking errors, if happening.
v2: Use a better name for the renamed function (Timothy).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Juan A. Suarez takes his place and the shorter loop makes Dylan
repeating earlier.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
We need to preserve PIPE_TRANSFER_FLUSH_EXPLICIT, DISCARD_RANGE, and
so on, but don't want to pass them to iris_bo_map(). So, keep them all,
but mask them off when calling map.
Chris Wilson told me to do this a long time ago and he was right.
Budgie Window Manager is an increasingly used alternative to GNOME and MATE.
Default in Solus OS, also used in other distros.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
There's still a few in here, but those docs are already so out of date
that it probably makes more sense to delete them. Such as the GLES
docs which still claim we only support 1.1 and 2.0, with no mention of
3.x at all.
v2: - Add docs for testing back end (Eric Engestrom)
- Drop more autootols references
- meson is now required not recommended
- Add $PWD
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Matt Turner <mattst88@gmail.com>
Even if we don't use local buffers in general. Turns out that even
though the performance is not the best the kernel still does it
better than our own list.
We still have to keep the radv bo list for buffers that are shared
externally.
This improves Talos on lowest quality setting (so as CPU bound as
possible) by ~10% if the global bo list is enabled.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In radv we had a separate flag to actually use it + an env option
to experimentally use it.
The common code setting has_local_buffers to false of course broke
that experimental option.
Also the "enable on APU" did not make sense for RADV as it is still
disabled by default.
Fixes: b21a4efb55 "radv/winsys: allow local BOs on APUs"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Seems it was missing the "/ ma + 0.5" and the order was swapped.
Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These were updated in version 1.1.106 of vulkan.h to make more sense
with the extension names. We may as well keep with the times.
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When gathering info for unmovable types we need to handle arrays.
While we dont support packing/moving arrays we do support packing
scalar components with these arrays.
Fixes piglit:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test
Fixes: 5eb17506e1 ("nir: do not pack varying with different types")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Pipeline statistics queries should not count BLORP's rectangles.
(23) How do operations like Clear, TexSubImage, etc. affect the
results of the newly introduced queries?
DISCUSSION: Implementations might require "helper" rendering
commands be issued to implement certain operations like Clear,
TexSubImage, etc.
RESOLVED: They don't. Only application submitted rendering
commands should have an effect on the results of the queries.
Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to pluck off components properly.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order copy components around properly.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Now that nir_const_value is a scalar, we don't need the switch on bit
size in order to swizzle them properly.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
we already assert above that there are no more than 3 sources, so it
doesn't make sense to use an array of 4 sources
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While we're here, fix a typo which caused it to actually return a vec4
with the third and fourth components zero.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On Mali hardware (supported by Panfrost and Lima), the fixed-function
transformation from world-space to screen-space coordinates is done in
the vertex shader prior to writing out the gl_Position varying, rather
than in dedicated hardware. This commit adds a shared NIR pass for
implementing coordinate transformation and lowering gl_Position writes
into screen-space gl_Position writes.
v2: Run directly on derefs before io/vars are lowered to cleanup the
code substantially. Thank you to Qiang for this suggestion!
v3: Bikeshed continues.
v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.
Ian and Qiang's reviews are from v3, but no real functional changes from
v4. Rob's review is from v4.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
As part of this cleanup, we use the newly-exposed
u_vbuf_get_minmax_index, deduplicating quite a bit of bookkeeping. We
also centralize the draw_flags tracking to make this code cleaner /
futureproofed; we have already had bugs regarding this field so we might
as well get it right now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This was used as a workaround for uniform sizing which was fixed in
771adffe ("st: Lower uniforms in st in the...")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Fixes the following building error happening with Android build system:
external/mesa/src/gallium/auxiliary/draw/draw_gs.c:740:79:
error: address of array 'draw->gs.tgsi.machine->PrimitiveOffsets' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
if (!draw->gs.tgsi.machine->Primitives[i] || !draw->gs.tgsi.machine->PrimitiveOffsets)
~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
1 error generated.
Fixes: 7720ce3 ("draw: add support to tgsi paths for geometry streams. (v2)")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Hardware supports writing back Z/S buffers and sampling from them,
so add support for that.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Looks like it's somehow used by subsequent PP job, so we have to
preserve its contents until PP job is done.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Adding \ prior to " in llvm version string fixes the following building errors:
external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1290:14:
error: expected ')'
", LLVM " MESA_LLVM_VERSION_STRING
^
<command line>:8:34: note: expanded from here
^
external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1287:10:
note: to match this '('
snprintf(rscreen->renderer_string, sizeof(rscreen->renderer_string),
^
1 error generated.
Fixes: 05b114e ("simplify LLVM version string printing")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
In 628c9ca908 I forgot to apply the same -4Gb of the high address
of the high heap VMA. This was previously computed in the
HIGH_HEAP_MAX_ADDRESS.
Many thanks to James for pointing this out.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Xiong, James <james.xiong@intel.com>
Fixes: 628c9ca908 ("anv: store heap address bounds when initializing physical device")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We can use the same register spilling infrastructure for our loads/stores
of indirect access of temp variables, instead of doing an if ladder.
Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db.
Also causes several other KSP shaders with large bodies and large loop
counts to not be force-unrolled.
The change was originally motivated by NOLTIS slightly modifying register
pressure in piglit temp mat4 array read/write tests, triggering register
allocation failures.
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space. It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.
v2: Drop const_index comments (by anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
While waiting for the CSD UABI to get reviewed, I keep having to rebase
the CS patch. Just land the compiler side for now to keep it from
diverging.
For now this covers just GLES 3.1 compute shaders, not CL kernels.
We're using ARB_debug_output for the main shader-db, but I had this env
var left around from the shader-db-2 support (vc4 apitrace-based). Keep
the env var around since it's nice sometimes to get the stats on a shader
you're optimizing without having to do a shader-db run, but drop the old
formatting that's not useful and keeps tricking me when I go to add
another measurement to the shader-db output.
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync. Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
gl_nir_lower_samplers_as_deref splits structure uniform variables,
creating new variables for individual fields. As part of that, it
calculates a new location. It then never set this on the new variables.
Thanks to Michael Fiano for finding this bug. Fixes crashes on i965
with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct
test, which was reduced from the failing case in Michael's app.
Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
32-bit needs mmap64 for 64-bit offsets. We get 64-bit offsets from kernel.
Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Required for 64-bit kernel to interpret the pointer from 32-bit userspace.
Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
ac_build_image_opcode() casts if necessary and buffer images
are casted too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: handle atomics as well
make use of nir_rewrite_image_intrinsic
v3: remove call to nir_remove_dead_derefs
v4: (Timothy Arceri) dont actually call lowering yet
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Both processors of Mali Utgard are float-only, so bool are not
acceptable data type of them. Fortunately the NIR compiler
infrastructure has a lower pass to lower bool to float.
Call this lower pass to lower bool to float for both GP and PP. This
makes Glamor on Xorg server 1.20.3 at least doesn't hang when starting
gtk3-demo.
The old map of nir op bcsel is changed to fcsel, and the map of b2f32 in
PP is dropped because it's not needed now (it's originally only mapped
to ppir_op_mov).
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
We were pinning it for compute shaders, and pinning it when restoring
saved buffers, but we never actually pinned it in the original batch
for VS/TCS/TES/GS/FS.
Fixes rendering in GFXBench5's Tessellation demo and a bunch of Piglit
geometry shader tests.
We can then reuse those bounds to initialize the VMA heaps at logical
device creation.
This fixes an issue on EHL which has only 36bits of VMA. We were
incorrectly using the fixed 48bits upper bound to initialize the
logical device heap, resulting in addresses beyong the device's
limits.
v2: Don't confuse heap size (limited by system memory) and VMA size
(limited by number of addressing bits the platform has)
v3: Fix low heap vma_size :( (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Our exec masking introduces lots of redundant flags updates, and even
without that there will be cases where NIR comparisons on the same sources
for different reasons may generate the same comparison instruction before
the selection.
total instructions in shared programs: 6492930 -> 6460934 (-0.49%)
total uniforms in shared programs: 2117460 -> 2115106 (-0.11%)
total spills in shared programs: 4983 -> 4987 (0.08%)
total fills in shared programs: 6408 -> 6416 (0.12%)
This allows using the Marvell Armada display controllers (with the
armada drm modesetting driver) along with the render-only drivers,
such as Etnaviv on an OLPC XO-1.75 laptop.
v2:
- Add to Android.mk too
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Reviewed-by: Eric Anholt <eric@anholt.net>
As we have already prepared for using util_blitter, use it to implement
lima_blit.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Currently the lima driver saves the framebuffer state in its
from-scratch struct lima_context_framebuffer. However, util_blitter
requires to save framebuffer with standard struct
pipe_framebuffer_state.
Make the lima_context_framebuffer a subtype of the standard
pipe_framebuffer_state, thus the standard part can be used for
util_blitter framebuffer state saving.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
The set_sample_mask function is required in util_blitter.
Add a dummy one to make util_blitter work.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Fixes a couple of Coverity warnings CID 1444626.
Fixes: e30804c602 ("nir/radv: remove restrictions on opt_if_loop_last_continue()")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
In commit 3b3653c4cf we decided not to use bare types; hence do not use
bare type when comparing with interface type to find out if the xfb
variable is an array block.
This fixes dEQP-VK.transform_feedback.* tests.
Fixes: 3b3653c4cf ("nir/spirv: don't use bare types, remove assert in
split vars for testing")
CC: Dave Airlie <airlied@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We never want to display a transfer-temp surface, so let's ignore that
flag when calculating the new binding flags.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
For formats with multiple planes, application will pass a num_planes
sized fds array which should be initialized properly in case fds amount
utilized by the driver is less than the number of planes.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Enable using lima for KMS renderonly. This still needs KMS driver
name mapping to kmsro to be used automatically.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
This helper function can be used by driver which
always need min/max index.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
This is for the case that user only know a max size
it wants to append to the array and enlarge the array
capacity before writing into it.
v2:
- rename newsize to newcap
- rename util_dynarray_enlarge to util_dynarray_grow_cap
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Without that kmscube with GALLIUM_TRACE would segfault like:
#0 0x0000000000000000 in ()
#1 0x0000ffff8f311760 in dri2_create_fence_fd (_ctx=0xaaaae266b8b0, fd=10) at ../src/gallium/state_trackers/dri/dri_helpers.c:122
#2 0x0000ffff90788670 in dri2_create_sync (drv=0xaaaae2667910, disp=0xaaaae26691f0, type=12612, attrib_list=0xaaaae26b9290) at ../src/egl/drivers/dri2/egl_dri2.c:2993
#3 0x0000ffff90776a9c in _eglCreateSync (disp=0xaaaae26691f0, type=12612, attrib_list=0xaaaae26b9290, orig_is_EGLAttrib=0, invalid_type_error=12292) at ../src/egl/main/eglapi.c:1823
#4 0x0000ffff90776be4 in eglCreateSyncKHR (dpy=0xaaaae26691f0, type=12612, int_list=0xfffff662e828) at ../src/egl/main/eglapi.c:1848
Signed-off-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler. Break the circular
dependency by moving gen_debug files to libintel_dev.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Same as I did for V3D, drop all this code trying to GC the
non-indirectly-loaded uniforms from the UBO that's used for indirect
access of gallium cb[0]. While it does successfully drop some of those,
it came at the cost of uploading the VS's indirect unifroms twice, for the
bin and render versions of the shader.
With the UBO loads simplified, I was also able to easily backport V3D's
change to pack a UBO offset into the uniform_data[] field so that we don't
need to do the add of the uniform base in the shader.
As a bonus, now vc4 doesn't depend on mesa/st type_size functions.
total uniforms in shared programs: 25514 -> 25490 (-0.09%)
total instructions in shared programs: 77019 -> 76836 (-0.24%)
PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o
at the st level instead of the backend, packing uniforms with no padding
at all, and lowering to UBOs.
Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS
leads to the driver needing to either link against the type size function
in mesa/st, or duplicating it in the backend. Given that all backends
want this lower-io as far as I can tell, just move it to mesa/st to
resolve the link issue and avoid the driver author needing to understand
st's uniforms layout.
Incidentally, fixes uniform layout failures in nouveau in:
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex
and I think in Lima as well.
v2: fix indents
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the user didn't provide a pipeline cache and we're using the
default internal pipeline cache, then we shouldn't consider a cache
hit for VK_EXT_pipeline_creation_feedback as the application did not
provide a cache.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6601e5d6fc ("anv: implement VK_EXT_pipeline_creation_feedback")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
also set some constants for SSBOs.
With that it can compile the shader from:
dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
While we're at it, prefix the string with "VIRGL: ", to match similar
code elsewhere in virgl.
Fixes: d7b3196976 ("virgl: Return an error if we use fp64 on top of GLES")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
This is needed to properly handle interpolateAt* when the input to be
interpolated is passed as array in the original GLSL.
Currently, the the GLSL compiler would lower selecting the correct input so
that the interpolant parameter to interpolateAt* is a temporary, and this
can not be used to create a valid shader on the host side, because here the
parameter must a shader input.
By allowing the passing the created TGSI allows to create proper GLSL.
This is related to the virglrenderer bug
https://gitlab.freedesktop.org/virgl/virglrenderer/issues/74
v2: Squash the two patches handling these flags into another
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
PIPE_CAP_TGSI_SKIP_SHRINK_IO_ARRAYS is added to indicate whether the TGSI
pass to shrink IO arrays should be skipped to enforce the originally declared array
sizes and locations instead.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Should be safe to enable as all instructions seem to support 16-bit.
Unfortunately, there is no CTS test.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
virgl render complains about "Illegal resource" when running
dEQP-EGL.functional.color_clears.single_context.gles2.rgb888_window,
the reason is that a zero bind value was given for temp resource.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This patch does it as late as possible so the potential extra
basic blocks don't inhibit other optimizations.
Big thanks to Jason for writing the lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.
Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CTS: GL45-CTS.compute_shader.resources-max
Fixes: 4e1e8f684b "glsl: remember which SSBOs are not read-only and pass it to gallium"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
So far ANV was advertising 4 bits for both subTexelPrecisionBits and
mipmapPrecisionBits. But these values were not actually verified.
But it seems the right value is actually 8 bits for both cases.
Unfortunately Intel PRM does not clarify how many bits the hardware use.
For the mipmap case, there is the following reference in PRM Volume 6
(3D Media GPGPU), specifically in LOD Computation Pseudocode:
```
Bias: S4.8
MinLod: U4.8
MaxLod: U4.8
Base: U4.1
MIPCnt: U4
SurfMinLod: U4.8
ResMinLod: U4.8
``
We have other clues, though:
- On one side, dEQP-VK.texture.explicit_lod.* tests fail when using 4
bits, but work when using 8 bits. These tests try to mimic the expected
behaviour as much real as possible, and they use the reported
subTexelPrecisionBits and mipmapPrecisionBits reported to get this.
- On the other side, the equivalent driver for Windows is reporting 8
bits for both elements. Not sure if they got to verify it from the PRM
or from a diffent source.
CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers,
Page 67, (Location aliasing):
" Further, when location aliasing, the aliases sharing the location
must have the same underlying numerical type and bit
width (floating-point or integer, 32-bit versus 64-bit, etc.) and
the same auxiliary storage and interpolation qualification."
Additionally, we have improved the linker error descriptions.
Specifically, when taking structs into account we were producing a
linker error because we assumed that all components in each location
were used and that would cause component aliasing. This is not
accurate of the actual problem. Now, the failure specifies that the
underlying numerical type incompatibility is the cause for the
failure.
Fixes the following piglit test:
tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test
v2:
- Do not assert if we see invalid numerical types. These come
straight from shader code, so we should produce linker errors if
shaders attempt to do location aliasing on variables that are not
numerical such as records.
- While we are at it, improve error reporting for the case of
numerical type mismatch to include the shader stage.
v3:
- Allow location aliasing of images and samplers. If we get these
it means bindless support is active and they should be handled
as 64-bit integers (Ilia)
- Make sure we produce link errors for any non-numerical type
for which we attempt location aliasing, not just structs.
v4:
- Rebased with minor fixes (Andres).
- Added fixing tag to the commit log (Andres).
v5:
- Remove the helper function and check individually for the
underlying numerical type and bit width (Timothy).
- Implicitly, assume that any non-treated type which is checked for
its underlying numerical type is either integer or
float and has a defined bit width (Timothy).
- Implicitly, assume that structs are the only non-treated
non-numerical type (Timothy).
- Improve the linker error descriptions and commit log (Andres).
Fixes: 13652e7516 ("glsl/linker: Fix type checks for location aliasing")
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The offset alignment must be set to s16 because the tile cache is
implemented to require this.
This enables ARB_buffer_texture_range and OES_texture_buffer for
softpipe. The according deqp-gles31 tests pass.
Also update the feature table.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
With buffers the addressing is done on a per-byte bases so the code
path for normal textures doesn't work properly. Also add an assert
to make sure that the bit cound for storing the X coordinate is
large enough.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
With buffers the addressing is done on a per byte basis and we with
a maximal block size of 16 byte we have to take into acount four more
bits. For simplicity just remove the TEX_TILE_SIZE_LOG2, which is 5 bit.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For the gather op no magnifictaion filter is provided, so always use
the filter given for minification (which is the linear filter)
Fixes: 0dff1533f2
softpipe: Use mag texture filter also for clamped lod == 0
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We have a pass to lower global registers to locals and many drivers
dutifully call it. However, no one ever creates a global register ever
so it's all dead code. It's time we bury it.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All we ever do is initialize it to zero, clone it, print it, and
validate it. No one ever sets or uses it.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will pass the multi draw through to the host if it has
support for it instead of using the st to emulate it
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
When I added indirect support I forgot this, however to use it
now we need to check for a new enough capability on the host side.
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
As defined in SPV_NV_compute_shader_derivatives. These control how the
invocations are arranged in a CS when doing derivative and related
operations (which are also enabled by the extension).
Since we expect valid SPIR-V, we don't need to do more work at SPIR-V
level to enable the derivative and related operations to be called.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To enable NV_compute_shader_derivatives, which allows derivatives (and
texture lookups with implicit derivatives) in compute shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will make that step visible in NIR_PRINT=1.
v2: Also use the macro for the cleanup passes.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This was needed when certain intrinsics were lowered to other ones
that were defined by the same pass. After 060817b2 "intel,nir: Move
gl_LocalInvocationID lowering to nir_lower_system_values" we don't
need the loop anymore.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When using quads, instead of mapping the elements to the next 4 local
invocation indices, we map the two next in the "current" row and two
next in the "next row". A side effect is that a thread will execute
the indices in a different order.
We now perform the lowering of both local invocation ID and index
together -- and don't rely anymore on lowering done by
nir_lower_system_values. That is convenient when doing the math for
quads, because we need X and Y to get the right invocation index.
When the pass progresses, fold the constants and clean up to reduce
the noise from the indexing math.
This implements the derivative_group_quadsNV semantics from
NV_compute_shader_derivatives.
v2: Take subgroup_id into account, otherwise only values in the first
subgroup would be used. (Jason)
v3: Calculate invocation index and ID together, to avoid duplicating
some math in the quads case when both index and ID are used. (Jason)
v4: Don't call cleanup passes as part of the lowering, let that to the
call site. (Jason)
Change calculation to use less instructions. (Jason)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When using NV_compute_shader_derivatives to set a derivative group,
a compute shader supports texture with implicit LOD calculation, so
don't set an explicit LOD.
Note if the extension is used but the derivative group is not
specified, it will default to LOD=0 as before.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In compute shaders if no derivative group is defined, the derivatives
will always be zero. Specified in NV_compute_shader_derivatives.
To make the check more convenient, add a "info" local variable to the
generated code so we can refer to it in the Python rules. (Jason)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NV_compute_shader_derivatives allow selecting between two possible
arrangements (quads and linear) when calculating derivatives and
certain subgroup operations in case of Vulkan. So parse and propagate
those up to shader_info.h.
v2: Do not fail when ARB_compute_variable_group_size is being used,
since we are still clarifying what is the right thing to do here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.
However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.
28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)
Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)
The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.
This gives +10% FPS with Doom on my Vega56.
Rhys Perry also benchmarked Doom on his VEGA64:
Before: 72.53 FPS
After: 80.77 FPS
v2: disable pass on non-AMD drivers
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This enables the ARB_gpu_shader5 vertex streams on softpipe.
v2: only enable when not using llvm.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This hooks up the geometry shader processing to the TGSI
support added in the previous commits.
It doesn't change the llvm interface other than to
keep things building.
v2: fix some regressions caused by primitiveoffsets
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need indexed queries to retrieve the geom shader info.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to retrieve the primitive counts
for each stream, along with the offset for each
primitive into the output array.
It also adds support for parsing the stream argument
to the emit and end instructions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds space for the member to the callback, doesn't
change anything else.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When wl_drm is missing and the driver supports modifiers, use
zwp_linux_dmabuf_v1 for the list of supported formats and for buffer
creation.
Limit the supported formats to those with modifiers, which are
WL_DRM_FORMAT_{ARGB8888,XRGB8888} currently.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Add wsi_wl_display_dmabuf for zwp_linux_dmabuf_v1-related states.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Add wsi_wl_display_drm for wl_drm-related states. We will move
formats into the struct in a later commit.
Remove the unnecessary check for wl_registry_bind failures.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Refactor the swtich statement in drm_handle_format out to
wsi_wl_display_add_wl_format.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
When modifiers are specified, we have to use dmabuf rather than
wl_drm. We don't need the wrapper in that case.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
XQueryExtension merely tells you whether the extension exists, it
doesn't tell you whether you're local enough for it to work.
XShmQueryVersion is not enough to discover this either, you need to
provoke the server to do actual work, and if it thinks you're remote it
will throw BadRequest at you. So send an invalid ShmDetach and use the
error code to distinguish local from remote.
[airlied: fixed bug not resetting xshm_error to 0 on success,
which made later stuff fail completely.]
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Consider the following search expression and NIR sequence:
('iadd', ('imul', a, b), b)
ssa_2 = imul ssa_0, ssa_1
ssa_3 = iadd ssa_2, ssa_0
The current algorithm is greedy and, the moment the imul finds a match,
it commits those variable names and returns success. In the above
example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match
the iadd, it sees that ssa_0 is not b and fails to match. The iadd
match will attempt to flip itself and try again (which won't work) but
it cannot ask the imul to try a flipped match.
This commit instead counts the number of commutative ops in each
expression and assigns an index to each. It then does a loop and loops
over the full combinatorial matrix of commutative operations. In order
to keep things sane, we limit it to at most 4 commutative operations (16
combinations). There is only one optimization in opt_algebraic that
goes over this limit and it's the bitfieldReverse detection for some UE4
demo.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15310125 -> 15302469 (-0.05%)
instructions in affected programs: 1797123 -> 1789467 (-0.43%)
helped: 6751
HURT: 2264
total cycles in shared programs: 357346617 -> 357202526 (-0.04%)
cycles in affected programs: 15931005 -> 15786914 (-0.90%)
helped: 6024
HURT: 3436
total loops in shared programs: 4360 -> 4360 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 23675 -> 23666 (-0.04%)
spills in affected programs: 235 -> 226 (-3.83%)
helped: 5
HURT: 1
total fills in shared programs: 32040 -> 32032 (-0.02%)
fills in affected programs: 190 -> 182 (-4.21%)
helped: 6
HURT: 2
LOST: 18
GAINED: 5
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This revision allows for images to be :
- created by reusing image parameters from swapchain
- bound to memory from a swapchain
v2: Add color attachment flag
Use same implicit WSI parameters (tiling, samples, usage)
v3: Fix missing break in vk_foreach_struct_const() switch (Lionel)
v4: Fix accessing image aspects before android resolve (Tapani)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is an array of 1, so [0] is the only content, and meson already
flattens the list so this is unnecessary.
Also, all the other uses of vk_api_xml don't do that.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This fixes the following LLVM error when using RADV_DEBUG=checkir:
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32
i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add
The cmpswap operation still uses the old intrinsic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This structure was used maaaany moons ago as a placeholder for the
varying meta (now unified with mali_attr_meta and essentially fully
decoded). I don't know why it's still in the file. Let's wack it.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The mechanics of this opcode are a little opaque, but essentially, it's
used in 8-bit mode to do a bit count in parallel of a uint and then
doing a ton of clever iadd/imov ops to recombine.
v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward
typo!
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
These flags are set when reading back the tilebuffer from a fragment
shader via various mechanisms (including ARM_shader_framebuffer_fetch
and EXT_pixel_local_storage).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The new OR pattern has been seen in the wild and can end up being
generated by GLSLang. Not sure about the other two new patterns but we
may as well throw them in for completeness. While we're here, we can
drop the '@bool' specifier from the one pattern because specifying True
already implies 1-bit which basically implies boolean.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15321227 -> 15321129 (<.01%)
instructions in affected programs: 3594 -> 3496 (-2.73%)
helped: 6
HURT: 0
total cycles in shared programs: 357481321 -> 357479725 (<.01%)
cycles in affected programs: 44109 -> 42513 (-3.62%)
helped: 6
HURT: 0
VkPipeline-DB results on Kaby Lake:
total instructions in shared programs: 3770504 -> 3769734 (-0.02%)
instructions in affected programs: 19058 -> 18288 (-4.04%)
helped: 163
HURT: 0
total cycles in shared programs: 1417583701 -> 1417569727 (<.01%)
cycles in affected programs: 750958 -> 736984 (-1.86%)
helped: 158
HURT: 1
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that we have one-bit booleans, we don't need to rely on looking at
parent instructions in order to figure out if a value is a Boolean most
of the time. We can drop these specifiers and now the optimizations
will apply more generally.
Shader-DB results on Kaby Lake:
total instructions in shared programs: 15321168 -> 15321227 (<.01%)
instructions in affected programs: 8836 -> 8895 (0.67%)
helped: 1
HURT: 31
total cycles in shared programs: 357481781 -> 357481321 (<.01%)
cycles in affected programs: 146524 -> 146064 (-0.31%)
helped: 22
HURT: 10
total spills in shared programs: 23675 -> 23673 (<.01%)
spills in affected programs: 11 -> 9 (-18.18%)
helped: 1
HURT: 0
total fills in shared programs: 32040 -> 32036 (-0.01%)
fills in affected programs: 27 -> 23 (-14.81%)
helped: 1
HURT: 0
No change in VkPipeline-DB
Looking at the instructions hurt, a bunch of them seem to be a case
where doing exactly the right thing in NIR ends up doing the wrong-ish
thing in the back-end because flags are dumb. In particular, there's a
case where we have a MUL followed by a CMP followed by a SEL and when we
turn that SEL into an OR, it uses the GRF result of the CMP rather than
the flag result so the CMP can't be merged with the MUL. Those shaders
appear to schedule better according to the cycle estimates so I guess
it's a win? Also it helps spilling in one Car Chase compute shader.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is a field of FLOAT_32_UNSIGNED_INT_24_8_REV texture pixel.
OpenGL spec "8.4.4.2 Special Interpretations" is saying:
"the second word contains a packed 24-bit unused field,
followed by an 8-bit index"
The spec doesn't require us to clear this unused field
however it make sense to do it to avoid some
undefined behavior in some apps.
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110305
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
If a def is used as an condition before its definition, we should also
consider this a case to repair. When repairing, make sure we rewrite
any if conditions too.
Found in while inspecting a SPIR-V conversion from a 'continue block'
that contains a conditional branch. We pull the continue block up to
the beggining of the loop, and the condition in the branch ends up
defined afterwards.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 364212f1ed "nir: Add a pass to repair SSA form"
Add memory barrier sync for multiple launch cases, and unbind completed
resources after launch.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Multiple init buffer within one open instance will cause blank issue.
Updating viewport per frame will fix this issue.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Tested-by: Bruno Milreu <bmilreu@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The current algorithm only supports packing 32-bit types.
If a shader uses both 16-bit and 32-bit varyings, we shouldn't
compact them together.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Follow the spec when selecting the magnification filter (OpenGL 4.5,
section 8.14):
If λ(x, y) is less than or equal to the constant c (see section 8.15)
the texture is said to be magnified;
While we're here also silence a potential warning about implicit float
to double conversion.
v2: Update commit message to contain a reference to the spec as pointed
out by Eric.
Fixes a number of dEQP GLES2 and GLES3 test out of:
dEQP-GLES2.functional.texture.filtering.*
dEQP-GLES2.functional.texture.vertex.2d.filtering.*
dEQP-GLES3.functional.texture.vertex.*.filtering.*
dEQP-GLES3.functional.texture.filtering.*
dEQP-GLES3.functional.texture.shadow.2d.*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Disable aux when resource seen the first time and EXPLICIT_FLUSH
not being set. This fixes issues seen when launching Xorg and
CCS_E getting utilized.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll need to do a render-based blit for scissors, since the TFU (as seen
in this conditional) can only update a whole surface.
Fixes: 976ea90bdc ("v3d: Add support for using the TFU to do some blits.")
Fixes piglit fbo-scissor-blit.
4.1 and 4.2 both have the same 16k limit, but it I'm seeing GPU hangs in
the CTS at 8k and 16k. 4k at least lets us get one 4k display working.
Cc: mesa-stable@lists.freedesktop.org
I have v3d allocating enough initial allocation memory that we've been
passing tests without it, but to match kernel behavior more it would be
good to actually exercise the OOM path.
Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.30 spec says:
"Variables or block members declared as structures are considered
to match in type if and only if structure members match in name,
type, qualification, and declaration order."
Fixes:
* layout-location-struct.shader_test
v2: rebased against master and small fixes
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108250
While playing with compute shaders, I was getting a random crash,
noticed that bind_state was using the old shader info for comparision,
but gallium allows the shader to be deleted while bound, so this could
lead to a use after free.
This can't happen using the cso cache. As it tracks all of this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL 4.5 spec says:
"If any enabled array’s buffer binding is zero when DrawArrays or
one of the other drawing commands defined in section 10.4 is called,
the result is undefined."
The result is undefined but it should not crash.
Fixes: gl-3.1-vao-broken-attrib
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE
command's calculations - it holds the result of the subtraction when
the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual
result of the predication is MI_PREDICATE_RESULT, which is what we
want to copy from the render context to the compute context.
We consider it acceptable, but let's still document it in case people
notice it and are not sure why it's there.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Figure it out once in the build system, then just use that all over the place.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Without that `GALLIUM_DDEBUG=always kmscube -A` would segfault like
#0 0x0000000000000000 in ()
#1 0x0000ffffa72a3c54 in dri2_get_fence_fd (_screen=0xaaaaed4f2090, _fence=0xaaaaed9ef880) at ../src/gallium/state_trackers/dri/dri_helpers.c:140
#2 0x0000ffffa8744824 in dri2_dup_native_fence_fd (drv=0xaaaaed5010c0, disp=0xaaaaed5029a0, sync=0xaaaaed9ef7c0) at ../src/egl/drivers/dri2/egl_dri2.c:3050
#3 0x0000ffffa87339b8 in eglDupNativeFenceFDANDROID (dpy=0xaaaaed5029a0, sync=0xaaaaed9ef7c0) at ../src/egl/main/eglapi.c:2107
#4 0x0000aaaabd29ca90 in ()
#5 0x0000aaaabd401000 in ()
Signed-off-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Documentation for glDrawPixels with GL_COLOR_INDEX says:
"If the GL is in color index mode, and if GL_MAP_COLOR is true,
the index is replaced with the value that it references in
lookup table GL_PIXEL_MAP_I_TO_I"
We are always in RGBA mode and there is nothing in documentation
about GL_MAP_COLOR in RGBA mode for GL_COLOR_INDEX.
Scale and bias are also only applicable for RGBA format and not
mentioned for GL_COLOR_INDEX.
Thus the behaviour will be on par with i965.
Fixes: gl-1.0-drawpixels-color-index
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
iris_upload_border_color is passed a pointer which points to
variable that is introduced in a different scope.
CID: 1444296
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should lower transient memory usage and improve performance
slightly (due to less memory to malloc/free, better cache locality,
etc).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This fixes a regression uploading partial tiled textures introduced
sometime during the cubemap series.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This patch implements system values via specially-crafted uniforms.
While we previously had an ad hoc system for passing the viewport into
the vertex shader, this commit generalizes the system to allow for
arbitrary system values to be added to both shader stages. While we're
at it, we clean up uniform handling code (which was considerably muddied
to handle the ad hoc viewport uniform).
This commit serves as both a cleanup of the existing codebase and the
precursor to new functionality, like implementing textureSize().
Concurrent with these changes is respecting the depth transform, which
was not possible with the old fixed uniform system and here serves as a
proof-of-correctness test (as well as justifying the NIR changes).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
While a partial set of viewport system values exist, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
For texture write-transfers, we either free them on the transfer-queue
or right away. But for read-transfers, we currently only destroy them in
case they used a temp-resource. This leads to occasional resource-leaks.
Let's add a call to virgl_resource_destroy_transfer in the missing case.
Do the same thing for buffers as well, but the logic is a bit easier to
follow there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: f0e71b1088 ("virgl: use transfer queue")
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Previously, there was minimal support for interoperating with legacy
kernels (reusing kernel modules originally designed for proprietary
legacy userspaces, rather than for upstream-friendly free software
stacks). Now that the Panfrost kernel is stabilising, this commit drops
the legacy code path.
Panfrost users need to use a modern, mainline kernel supporting the
Panfrost kernel driver from this commit forward.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Trying to construct a scanout capable buffer will only ever work when
when we are on top of a KMS winsys, as the render node isn't capable
of allocating contiguous buffers.
Tested-by: Marius Vlad <marius.vlad@collabora.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When setting up a transfer to a resource, all contexts where the resource
is pending must be flushed. Otherwise a write transfer might be started
in the current context before all contexts that access the resource in
shared (read) mode have been executed.
Fixes: 64813541d5 (etnaviv: fix resource usage tracking across
different pipe_context's)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-By: Guido Günther <agx@sigxcpu.org>
The context is self synchronizing at the GPU side, as commands are
executed in order. We must not flush our own context when updating the
resource use, as that leads to excessive flushing on effectively every
draw call, causing huge CPU overhead.
Fixes: 64813541d5 (etnaviv: fix resource usage tracking across
different pipe_context's)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If we increase vector sizing later it would be nice to avoid
tripped over this again.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we increase the vector size in the future it would be good
to not have to fix these up, this should change nothing at present.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This fd was create in virgl_drm_screen_create and should be closed
in virgl_drm_screen_destroy.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Since we are now properly storing the clear color with SCS bits, we can
now enable fast clears on gen8 too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to skip some types of aux usages (for instance,
ISL_AUX_USAGE_HIZ when the hardware doesn't support it, or when we have
multisampling) when sampling from the surface.
Instead of checking for those cases while filling the surface state and
leaving it blank, let's have a version of aux.possible_usages for
sampling. This way we can also avoid allocating surface state for the
cases we don't use.
Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we are not using it for the clear color, there's no need to
allocate it.
Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At the fast clear time, the only swizzle we have available is actually
the identity swizzle (which we use for most rendering). So the call to
swizzle_color_value() becomes simply a no-op, and doesn't properly zero
out the unused channels.
We have to manually override those channels.
Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The swizzle for rendering surfaces is always identity. So when we are
doing the fast clear, we don't have enough information to store the
clear color OR'ed with the Shader Channel Select bits for the dword in
the SURFACE_STATE.
Instead of trying to patch up the SURFACE_STATE correctly later, by
reading the color from the clear color state buffer and then doing all
the operations to store it, let's just re-emit the whole SURFACE_STATE.
That should make things way simpler on gen8, and we can still use the
clear color state buffer for gen9+.
Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Newer gens can read it directly.
Also properly skip updating the ISL_AUX_USAGE_NONE surface.
Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There shouldn't be a difference for users, but this way we do manage
all of our containers from freedesktop.org
note: compared to the provious Dockerfile, we need to manually
add gcc, g++ and python*-wheel
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Since with gles hosts we lie about the GLSL feature level it is better
to set the number of streams based on actual hosts capabilities.
v2: Make use of feature check level to avoid regressions.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This enables the following piglits with PASS:
nv_shader_atomic_float/execution/
shared-atomicadd-float
shared-atomicexchange-float
ssbo-atomicadd-float
ssbo-atomicexchange-float
v2: Minimize the patch by using type punning (Eric Anholt)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using RGTC, ETC1, ETC2 or S3TC for 3D-textures isn't alowed by any of
OpenGL 4.6, OpenGL ES 3.2, ARB_texture_compression_rgtc,
EXT_texture_compression_rgtc, OES_compressed_ETC1_RGB8_texture,
S3_s3tc or EXT_texture_compression_s3tc specifications.
So let's not allow any of those compressed 3d-textures at all. It's not
going to work once it hits the OpenGL driver in virglrenderer.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
We were caching only the value set with glXSwapIntervalSGI(), missing out
on the default setting of the swap interval by the loader. This fixes
glxgears's warning about being vblank synchronized by default.
Fixes: 9777c4234b ("loader: drop the [gs]et_swap_interval callbacks")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Not all hardware is made equal and some does not have the full
complement of 48b of address space. Ask what the actual size of virtual
address space allocated for contexts, and bail if that is not enough to
satisfy our static partitioning needs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
src/gallium/drivers/radeonsi/si_state_viewport.c:196: si_emit_guardband:
Assertion `vp_as_scissor.maxx <= max_viewport_size[vp_as_scissor.quant_mode]
&& vp_as_scissor.maxy <= max_viewport_size[vp_as_scissor.quant_mode]' failed.
The comparison was unsigned, so negative maxx or maxy would fail.
Fixes: 3c540e0a74 "radeonsi: Fix guardband computation for large render targets"
This updates allows an MI_LRI to trigger a OA report write in the
global OA buffer. This isn't really useful for us, we just keep close
to the internal public configs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
These are new metrics for Gen8/9 to measure the effect of the PMA
stall workaround fix.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This unifies some of the programming between pre-production stepping
and production ones.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
- Implements RGB565/RGBA5551 formats
- Don't advertise support for flipped RGBA5551 and ETC
Fixes remaining tests in dEQP-GLES2.functional.texture.format.* which is
now at 36/36.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
transfer_unmap now tiles for any tiled resource, not just TEXTURE_2D,
which should more than just cubemaps!
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Add support for load_barycentric_pixel, load_interpolated_input, and
friends. For now, this retains support for old-style inputs, which can
probably be dropped with some ttn work.
Prep work for sample-shading support.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Originally we kept track of a table of inputs. But with new-style frag
inputs this becomes awkward. Re-work it so that initially we assigned
un-packed varying locations, and then after the shader is compiled scan
to find actual used inputs, and re-pack.
Signed-off-by: Rob Clark <robdclark@gmail.com>
CXXLD libxatracker.la
/usr/bin/ld: ../../../../src/gallium/auxiliary/.libs/libgallium.a(tgsi_to_nir.o): in function `ttn_finalize_nir':
src/gallium/auxiliary/nir/tgsi_to_nir.c:2111: undefined reference to `gl_nir_lower_samplers_as_deref'
/usr/bin/ld: src/gallium/auxiliary/nir/tgsi_to_nir.c:2113: undefined reference to `gl_nir_lower_samplers'
Fixes: 9a834447d6 ("tgsi_to_nir: Produce optimized NIR for a given pipe_screen.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109929
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This prevents getting mixed-up results if a multi-threaded app has two
validation errors in different threads.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
__builtin_types_compatible_p() is GCC-specific and breaks the
MSVC build.
This intrinsic has been in u_vector_foreach() for a long time, but
that macro has only recently been used in code
(nir/nir_opt_comparison_pre.c) that's built with MSVC.
Fixes: 2cf59861a ("nir: Add partial redundancy elimination for compares")
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Removed a few unused variables and iris_getparam_boolean().
Kept 'name' around since there's a commented debug that make use of it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From GLVND author:
> From a functional standpoint, exporting additional symbols doesn't
> really matter, since libglvnd will load the vendor libraries with
> RTLD_LOCAL.
Suggested-by: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kyle Brenneman <kbrenneman@nvidia.com>
This reverts commit 29132af234.
It seems the new intrinsic causes a hang on radeonsi (VEGA) when running the
piglit test:
tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test
This fixes the following piglit with RadeonSI
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When we add new feature checks on the host side that is used to
enable a cap conditionally that was enabled unconditionally before
we might end up with a feature regression when a new mesa version
is used with an old virglrenderer version that doesn't check for
that cap.
To work around this problem add a version id to the caps that corresponds
to the features that are actually checked on the host and check that
version too when enabling the cap.
Fixes: 2ee197d6e8
virgl: Enable mixed color FBO attachemnets only when the host supports it
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Pohsien Wang <pwang@chromium.org>
Especially when performing a transtion from UNDEFINED->GENERAL,
the driver shouldn't initialize HTILE metadata in compressed
state because it doesn't decompress when the src layout is
GENERAL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110259
Fixes: 3a2e93147f ("radv: always initialize HTILE when the src layout is UNDEFINED")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This pass attempts to dectect code sequences like
if (x < y) {
z = y - x;
...
}
and replace them with sequences like
t = x - y;
if (t < 0) {
z = -t;
...
}
On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction. It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.
Currently only floating point compares and adds are supported. Adding
support for integer will be a challenge due to integer overflow. There
are a couple possible solutions, but they may not apply to all
architectures.
v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block(). Add
missing (-A + B) case. Suggested by Caio.
v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32. Suggested by Ken.
v4: Add some comments explaining how this works. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Move bug fix in get_neg_instr from the next patch to this patch
(where it was intended to be in the first place). Noticed by Caio.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
llvm/spir-v spits out some struct a { struct b {} }, but it
doesn't deref, it casts (struct a) to (struct b), reconstruct
struct derefs instead of casts for these.
v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
v3: squish more stuff into one function, drop unused temp (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is no longer true since PIPE_CAP_PACKED_UNIFORMS was enabled.
Fixes: 3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@gmail.com>
Not sure why new-style frag inputs start triggering this. But we
probably shouldn't consider src's from other blocks.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For depth and stencil blits, we always want the main mask to be Z, and
the secondary pass mask to be S. If asked to blit Z+S to S, we should
handle the blit in the second pass which properly gets the stencil
resources.
Before, we were trying to handle S as the main mask, and accidentally
blitting a Z source to a S destination, which doesn't work out well.
Fixes Piglit's "framebuffer-blit-levels {draw,read} stencil" tests.
Fixes assertion failures in Piglit's "framebuffer-blit-levels
{draw,read} stencil" tests on iris. Also fixes assert failures in
frameretrace, which tries to ReadPixels the stencil values (only)
from a Z24S8 depth/stencil attachment.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This instruction needs a workaround when used from vertex shaders.
Fixes:
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
emit_cat5() needs to check if the last optional reg is there before it
accesses it.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
There are multiple `goto path_fail` with an open fd, but none that go to
`fail:` without going through `path_fail:` first, so let's just move the
`close(fd)` there.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I don't think we should update metadata when conditional rendering
is enabled. For some reasons, some CTS breaks only on SI.
This fixes the following CTS on SI:
dEQP-VK.conditional_rendering.draw_clear.clear.depth.*
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
i965 does this, and st's tgsi path does this. st/nir did not.
Cuts 138MB of memory from a DiRT Rally trace, which is about 44%
of the total GLSL IR memory.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I neglected to fill out this driver function, causing us to advertise
0 modifiers. Now we advertise the various tilings and let the driver
pick them. I've verified that X tiling works with Weston (by hacking
the list to skip Y tiling).
Y+CCS doesn't work yet because it's multiplane and the Gallium dri
state tracker isn't really prepared for that. Leave it off for now.
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- fix missing type
- fix *_FQM_*/*_QM_* commands
- shorten some media structs using groups
- factor out memory attributes
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- fix missing type
- fix *_FQM_*/*_QM_* commands
- shorten some media structs using groups
- factor out memory attributes
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- fix missing type
- fix *_FQM_*/*_QM_* commands
- shorten some media structs using groups
- factor out memory attributes
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
headers
v2: Fixed the check for engine
v3: Changed engine into an argument given to the scripts
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The code to handle image unit indirect was missing
Fixes piglit tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-mixed-const-non-const-uniform-index.shader_test
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes the vertex id fetch in the non-llvm drawing paths.
This vertex id in elt mode comes from the elts not just a linear
value.
Note we don't bad basevertex in the elts case as it's already included
in the elts by the looks of it (at least tests fail if I add it)
Fixes piglit end-primitive tests and some others.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We have a rather big constant file and it seems that the best way to
use it is to upload all UBOs and lower UBO access the load_uniform.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjust the backend where the cap switches units from vec4s to dwords.
As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:
dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We don't need to determine the number of uniform slots here, it's
already available as prog->Parameters->NumParameterValues. The way we
previously determined the number of slots was also broken for
PackedDriverUniformStorage, where we would add loc (in dwords) and
type_size() (in vec4s).
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
kms_swrast can work with primary nodes out of the box, but also
with rendernodes if the build environment specifies the
EGL_FORCE_RENDERNODE flag.
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Load the kms_swrast driver when specified.
Doesn't work with drm_gralloc.
v2: remove unneeded line (@eric)
v3: Remove swrast_loader_extensions (@evelikov)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This way, we can use primary nodes with kms_swrast too.
Also fix up some whitespace issues.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This reverts commit 4e1bbb000c. It turns
out that some DXVK apps due to some implementation detail of DXVK or
other create and destroy instances in an interleaved way. Freeing the
glsl_type memory without being a bit more careful causes use-after-free
issues. Looks like we need to try again.
When the host is running on softpipe/llvmpipe the maximum number of
samples for multisampling is 1. GL 3.0 requires at least 4 samples, and
softpipe/llvmpipe get around this by enabling PIPE_CAP_FAKE_SW_MSAA.
This patch mimics softpipe/llvmpipe behavior in virgl by enabling the
same PIPE_CAP_FAKE_SW_MSAA workaround when the max sample count reported
by the host is 1. This change allows virgl on a softpipe/llvmpipe host
to advertise support for GL 3.0 and beyond.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Up to twice, for a total of 3 attempts maximum.
This will hopefully avoid spurious CI pipeline failures due to
intermittent GitLab/docker infrastructure issues.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The containers-build stage job doesn't use the cache, so this might save
some wasted time for it.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This will allow us to make use of the selection control support in
spirv and the GL support provided by EXT_control_flow_attributes.
Note this only supports if-statements as we dont support switches
in NIR.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841
This patch refactors a substantial amount of code in preparation for
mipmaps. In particular, we know have a correct slice abstraction based
on offsets; cpu/gpu are no longer arbitrary pointers. We additionally
shuffle around other code to accompany these changes and cleanup how
tiled textures are handled, while drawing some attention to the blit
code.
Mipmaps are still disabled at this point, as autogeneration is not yet
implemented; enabling as-is would cause regressions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
In fact, the native "fpow" instruction only does half of it; more work
is needed for the actual instruction. For now, just lower.
Fixes: 1ea42894c ("panfrost/midgard: Implement fpow")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Although this is not functional (and the command stream side is not
aiming for ES3 right now), this is enough to run dEQP-GLES3 shader
tests with the version override directive; this is useful, as some ES3
shader feature can occur in ES2 class shaders due to lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
On Midgard, float ops support standard source modifiers (abs/neg) and
destination modifiers (sat/pos/round). Integer ops do not support these,
however. To cope, we use native NIR source modifiers for floats, but
lower them away to iabs/ineg for integers, implementing those ops
simultaneously to avoid regressions.
Fixes the integer tests in
dEQP-GLES2.functional.shaders.operator.unary_operator.minus.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Some of these are not yet fully functional due to related bugs, but this
the correct op mapping. The native ball/bany opcodes act on vec4's
unconditionally. That said, both ball and bany have the nice property
that duplicating an argument does not affect their output, so the
default "hanging swizzles" allow us to implement 2/3-component opcodes
correctly, implicitly lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Whereas a normal fcsel acts on a boolean input in r31.w, the fcsel_i
variant acts on an integer input in r31.w, which can be preloaded with
an instruction like imov (with the appropriate negate flag on the
source).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This preliminary implementation should handle some basic cases. Future
work should scissor the FRAGMENT job as well for efficiency.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Our viewport code hardcoded a number of wrong assumptions, which sort of
sometimes worked but was definitely wrong (and broke most of dEQP). This
corrects the logic, accounting for flipped-Y framebuffers, which
fixes... most of dEQP.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This gets the basevertex from the draw depending on whether
it's an indexed or non-indexed draw.
We still fail a transform feedback test for vertex id, as
the vertex id actually an index id, and isn't getting translated
properly to a vertex id, suggestions on how/where to fix that welcome.
Reviewed-by: Brian Paul <brianp@vmware.com>
This field was added in a recent addrlib update, and while there
currently seems to be no issue with skipping it, we will have to
set it correctly in the future.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Most cat5 instructions are constructed using ir3_SAM, which uses
regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up
src1 and src2 differently for those two.
Fixes: 1dffb089 ("freedreno/ir3: fix sam.s2en encoding")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting number of samplers based on the uniform vars instead
of number of cat5 instructions. We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks. Track whether we need
derivatives explicitly and use that to enable the state.
Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Intel, we have both bindless and bindful and we'd like to use them at
the same time if we can so we need to be able to distinguish at the NIR
level between the two. This also fixes nir_lower_tex to properly handle
bindless in its tex_texture_size and get_texture_lod helpers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fix the order of src0_alpha and sample mask in fb payload.
From SKL PRM Volume 7, "Data Payload Register Order
for Render Target Write Messages":
Type S0A oM sZ oS M2 M3 M4
SIMD8 1 1 0 0 s0A oM R
SIMD16 1 1 0 0 1/0s0A 3/2s0A oM
It also fixes working of alpha to coverage with sample mask
on GEN6 since now they are in correct order.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
From "Alpha Coverage" section of SKL PRM Volume 7:
"If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
hardware, regardless of the state setting for this feature."
From OpenGL spec 4.6, "15.2 Shader Execution":
"The built-in integer array gl_SampleMask can be used to change
the sample coverage for a fragment from within the shader."
From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
"If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
is generated where each bit is determined by the alpha value at the
corresponding sample location. The temporary coverage value is then
ANDed with the fragment coverage value to generate a new fragment
coverage value."
Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"
Thus we need to compute alpha to coverage dithering manually in shader
and replace sample mask store with the bitwise-AND of sample mask and
alpha to coverage dithering.
The following formula is used to compute final sample mask:
m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
0x0808 * (m & 2) | 0x0100 * (m & 1)
sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.
It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.
GEN6 hardware does not have issue with simultaneous usage of sample mask
and alpha to coverage however due to the wrong sending order of oMask
and src0_alpha it is still affected by it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
If the geom shader emits a point size we failed to find it here,
use the correct API to look it up.
Fixes:
tests/spec/glsl-1.50/execution/geometry/point-size-out.shader_test
Reviewed-by: Brian Paul <brianp@vmware.com>
With indirect rendering it's fine to set the instance count
parameter to 0, and expect the rendering to be ignored.
Fixes assert in KHR-GLES31.core.compute_shader.pipeline-gen-draw-commands
on softpipe
v2: return earlier before changing fpstate
Reviewed-by: Brian Paul <brianp@vmware.com>
The wait here is unnecessary since we got a pool of back buffers,
and the wait for swap buffer will happen before the present pixmap,
at the same time the previous back buffer will be put back to pool
for reuse after the check for PresentIdleNotify event
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
v2 (Topi):
- Make bit-size handling order be 16-bit, 32-bit, 64-bit
- Clamp lower exponent range at -28 instead of -30.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we destroy a context, we need to temporarily make that context
the current one for the thread.
That's because during context tear-down we make many calls to
_mesa_reference_texobj(&texObj, NULL). Note there's no context
parameter. If the texture's refcount goes to zero and we need to
delete it, we use the thread's current context. But if that context
isn't the context we're tearing down, we get into trouble when
deallocating sampler views. See patch 593e36f956 ("st/mesa:
implement "zombie" sampler views (v2)") for background information.
Also, we need to release any sampler views attached to the fallback
textures.
Fixes a crash on exit with a glretrace of the Nobel Clinician
application.
v2: at end of st_destroy_context(), check if save_ctx == ctx and
unbind the context if so.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In Android O, MESA needs to statically link libexpat so that
it's in same VNDK namespace.
v2: apply change also to anv driver (Tapani)
v3: use += in anv change (Eric Engestrom)
Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33
Signed-off-by: Kishore Kadiyala <kishore.kadiyala@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
If VK_QUERY_RESULT_WITH_AVAILABILY_BIT is set and
VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not
set, we need return to VK_NOT_READY only and set the availability
status field for each query.
From Vulkan spec:
"If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both
not set then no result values are written to pData for queries that are
in the unavailable state at the time of the call, and
vkGetQueryPoolResults returns VK_NOT_READY. However, availability state
is still written to pData for those queries if
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If the query is not available and VK_QUERY_RESULT_WAIT_BIT and
VK_QUERY_RESULT_PARTIAL_BIT are both not set, the spec doesn't
allow to modify its result.
From Vulkan spec:
"If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are
both not set then no result values are written to pData for queries
that are in the unavailable state at the time of the call, and
vkGetQueryPoolResults returns VK_NOT_READY. However, availability state
is still written to pData for those queries
if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set."
v2:
- Move VK_NOT_READY change to next patch (Samuel Pitoiset)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
With vkpipelinedb Samuel discovered a regression since we stopped
stripping types at the spir-v level.
This adds a check to the var splitting for the case where it
asserts the type hasn't changed, when it has just created a bare
type, and it's different than the original type which has an explicit
stride.
This also removes a pointless assert that also triggers.
Fixes: 3b3653c4cf (nir/spirv: don't use bare types, remove assert in split vars for testing)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Also handle GLSL_TYPE_INTERFACE the same way we do GLSL_TYPE_STRUCT in
various places. Motivated by ARB_gl_spirv work, that will take
advantage of the interface types when handling NIR coming from SPIR-V.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also updates gl_spirv to pick the right one. At the moment nothing
uses it, but upcoming functionality part of ARB_gl_spirv will use it,
and we also later can be more assertful when handling certain features
for each of the execution environments.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
The CTS requires a 565-no-depth-no-stencil (meaning d/s not-required, not
not-present) config for ES 3.0, but at depth 24 of X11 we wouldn't do so.
We can satisfy that bad requirement using a pbuffer-only visual with
whatever other buffers the driver happens to have given us.
I've tried to raise this as an absurd requirement with Khronos and made no
progress.
v2: Make sure it's single sample, no depth, no stencil. Comment typo fix
Reviewed-by: Adam Jackson <ajax@redhat.com>
Report 320 for a6xx, which isn't *quite* true (no geom/tess, in
particular), but other caps keep the reported GL and GLSL versions
correct (3.1 / 3.10 es). But reporting 320 will switch on
EXT_gpu_shader5, which is the goal.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For GLES2+ contexts, enable EXT_gpu_shader5 if the driver exposes a
sufficiently high ESSL feature level, even if the GLSL feature level
isn't high enough.
This allows drivers to support EXT_gpu_shader5 in GLES contexts before
they support all the additional features of ARB_gpu_shader5 in GL
contexts.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Adds a new cap to allow drivers to expose higher shading language
versions in GLES contexts, to avoid having to report an artificially
low version for the benefit of GL contexts.
The motivation is to expose EXT_gpu_shader5 even though a driver may
not support all the features needed for the corresponding GL extension
(ARB_gpu_shader5).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fix build error after llvm-9.0svn r352827 ("[opaque pointer types] Add a
FunctionCallee wrapper type, and use it.").
In file included from ./rasterizer/jitter/builder.h:158:0,
from swr_shader.cpp:35:
./rasterizer/jitter/gen_builder_meta.hpp: In member function ‘llvm::Value* SwrJit::Builder::VGATHERPD(llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, llvm::Value*, const llvm:
:Twine&)’:
./rasterizer/jitter/gen_builder_meta.hpp:51:117: error: no matching function for call to ‘cast(llvm::FunctionCallee)’
Function* pFunc = cast<Function>(JM()->mpCurrentModule->getOrInsertFunction("meta.intrinsic.VGATHERPD", pFuncTy));
^
Suggested-by: Philip Meulengracht <the_meulengracht@hotmail.com>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alok Hota <alok.hota@intel.com>
The previous patch tried to address a bug when DESTDIR is '', however,
it introduces a bug when DESTDIR is not '', and fakeroot is used. This
patch does fix that, and has been tested with the arch pkg-build to
ensure it isn't regressed.
Fixes: 093a1ade4e24b7dd701a093d30a71efd669fe9c8
("bin/install_megadrivers.py: Correctly handle DESTDIR=''")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110221
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.
While we are at it, factorize a little bit.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix this build error with GCC 4.4.7.
CC nir/nir_opt_copy_prop_vars.lo
nir/nir_opt_copy_prop_vars.c: In function ‘load_element_from_ssa_entry_value’:
nir/nir_opt_copy_prop_vars.c:454: error: unknown field ‘ssa’ specified in initializer
nir/nir_opt_copy_prop_vars.c:455: error: unknown field ‘def’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: unknown field ‘component’ specified in initializer
nir/nir_opt_copy_prop_vars.c:456: error: extra brace group at end of initializer
nir/nir_opt_copy_prop_vars.c:456: error: (near initialization for ‘(anonymous).<anonymous>’)
nir/nir_opt_copy_prop_vars.c:456: warning: excess elements in union initializer
nir/nir_opt_copy_prop_vars.c:456: warning: (near initialization for ‘(anonymous).<anonymous>’)
Fixes: 96c32d7776 ("nir/copy_prop_vars: handle load/store of vector elements")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109810
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince
gcc to not inline __gen_uint and results in a lot of packing code ending
up out-of-line with lots of stack copying. To ameliorate this, only
insert the check inside the packer if DEBUG is defined and instead
perform the validation checking before submitting the batch to the
kernel. This should give accurate results if --trace-origins=yes is
used, and failing that we can recompile in full debug mode to check on
insertion.
Improve drawoverhead baseline by 25% with a default build with
valgrind-dev installed (with effectively no loss of vg coverage).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes:
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil_fbo
Signed-off-by: Rob Clark <robdclark@gmail.com>
There are other cases where we need to disable early-z, like image
writes. So rename to something more generic.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is required by the validation layers if we want to validate the
commands inserted by the overlay layer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
'invariant' qualifier is propagated on variables which are used
to calculate other invariant variables, however when we are matching
variable's declarations we should take into account only explicitly
declared invariance because invariance propagation is an implementation
specific detail.
Thus new flag is added to ir_variable_data which indicates 'invariant'
qualifier being explicitly set in the shader.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100316
Fixes: 89b60492 ('glsl: Add a pass to propagate the "invariant" and
"precise" qualifiers')
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The swizzling was putting float one in not integer 1.
This fixes a lot of arb_texture_view-rendering-formats cases.
Reviewed-by: Brian Paul <brianp@vmware.com>
I don't think this really buys us anything and TG4 with cubemap arrays
falls over because sampler == 2, but otherwise works fine.
Fixes:
./bin/textureGather fs shadow r CubeArray repeat
on softpipe with ARB_gpu_shader5 enabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
These didn't deal with the width == 32 case that TGSI is defined with.
Fixes piglit tests if ARB_gpu_shader5 is enabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than skipping code that looked like this:
loop {
...
if (cond) {
do_work_1();
continue;
} else {
break;
}
do_work_2();
}
Previously we would turn this into:
loop {
...
if (cond) {
do_work_1();
continue;
} else {
do_work_2();
break;
}
}
This was clearly wrong. This change checks for this case and makes
sure we now leave it for nir_opt_dead_cf() to clean up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The idea was that we could skip uploading the constant-indexed uniform
data and just upload the uniforms that are variably-indexed. However,
since the VS bin and render shaders may have a different set of uniforms
used, this meant that we had to upload the UBO for each of them. The
first case is generally a fairly small impact (usually the uniform array
is the most space, other than a couple of FSes in shader-db), while the
second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms
instead of 18k.
Given that the optimization is of dubious value, has a big downside, and
is quite a bit of code, just drop it. No change in shader-db. No change
on 3DMMES2 (n=15).
We'd end up with the constant offset in the uniform stream anyway, since
they're bigger than small immediates. Avoids the extra uniforms and adds
in the shader in favor of just adding once on the CPU.
shader-db:
total instructions in shared programs: 6496865 -> 6494851 (-0.03%)
total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)
I want to reuse this for encoding small constant UBO/SSBO offsets into the
uniform stream to reduce the extra uniform loads and adds for the small
constant offsets.
When building the Chrome OS Android container, we need to build copies
of mesa that don't conflict with the Android system-supplied libraries.
This adds options to create suffixed versions of EGL and GLES libraries:
libEGL.so -> libEGL${egl-lib-suffix}.so
libGLESv1_CM.so -> libGLESv1_CM${gles-lib-suffix}.so
libGLESv2.so -> libGLES${gles-lib-suffix}.so
This is similar to what happens when --enable-libglvnd is specified, but
without the side effects of linking against libglvnd. To avoid
unexpected clashes with the suffixed appended by libglvnd, make it an
error to specify both --enable-libglvnd and --with-egl-lib-suffix.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In some cases, we can end up with varying structs that aren't split to
their member variables. nir_compact_varyings attempted to record these
as unmovable, so it would leave them be. Unfortunately, it didn't do
it right for non-vector/scalar types. It set the mask to:
((1 << (elements * dmul)) - 1) << var->data.location_frac
where elements is the number of vector elements. For structures and
other non-vector/scalars, elements is 0...so the whole mask became 0.
This caused nir_compact_varyings to assign other varyings on top of
the structure varying's location (as it appeared to take up no space).
To combat this, we just set elements to 4 for non-vector/scalar types,
so that the entire slot gets marked as unmovable.
Fixes KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in on iris.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
and similar things with multiple UBOs
Signed-off-by: Rob Clark <robdclark@gmail.com>
Seems like it can only work 16b at a time. Fixes
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitcount.*
TODO need to check if this limitation applies to a3xx as well.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For some things that show up when we expose higher glsl
TODO check blob traces to see if we have instructions for some of this?
I guess we don't but worth a check..
Signed-off-by: Rob Clark <robdclark@gmail.com>
For now it uses indirect for everything. The next step is for the
ir3_cp pass to detect the case that tex and samp idx are immediate
and convert the sam instruction back to the non .s2en variant. But
doing that in a following patch so we can shake out the bugs with
.s2en more easily.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When we have indirect samplers, we cannot tell the max sampler
referenced. Instead just refer to the number of sampler uniforms.
Signed-off-by: Rob Clark <robdclark@gmail.com>
On a6xx+ with half-regs conflicting with full-regs, the legalize pass
needs to set appropriate sync bits, such as (sy), on writes to full regs
that conflict with half regs, and visa-versa.
Signed-off-by: Rob Clark <robdclark@gmail.com>
On a6xx, half-regs conflict with full-regs. But we were only setting up
conflicts for the first class (ie. scalar, but not hvec2/hvec3/hvec4),
resulting in higher half-reg classes getting assigned to regs that
overwrite full-regs.
Noticed while trying to enable indirect-sampler (sam.s2en) which uses an
hvec2 argument to pass the sampler/tex index.
Signed-off-by: Rob Clark <robdclark@gmail.com>
GLC/SLC are boolean.
This fixes the following LLVM error when checkir is set:
Intrinsic has incorrect argument type!
void (i32, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32)* @llvm.amdgcn.tbuffer.store.i32
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
This fixes the following LLVM error when ckeckir is set:
Type too small for ZExt
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
Users of this function expect alu to be a supported comparision
if the induction variable is not NULL. Since we attempt to
override the return values if the first limit is not a const, we
must make sure we are dealing with a valid comparision before
overriding the alu instruction.
Fixes an unreachable in inverse_comparison() with the game
Assasins Creed Odyssey.
Fixes: 3235a942c1 ("nir: find induction/limit vars in iand instructions")
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216
This cuts down the job runtime from ~9.5 to ~7 minutes with my personal
runner on an 8-core Ryzen 7 1700.
While this might result in slightly higher load on shared runners, it
should be OK, since libtool doesn't use the CPU cores as effectively as
e.g. ninja does; a significant part of the CPU load tends to be in bash
processes at any time, which should be relatively light on memory.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This increases the chance of them running earlier, which can have an
impact on the total duration of the pipeline.
v2:
* Minor style fix-up to moved comment (Eric Anholt)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Fixes leaks for each glsl_type generated:
==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18
==32470== at 0x483880B: malloc (vg_replace_malloc.c:309)
==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119)
==32470== by 0x4C44014: rzalloc_size (ralloc.c:151)
==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215)
==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:114)
==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:1146)
==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501)
==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269)
==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018)
==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365)
==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490)
==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173)
v2: move release call to vkDestroyInstance
v3: apply fix also to radv driver
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Values inside the offsets parameter of textureGatherOffsets are required to be
constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET,
GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET].
As this range is never outside [-32, 31] for all existing drivers inside mesa,
we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr.
Right now only Nvidia hardware supports this in hardware, so we can turn this
on inside Nouveau for the NIR path as it is already enabled with the TGSI one.
v2: use memcpy instead of for loops
add missing bits to nir_instr_set
don't show offsets if they are all 0
v3: default offsets aren't all 0
v4: rename offsets -> tg4_offsets
rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Not sure how ptr_stride should be taken into account if at all here
v2: reorder check to avoid src walking (Jason)
v3: remove is_cast_cast checks, keep going afterwards (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For OpenCL we never want to strip the info from the types, and it makes
type comparisons easier in later stages. We might later need a nir pass to
strip this for GLSL, but so far the only regression is the assert and Jason
said removing that is fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
v2: Update tracked clear color when we update the surface state.
v3: Update all aux surface states when updating the clear color.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only if the clear color/depth is changing. In those cases, it's hard to
keep track of the current clear color, and aux state of some layers,
when predication is enabled. So simplify everything by stalling on the
few cases where we would have a fast clear color change with
predication.
v2:
- fix comment (Ken)
- explicitly check for predicate state after resolving it (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function can be used to stall on the CPU and resolve the predicate
for the conditional render. It will convert ice->state.predicate from
IRIS_PREDICATE_STATE_USE_BIT to either IRIS_PREDICATE_STATE_RENDER or
IRIS_PREDICATE_STATE_DONT_RENDER, depending on the result of the query.
v2:
- return void (Ken)
- update the stored condition (Ken)
- simplify the code leading to resolve the predicate (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If all the restrictions are satisfied, do a fast clear instead of
regular clear.
v2:
- add perf_debug() when we can't fast clear (Ken)
- improve comment: s/miptree/resource/ (Ken)
- use swizzle_color_value from blorp (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It needs to be converted to a value that can be used by ISL (and our
hardware SURFACE_STATE structure).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Check and do a fast clear instead of a regular clear on depth buffers.
v3:
- remove swith with some cases that we shouldn't wory about (Ken)
- more parens into the has_hiz check (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Take the clear depth into account when IRIS_DIRTY_DEPTH_BUFFER is marked
as dirty.
Also update the blorp surface clear color.
v2: Use a single if (zres && zres->aux.bo) (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also store clear color in the iris_resource.
Always allocate clear color state buffer.
v2:
- Make clear_color_offset be 64 bits (Ken).
- Simplify the logic to decide when to memset the aux buffer (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Does what it says on the tin.
The per stage time is only an approximation due to linking and
the Vega merged stages.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Currently if destdir is set to '' then the resulting libdir will have
it's first character replaced by / instead of / being prepended to the
string. This was the result of ensuring that that DESTDIR wouldn't be
ignored if libdir was absolute, since the only cases that meson allows
the libdir to be absolute is if the prefix is /, this won't be a
problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110211
Fixes: ae3f45c11e
("bin/install_megadrivers: fix DESTDIR and -D*-path")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes dEQP-VK.binding_model.buffer_device_address.* and
dEQP-VK.ssbo.phys.layout* Vulkan CTS tests.
v2: set val->type->stride in the section below (Jason)
v3: restore val->type->type to original place (Jason)
Fixes: d0ba326f23 ("nir/spirv: support physical pointers")
CC: Karol Herbst <kherbst@redhat.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I noticed we crashed piglit arb_texture_view-rendering-formats
when run on softpipe.
This fixes the clear tiles to use the surface format not the
underlying storage format.
This fixes a bunch of srgb piglits as well.
Fixes: 396ac41fc2 (softpipe: add integer support)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I added new barrier bits in 220c1dce1e
and made most drivers skip them. I thought nvc0 was already skipping
those but missed the else case here, which does something. So make it
explicitly skip like I did everywhere else.
Thanks to Ilia for catching this.
Fixes: 220c1dce1e gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.
An extension reporting cache hit in the user supplied pipeline cache
as well as timing information for creating the pipelines & stages.
v2: Don't consider no cache for cache hits (Jason)
Rework duration accumulation (Jason)
v3: Fold feedback creation writing into pipeline compile functions (Jason/Lionel)
v4: Get cache hit information from anv_device_search_for_kernel() (Jason)
Only set cache hit from the whole pipeline if all stages also have that bit (Lionel)
v5: Always user_cache_hit in anv_device_search_for_kernel() (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
One line left out of the conversion to ir3 ssbo intrinsics on a6xx.
Fixes: 2e4525883f ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
Signed-off-by: Rob Clark <robdclark@gmail.com>
We initially set this lower because we didn't have SIMD32 support yet
but we've supported SIMD32 for quite some time now. We should bump it
up to the real limit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The mask should be accumulated if two calls are used for
binding two buffers at different indexes. Otherwise, the
driver only accounts for the last one.
Noticed while glancing at this code.
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The glMemoryBarrier() function makes shader memory stores ordered with
respect to things specified by the given bits. Until now, st/mesa has
ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT,
saying that drivers should implicitly perform the needed flushing.
This seems like a pretty big assumption to make. Instead, this commit
opts to translate them to new PIPE_BARRIER bits, and adjusts existing
drivers to continue ignoring them (preserving the current behavior).
The i965 driver performs actions on these memory barriers. Shader
memory stores go through a "data cache" which is separate from the
render cache and other read caches (like the texture cache). All
memory barriers need to flush the data cache (to ensure shader memory
stores are visible), and possibly invalidate read caches (to ensure
stale data is no longer visible). The driver implicitly flushes for
most caches, but not for data cache, since ARB_shader_image_load_store
introduced MemoryBarrier() precisely to order these explicitly.
I would like to follow i965's approach in iris, flushing the data cache
on any MemoryBarrier() call, so I need st/mesa to actually call the
pipe->memory_barrier() callback.
Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate
and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on
the iris driver.
Roland said this looks reasonable to him.
Reviewed-by: Eric Anholt <eric@anholt.net>
Handle buffers whose width is not aligned to 16px by padding the stride
and storing it accordingly.
This does not reject imports for images whose stride is not sufficiently
aligned.
v2: make sure bo->stride is set on imported buffers, and add missing
variable definition. (Tomeu)
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
With autotools this close to being not supported anymore, let's not
waste half of the CI cycles on it. The default build will catch most
issues, and the rest can be tested by the old Travis.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows DRI3 to pick between UIF and raster according to whether we're
pageflipping or not and whether the pageflipping display can do UIF,
avoiding copies for the windowed/composited case that previously was
forced to linear.
Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783%
+/- 13.1719% (n=3)
We ask the other side to make a buffer with the right number of pages, and
then just store the UIF in it. This avoids an extra silent copy of the
buffer from linear to UIF if it gets used for texturing (X11 copy-based
swapbuffers, GL compositors).
This reverts commit 1aa5738e66.
This patch incorrectly asumed that for SSOs no inner interface
matching check was needed.
From the ARB_separate_shader_objects spec v.25:
" With separable program objects, interfaces between shader stages
may involve the outputs from one program object and the inputs
from a second program object. For such interfaces, it is not
possible to detect mismatches at link time, because the programs
are linked separately. When each such program is linked, all
inputs or outputs interfacing with another program stage are
treated as active. The linker will generate an executable that
assumes the presence of a compatible program on the other side of
the interface. If a mismatch between programs occurs, no GL error
will be generated, but some or all of the inputs on the interface
will be undefined."
This completes the fix from commit:
3be05dd267 ("glsl/linker: don't fail non static used inputs without matching outputs")
Fixes: 1aa5738e66 ("glsl: relax input->output validation for SSO programs")
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Current implementation uses a complicated calculation which relies in
an implicit conversion to check the integral part of 2 division
results.
However, the calculation actually checks that the xfb_offset is
smaller or a multiplier of the xfb_stride. For example, while this is
expected to fail, it actually succeeds:
"
...
layout(xfb_buffer = 2, xfb_stride = 12) out block3 {
layout(xfb_offset = 0) vec3 c;
layout(xfb_offset = 12) vec3 d; // ERROR, requires stride of 24
};
...
"
Fixes: 2fab85aaea ("glsl: add xfb_stride link time validation")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If there is no Static Use of an input variable, the linker shouldn't
fail whenever there is no defined matching output variable in the
previous stage.
From page 47 (page 51 of the PDF) of the GLSL 4.60 v.5 spec:
" Only the input variables that are statically read need to be
written by the previous stage; it is allowed to have superfluous
declarations of input variables."
Now, we complete this exception whenever the input variable has an
explicit location. Previously, 18004c338f ("glsl: fail when a
shader's input var has not an equivalent out var in previous") took
care of the cases in which the input variable didn't have an explicit
location.
v2: do the location based interface matching check regardless on
whether it is a separable program or not (Ilia).
Fixes: 1aa5738e66 ("glsl: relax input->output validation for SSO programs")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Outputs are always validated when having explicit locations and we
were trusting its outcome to catch similar problems with the inputs
since, in case of having undefined outputs for existing inputs, we
would be already reporting a linker error.
However, consider this case:
" Shader stage n:
---------------
...
layout(location = 0) out float a;
...
Shader stage n+1:
-----------------
...
layout(location = 0) in float b;
layout(location = 0) in float c;
...
"
Currently, this won't report a linker error even though location
aliasing is happening for the inputs.
Therefore, we also need to validate the inputs independently from the
outcome of the outputs validation.
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From page 62 (page 68 of the PDF) of the GLSL 4.50 v.7 spec:
" A dvec3 or dvec4 can only be declared without specifying a
component."
Therefore, using the "component" qualifier with a dvec3 or dvec4
should result in a compiling error.
v2: enhance the error message (Timothy).
Fixes: 94438578d2 ("glsl: validate and store component layout qualifier in GLSL IR")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This reverts commit db57db5317. When
building IR, nothing is really immutable and, since C has no concept of
constness propagating beyond the first pointer, we have to be vary
careful with how we use it. To just throw const into a function like
this is a lie.
Instead, we should just drop the unneeded const in spirv_to_nir which
this commit does along with the revert.
`clang` has a different set of warnings and errors than `gcc`, so it's
useful to do at least a generic pass over Mesa with it.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Fix this check so that we can get a HiZ aux buffer for multisampled
surfaces as well. Also make sure we don't try to emit a sampler view
surface state for multisampled depth sufaces with HiZ enabled, as
the sampler can't HiZ for multisampled buffers and isl would assert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
the idea here is to generate an entry point stub function wrapping around the
actual kernel function and turn all parameters into shader inputs with byte
addressing instead of vec4.
This gives us several advantages:
1. calling kernel functions doesn't differ from calling any other function
2. CL inputs match uniforms in most ways and we can just take advantage of most
of nir_lower_io
v2: move code into a seperate function
v3: verify the entry point got a name
fix minor typo
v4: make vtn_emit_kernel_entry_point_wrapper take the old entry point as an arg
Signed-off-by: Karol Herbst <kherbst@redhat.com>
local memory is too small to require 64 bit pointers, so cast the array index
to a 32 bit value to save up on 64 bit operations.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
We need this for OpenCL kernels because we have to apply C rules for alignment
and padding inside structs and for this we also have to know if a struct is
packed or not.
v2: fix for kernel params
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
There are two stages to varying assembly in the command stream: creating
the varying buffers in the command stream, and creating the varying meta
descriptors (also in the command stream) linked to the aforementioned
buffers. The previous code for this was ad hoc and brittle, making some
invalid assumptions causing unmaintainable workarounds to pile up across
the driver (both compiler and command stream side).
This patch completely rewrites the varying assembly code. There's a
trivial performance penalty (we now memcpy the varying meta to the
command stream on draw, rather than on compile). That said, the
improvement in flexibility and clarity is well-worth it.
The motivator for these changes was support for gl_PointCoord (and
eventually point sprites for legacy GL), which was impossible to
implement with the old varying assembly code. With the new refactor,
it's super easy; support for gl_PointCoord is included with this patch.
All in all, I'm quite happy with how this turned out.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
In addition to fixing actual primconvert bugs, this prevents an infinite
loop when trying to draw POINTS.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The if is actually returning true on success, enabling fast clears, so we
need to have the test succeed when the iview dimensions are right.
Fixes: d5400a5ec2 "radv: provide a helper for comparing an image extents."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Vulkan spec doesn't explicitly forbid zero size transform
feedback buffers.
Having zero size xfb caused SurfaceSize overflow and
triggered assert in debug build.
The only way to have zero size SO_BUFFER is to disable
SO_BUFFER as stated in hardware spec.
From SKL PRM, Vol 2a, "3DSTATE_SO_BUFFER":
"If set, stream output to SO Buffer is enabled,
if 3DSTATE_STREAMOUT::SO Function ENABLE is also enabled.
If clear, the SO Buffer is considered "not bound" and effectively
treated as a zero- length buffer for the purposes of SO output and
overflow detection. If an enabled stream's Stream to Buffer Selects
includes this buffer it is by definition an overflow condition.
That stream will cause no writes to occur,
and only SO_PRIM_STORAGE_NEEDED[<stream>] will increment."
Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
VkFullScreenExclusiveEXT comes from the win32 header. Mostly took
the logic from the entrypoint scripts:
1) If there is an ext that has it in the requires and has a platform,
take the guard for that platform.
2) Otherwise assume it is from the core headers.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
In commit 530927d3f6 ("vulkan/util: generate instance/device
dispatch tables") we started generating instance dispatch tables some
of them (like wayland) require external headers.
This commit moves the dependencies up one level so that they apply the
whole vulkan directory. We use them for both the util & overlay layer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 530927d3f6 ("vulkan/util: generate instance/device dispatch tables")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Some of the header file locations are changed between Android
versions (when VNDK is used), patch makes sure we get all the
required headers.
v2: cleanups, put SDK version checks in all places (Tapani)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Chen Lin Z <lin.z.chen@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
For VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL we do not care about
the queue mask because
1) using these is only allowed on the gfx queue
2) transitions for these are only allowed on the gfx queue.
This enables some fast clears for Doom that uses
VK_SHARING_MODE_CONCURRENT.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was used to avoid freeing a sampler view which was created by a
context that was already deleted. But the state tracker does not
allow that.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
This function was used in the past to avoid deleting a sampler view
for a context that no longer exists. But the Mesa state tracker
ensures that cannot happen. Use the standard refcounting function
instead.
Also, remove the code which checked for context mis-matches in
svga_sampler_view_destroy(). It's no longer needed since implementing
the zombie sampler view code in the state tracker.
Testing Done: google chrome, variety of GL demos/games
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
In all instances here we can replace pipe_sampler_view_release(pipe,
view) with pipe_sampler_view_reference(view, NULL) because the views
in question are private to the state tracker context. So there's no
danger of freeing a sampler view with the wrong context.
Testing done: google chrome, misc GL demos, games
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
As with the preceding patch for sampler views, this patch does
basically the same thing but for shaders. However, reference counting
isn't needed here (instead of calling cso_delete_XXX_shader() we call
st_save_zombie_shader().
The Redway3D Watch is one app/demo that needs this change. Otherwise,
the vmwgfx driver generates an error about trying to destroy a shader
ID that doesn't exist in the context.
Note that if PIPE_CAP_SHAREABLE_SHADERS = TRUE, then we can use/delete
any shader with any context and this mechanism is not used.
Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos
and a few Linux games.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
When st_texture_release_all_sampler_views() is called the texture may
have sampler views belonging to several contexts. If we unreference a
sampler view and its refcount hits zero, we need to be sure to destroy
the sampler view with the same context which created it.
This was not the case with the previous code which used
pipe_sampler_view_release(). That function could end up freeing a
sampler view with a context different than the one which created it.
In the case of the VMware svga driver, we detected this but leaked the
sampler view. This led to a crash with google-chrome when the kernel
module had too many sampler views. VMware bug 2274734.
Alternately, if we try to delete a sampler view with the correct
context, we may be "reaching into" a context which is active on
another thread. That's not safe.
To fix these issues this patch adds a per-context list of "zombie"
sampler views. These are views which are to be freed at some point
when the context is active. Other contexts may safely add sampler
views to the zombie list at any time (it's mutex protected). This
avoids the context/view ownership mix-ups we had before.
Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos
a few Linux games. If anyone can recomment some other multi-threaded,
multi-context GL apps to test, please let me know.
v2: avoid potential race issue by always adding sampler views to the
zombie list if the view's context doesn't match the current context,
ignoring the refcount.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
Split up the "Environment Variables" section into "Compiler Options"
and "Compiler Specification". I think this makes the information
easier to find and understand.
Add the necessary build rules for android, to avoid building errors.
Fixes: f014ae3 ("nouveau: add support for nir")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Without this the build breaks with:
In file included from ../src/vulkan/util/vk_util.h:32,
from ../src/vulkan/util/vk_util.c:28:
../include/vulkan/vulkan.h:51:10: fatal error: wayland-client.h: No such file or
directory
#include <wayland-client.h>
^~~~~~~~~~~~~~~~~~
compilation terminated.
The above misses the include directory for wayland:
-I/usr/include/wayland
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
v4: use smarter getIndirect helper
use new getSlotAddress helper
v5: use loadFrom helper
v8: don't require C++11 features
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v3: fix compiler warnings
v4: use loadFrom helper
v5: fix signed min/max
v6: set tex mask
add support for indirect image access
set cache mode
v7: make compatible with 884d27bcf6
rework the whole deref thing to prepare for bindless
v8: port to deref instructions
don't require C++11 features
v9: implement MS images
rebase on master (image modifiers)
fix regressions due to variable src compnents
replace '(*it).' with 'it->'
convert to C++ style comments
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v4: use smarter getIndirect helper
use new getSlotAddress helper
use loadFrom helper
v8: don't require C++11 features
Signed-off-by: Karol Herbst <kherbst@redhat.com>
We store those arrays in local memory and reserve some space for each of the
arrays. With NIR we could store those arrays packed, but we don't do that yet
as it causes MemoryOpt to generate unaligned memory accesses.
v3: use fixed size vec4 arrays until we fix MemoryOpt
v4: fix for 64 bit types
v5: use loadFrom helper
v8: don't require C++11 features
v9: convert to C++ style comments
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: add vote_eq support
use the new subop intrinsic helper
add ballot
v3: add read_(first_)invocation
v8: handle vectorized intrinsics
don't require C++11 features
v9: lower_subgroups to 32 bit (produces less instructions)
use getSSA and getScratch instead of new_LValue
Signed-off-by: Karol Herbst <kherbst@redhat.com>
a lot of those fields are not valid for a lot of tex ops. Not quite sure if
it's worth the effort to check for those or just keep it like that. It seems
to kind of work.
v2: reworked offset handling
add tex support with indirect R/S arguments
handle GLSL_SAMPLER_DIM_EXTERNAL
drop reference in convert(glsl_sampler_dim&, bool, bool)
fix tg4 component selection
v5: fill up coords args with scratch values if coords provided is less than TexTarget.getArgCount()
v7: prepare for bindless_texture support
v8: don't require C++11 features
v9: convert to C++ style comments
fix txf with a uniform constant 0 lod
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: support more sys values
fixed a bug where for multi component reads all values ended up in x
v3: add load_patch_vertices_in
v4: add subgroup stuff
v5: add helper invocation
v6: fix loading 64 bit system values
v8: don't require C++11 features
v9: convert to C++ style comments
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v3: and load_output
v4: use smarter getIndirect helper
use new getSlotAddress helper
v5: don't use const_offset directly
fix for indirects
v6: add support for interpolateAt
v7: fix compiler warnings
add load_barycentric_sample
handle load_output for fragment shaders
v8: set info->prop.fp.readsSampleLocations for at_sample interpolation
don't require C++11 features
v9: convert to C++ style comments
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v3: add workaround for RA issues
indirects have to be multiplied by 0x10
fix indirect access
v4: use smarter getIndirect helper
use storeTo helper
v5: don't use const_offset directly
v8: don't require C++11 features
v9: convert to C++ style comments
handle clip planes correctly
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: use new getIndirect helper
fixes symbols for 64 bit types
v4: use smarter getIndirect helper
simplify address calculation
use loadFrom helper
v8: don't require C++11 features
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: user bitfield_insert instead of bfi
rework switch helper macros
remove some lowering code (LoweringHelper is now used for this)
v3: add pack_half_2x16_split
add unpack_half_2x16_split_x/y
v5: replace first argument with nullptr in loadImm calls
prefer getSSA over getScratch
v8: fix setting precise modifier for first instruction inside a block
add guard in case no instruction gets inserted into an empty block
don't require C++11 features
v9: use CC_NE for integer compares
convert to C++ style comments
fix b2f for doubles
remove macros around nir ops to make it easier to grep them
add handling for fpow
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: parse a few more fields
v3: add special handling for GL_ISOLINES
v8: set info->prop.fp.readsSampleLocations
don't require C++11 features
v9: replace '(*it).' with 'it->'
convert to C++ style comments
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: add support for geometry shaders
set idx
add some missing mappings
fix for 64bit inputs/outputs
fix up some FP color output index messup
parse centroid flag
v3: fix arrays in outputs as well
fix input/ouput size calculation for tessellation shaders
v4: add getSlotAddress helper
fix for 64 bit typed inputs
v5: change getSlotAddress interface for easier use
fix sample inputs
fix slot counting for mat
v7: fix driver_location of images
v8: don't require C++11 features
v9: convert to C++ style comments
support VERT_ATTRIB_POINT_SIZE
add more error checking to slots
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v4: treat imul as unsigned
v5: remove pointless !!
v7: inot is unsigned as well
v8: don't require C++11 features
v9: convert to C++ style comments
improve formatting
print error in all cases where codegen doesn't support a given type
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
v2: add helper function for indirects
v4: add new getIndirect overload for easier use
v5: use getSSA for ssa values
we can just create the values for unassigned registers in getSrc
v6: always create at least 32 bit values
v8: don't require C++11 features
v9: include unordered_map on supported stdlibs
replace '(*it).' with 'it->'
Signed-off-by: Karol Herbst <kherbst@redhat.com>
v2: add constant_folding
v6: print non final NIR only for verbose debugging
v8: add passes we will need for OpenCL compute shaders
v9: move type_size into anonymous namespace
convert to C++ style comments
lower bools to int32
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
v9: rename variable to driver_flags
use constants for shader cache flags
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
not all those nir options are actually required, it just made the work a
little easier.
v2: fix asserts
parse compute shaders
don't lower bitfield_insert
v3: fix memory leak
v4: don't lower fmod32
v5: set lower_all_io_to_temps to false
fix memory leak because we take over ownership of the nir shader
merge: use the lowering helper
v6: include TGSI debug header for proper assert call
add nv50 support
v7: fix Automake build
v8: free shader only for the set shader type
v9: check for IR type inside get_compiler_options
squash "nouveau: add env var to make nir default"
fix memory leak when creating compute shaders
use debug_get_bool_option as it is available in non debug builds
return failure if unsupported IR is encountered
don't lower fpow in nir
lower int 64 divmod inside nir to prevent crashes
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
if we start supporting multiple input IRs we might want to move lowering code
into a common place and keep the initial translation simplier.
This will also allows us to react on ISA changes more easily.
v5: also handle SAT
v6: rename type variables
fixed lowering of NEG
add lowering of NOT
v8: don't require C++11 features
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
No autotools build to care about.
The half baked turnips param is kind of ugly, but felt like a waste
defining more variables for it now.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Avoids
src/freedreno/vulkan/meson.build:42:0: ERROR: Tried to create target "vk_format_table.c", but a target of that name already exists.
when building both radv and turnip.
Fixes: 26380b3a9f "turnip: Add driver skeleton (v2)"
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Apparently GCC does not consider static const variables to be
integer constants, and hence the array size and the static assert
result in compile failures.
Fixes: 4b9f967cd1 "turnip: add a more complete format table"
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This fixes a serious performance issue with DXVK:
https://github.com/doitsujin/dxvk/issues/937
This was caused by a recent change that to improve performance on RADV
which back-fired on ANV and killed performance for some apps:
e5a06d3f4a
Throwing in this bit of lowering lets us come along and CSE those UBO
loads (or copy-prop for SSBO load) and get one load where we previously
would have gotten several.
VkPipeline-db results on Kaby Lake:
total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
instructions in affected programs: 1754333 -> 1712157 (-2.40%)
helped: 5331
HURT: 63
total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
helped: 9202
HURT: 4323
total loops in shared programs: 3340 -> 3331 (-0.27%)
loops in affected programs: 9 -> 0
helped: 9
HURT: 0
total spills in shared programs: 3246 -> 3053 (-5.95%)
spills in affected programs: 384 -> 191 (-50.26%)
helped: 10
HURT: 5
total fills in shared programs: 4626 -> 4452 (-3.76%)
fills in affected programs: 439 -> 265 (-39.64%)
helped: 10
HURT: 5
All of the shaders with hurt spilling were in Rise of the Tomb Raider
which also had shaders solidly helped in the spilling department. Not
shown in those results (because I've not had success dumping the
shaders) is Witcher 3 where this reduces spilling and improves over-all
perf by around 20-25%. There were no shader-db changes. Apparently,
this just isn't a pattern that happens in OpenGL.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: "19.0" mesa-stable@lists.freedesktop.org
This pass was originally written for lowering TCS output reads and
writes but it is also applicable just about anything including UBOs,
SSBOs, and shared variables.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This one's a tiny bit better than what we had in spirv_to_nir because it
emits a binary tree rather than a linear walk. It also doesn't leave
around unneeded bcsel instructions for a constant index and returns an
undef for constant OOB access.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Something that we didn't hit earlier because of the extra shr.b
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Increase shader_params size to pass sampler data to
compute shader during weave de-interlace.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Tested-by: Bruno Milreu <bmilreu@gmail.com>
The OpenMAX state tracker will use this.
RadeonSI is adapted to use pipe_grid_info::last_block instead of its
internal state.
Acked-by: Leo Liu <leo.liu@amd.com>
When varyings was added we moved to use to dynamycally allocated
pointers, instead of allocating just one block for everything. That
breaks some assumptions of some vulkan drivers (like anv), that make
serialization and copying easier. And at the same time, varyings are
not needed for vulkan.
So this commit moves them out. Although it seems a little an overkill,
fixing the anv side would require a similar, or more, changes, so in
the end it is about to decide where do we want to put our effort.
v2: (from Jason review)
* Don't use a temp variable on the _create methods, just return
result of rzalloc_size
* Wrap some lines too long.
Fixes: cf0b2ad486 ("nir/xfb: adding varyings on nir_xfb_info and gather_info")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The maximum value primitive restart index is different for each index data
type. Use the appropriate fixed restart index value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The standard requires that the primitive restart comparison happens before
the basevertex value is added. Do this now, drop a reference to the standard
why this happens at this place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Since mapping and unmapping the buffer objects in a VAO is handled
directly from the VAO, this part of the _NEW_ARRAY state is no longer
used. So remove this part of array element state.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Due to the use of bitmaps, the _mesa_vao_{,un}map_arrays functions
should provide comparable runtime efficienty to the currently used
_ae_{,un}map_vbos functions. So use this functions and enable
further cleanup.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Make use of the newly factored out _mesa_array_element function
in display list compilation. For now that duplicates out the
primitive restart logic. But that turns out to need a fix in
display list handling anyhow.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The factored out function handles emitting the vertex attributes
at the given index. The now public accessible function gets used
in the following patches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Provide a set of functions that maps or unmaps all VBOs held
in a VAO. The functions will be used in the following patches.
v2: Update comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead, we do UBO and SSBO deref lowering in NIR after we've given it a
chance to optimize SSBO access:
Shader-db results on Kaby Lake:
total instructions in shared programs: 15235775 -> 15235484 (<.01%)
instructions in affected programs: 14992 -> 14701 (-1.94%)
helped: 19
HURT: 20
total cycles in shared programs: 339220331 -> 339027307 (-0.06%)
cycles in affected programs: 79831981 -> 79638957 (-0.24%)
helped: 540
HURT: 602
total loops in shared programs: 4402 -> 4348 (-1.23%)
loops in affected programs: 186 -> 132 (-29.03%)
helped: 27
HURT: 0
total spills in shared programs: 23261 -> 23234 (-0.12%)
spills in affected programs: 38 -> 11 (-71.05%)
helped: 1
HURT: 0
total fills in shared programs: 31442 -> 31371 (-0.23%)
fills in affected programs: 98 -> 27 (-72.45%)
helped: 1
HURT: 0
LOST: 12
GAINED: 12
Most of the help and hurt in instruction counts was just churn caused by
re-ordering of optimizations and the fact that the NIR deref lowering
code is emitting slightly different instructions. Nothing was hurt by
more than three instructions and most things weren't helped by more than
four. The primary exception to this is one Car Chase shader:
shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%)
There is also one compute shader in Manhattan 3.1 and a fragment shader
in the UE4 Shooter Game demo that now get a loop partially unrolled.
Those showed up in the results as hurt instructions but were manually
removed to get the results above.
The lost/gained was a dozen Car Chase shaders that went from SIMD8 to
SIMD16 thanks to improved register pressure:
shaders/non-free/gfxbench4/carchase/366.shader_test CS
shaders/non-free/gfxbench4/carchase/368.shader_test CS
shaders/non-free/gfxbench4/carchase/370.shader_test CS
shaders/non-free/gfxbench4/carchase/372.shader_test CS
shaders/non-free/gfxbench4/carchase/376.shader_test CS
shaders/non-free/gfxbench4/carchase/378.shader_test CS
shaders/non-free/gfxbench4/carchase/380.shader_test CS
shaders/non-free/gfxbench4/carchase/382.shader_test CS
shaders/non-free/gfxbench4/carchase/384.shader_test CS
shaders/non-free/gfxbench4/carchase/388.shader_test CS
shaders/non-free/gfxbench4/carchase/4.shader_test CS
shaders/non-free/gfxbench4/carchase/6.shader_test CS
Given how much it appeared to be improved, I ran Car Chase on my laptop.
Unfortunately, I wasn't able to see any measurable improvement. It
might be helped by 1-2% but it's in the noise. It does render correctly
as far as I can tell so the improvement is legitimate.
All of the loops that got delete were in dolphin uber shaders. I've had
no opportunity to test them for correctness or performance.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We didn't have any of these before because all NIR consumers always
called lower_ubo_references. Soon, we want to pass the derefs straight
through to NIR so we need to handle these intrinsics directly.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We want to be able to use variables and derefs for UBO/SSBO access in
NIR. In order to do this, the rest of NIR needs to know the type layout
information.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All of these are backed by some sort of memory so if you have multiple
threads writing to different components of the same vector at the same
time, the load-vec-store pattern that GLSL IR emits won't work. This
shouldn't affect any drivers today as they all call GLSL IR lowering
which lowers access to these variables to index+offset intrinsics before
we get to this point. However, NIR will start handling the derefs
itself and won't want the lowering.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It's just a 32-bit index and offset. We're going to want to use it in
GL as well so stop talking about Vulkan.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we get to two deref_var paths with different variables, we usually
know they don't alias. However, if both of the paths are marked
coherent, we don't have to worry about it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We also need to modify the current size/align helpers to not blow up
when they encounter an explicitly laid out type. Previously we
considered using the size/align helpers mutually exclusive with standard
layouts but now we just assert that they match.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
With UBOs and SSBOs we have boolean types but they're actually 32-bit
values. Make the validator a little less strict so that we can do a
32-bit load/store on boolean types. We're about to add a lowering pass
called gl_nir_lower_buffers which will lower boolean load/store
operations to 32-bit and insert i2b and b2i instructions to convert
to/from 1-bit booleans. We want that to be legal.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we want to be able to use copy_deref instructions on explicitly laid
out types, we have to be a little more flexible about what types we
allow. Instead, of requiring the types to exactly match, only require
the bare types to match.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15225213 -> 15222365 (-0.02%)
instructions in affected programs: 43524 -> 40676 (-6.54%)
helped: 203
HURT: 0
Lots of shaders in Shadow Warrior had this pattern along with Deus Ex,
Civ, Shadow of Mordor, and several others.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for
the fixed function VPs.
v2: drop unused delete_ir() arg.
Fixes: 3b4929ec6e ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB
programs we need to free the old NIR when PSN is used to set up new NIR in
the same gl_program. Additionally, set the base .nir field so that it
will get freed by _mesa_delete_program().
Fixes: 3d7611e9a6 ("st/nir: use NIR for asm programs")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have a native op for this, which was just found in a disassembly --
so instead of lowering, use it!
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Previously, we were caching this incorrectly; there's no real reason to
given how variable it is (sensitive to changes in viewport, framebuffer
dimensions, and scissors) and how cheap it is to recompute. So, just do
it on the fly each draw.
Fixes glmark-es2 -bshadow and -brefract.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
For inexplicable reasons, the depth buffer is faster if kept as linear,
whereas the colour buffers are faster if AFBC. Given both code paths are
available, we'll choose the faster one of each (which also helps with
testing coverage).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
It's not clear why the hardware "spills" a little bit, but if we don't
do this, we get MMU faults with linear depth buffers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
While a depth buffer may be supplied, it only needs to be written to if
the depth writemask is set for any draw AND if the depth buffer is not
immediately invalidated (as is the case for scanout). This refactors
panfrost_job to provide a depth write requirement, which is now
implemented for MFBD depth buffers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This removes a clunky hack where the depth buffer was enabled during the
*clear*, instead of during depth buffer linking. That said, this does
not yet support writeback like AFBC depth buffers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The fragment framebuffer descriptor should not be a context entry;
rather, it should be constructed only at fragment time to keep analysis
tractable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This substantially cleans up the corresponding logic at the expense of a
bit of code duplication; nevertheless, it's a net win since otherwise
incompatible hardware code is mixed confusingly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This function is replicated across vc4/v3d/freedreno and is needed in
Panfrost; let's make this shared code.
v2: Supply generic util_array_contains_u64 version (Eric Engestrom). Add
missing stdbool.h include (Eric Anholt). Mark inline (Christian
Gmeiner).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
_mesa_log_msg must provide the length of the string passed into the
KHR_debug api. When the string formatted by _mesa_gl_vdebugf exceeds
MAX_DEBUG_MESSAGE_LENGTH, the length is incorrectly set to the number
of characters that would have been written if enough space had been
available.
Fixes: 3025680578
("mesa: Add support for GL_ARB_debug_output with dynamic ID allocation.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
We don't set it on HSW and earlier in i965 and disabling it appears to
make derivatives somewhat more reliable.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Not mutating the boxes is arguably cleaner.
Split from a patch by Chris Wilson but reworked to use a pointer to the
original box rather than making a copy at all.
Patch removes scaling factors introduced in 2a2e69f975 but leaves
option to use scaling in place as it could be useful with other upcoming
YUV formats.
We did this scaling because ffmpeg was shifting channel bits down, however
it seems this is not the right place as compositor wants to flip same
buffers directly to display as well and therefore bitshifting needs to be
done by the client when receiving frame from ffmpeg.
Now P0x formats are treated the same, e.g. P010 is same as P016 but with
lower 6 bits set to zeros.
Fixes: 2a2e69f975 "i965: add P0x formats and propagate required scaling factors"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We've been fairly inconsistent about this so we should really choose
whether we're going to use VK_TRUE/FALSE or the C boolean values. The
Vulkan #defines are set to 1 and 0 respectively so it's the same value
as C gives you when you cast a boolean expression to an integer. Since
there are several places where we set a VkBool32 to a C logical
expression, let's just embrace C booleans and stop using the VK defines.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that we are properly resolving buffers before giving them to the
window system, let's enable aux support again.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In i965, we disable the use of RGBX formats, so the higher layers of
Mesa choose the equivalent RGBA format, and swizzle the alpha channel to
1.0.
However, Gallium won't do that. We need to explicitly convert it to
RGBA.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The flush_resource hook is supposedly called when the resource content
needs to be made visible to external (okay, that's pretty vague). For
instance, it gets called before a surface gets handled to the window
system. So we need to resolve it if it's not resolved yet.
v2 (Ken):
- Check mod_info in iris_flush_resource instead of ISL_AUX_USAGE_NONE
- Drop my old broken resolve code from iris_resource_get_handle() now
that Rafael's got it hooked up in the right place.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we lack value range tracking, this patch tries to 'manually' propogate
the division by 4 to calculate SSBO element-offset, into a possible previous
shift operation (shift left or right); checking that it is safe to do so.
This should help in cases like ie. when accessing a field in an array of
structs, where the offset is likely defined as base plus a multiplication
by a struct or array element size.
See dEQP test 'dEQP-GLES31.functional.ssbo.atomic.xor.highp_uint'
for an example of a shader that benefits from this.
Reviewed-by: Rob Clark <robdclark@gmail.com>
These intrinsics have the offset in dwords already computed in the last
source, so the change here is basically using that instead of emitting
the ir3_SHR to divide the byte-offset by 4.
The improvement in shader stats is significant, of up to ~15% in
instruction count in some cases. Tested only on a5xx.
shader-db is unfortunately not very useful here because shaders that use
SSBO require GLSL versions that are not supported by freedreno yet.
For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
are helped.
A random case:
dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2
with current master:
; CL prog 14/1: 1252 instructions, 0 half, 48 full
; 8 const, 8 constlen
; 61 (ss), 43 (sy)
with the SSBO dword-offset moved to NIR:
; CL prog 14/1: 1053 instructions, 0 half, 45 full
; 7 const, 7 constlen
; 34 (ss), 73 (sy)
The SHR previously emitted for every single SSBO instruction disappears
in most cases, and the dword-offset ends up embedded in the STGB
instruction as immediate in many cases as well.
There are also a few of those tests that are currently failing on register
allocation, that start to pass as a result of reducing the pressure. At least
these, probably more:
dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7
No regressions observed with relevant CTS and piglit tests.
Reviewed-by: Rob Clark <robdclark@gmail.com>
This NIR->NIR pass implements offset computations that are currently
done on the IR3 backend compiler, to give NIR a better chance of
optimizing them.
For now, it supports lowering the dword-offset computation for SSBO
instructions. It will take an SSBO intrinsic and replace it with the
new ir3-specific version that adds an extra source. That source will
hold the SSA value resulting from inserting a division by 4 (an SHR op)
of the original byte-offset source already provided by NIR in one of
the intrinsic sources.
Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.
Reviewed-by: Rob Clark <robdclark@gmail.com>
These are ir3 specific versions of SSBO intrinsics that add an
extra source to hold the element offset (dword), which is what the
backend instructions need.
The original byte-offset source provided by NIR is not replaced
because on a4xx and a5xx the backend still needs it.
Reviewed-by: Rob Clark <robdclark@gmail.com>
indexConfigAttrib iterates over every index in the dri driver, possibly
exceeding __DRI_ATTRIB_MAX. In other words, if the dri driver has newer
attributes libEGL will end up reading from uninitialized memory through
dri2_to_egl_attribute_map[].
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Always use the streaming load (since we know we have Broadwell+, all of
our target CPU support sse41) for reading back form the tiled surface
for mapping the resource. This means we hit the fast WC handling paths
on Atoms (without LLC), and for big Core (with LLC) using the streaming
load is no less efficient as we do not require the tiled buffer to be
pulled into the CPU cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On !llc machines (Atoms), reading from a linear buffers is slow and so
copying from one resource into the linear staging buffer is still slow.
However, we can tell the GPU to snoop the CPU cache when reading from and
writing to the staging buffer eliminating the slow uncached reads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to lack of write mask in SPIR-V store, generators may produce
multiple stores to the same vector but using different array derefs.
Use the combining store pass to clean this up. For example,
layout(binding = 3) buffer block {
vec4 v;
};
void main() {
v.x = 11;
v.y = 22;
}
after going to SPIR-V and NIR, ends up with in two store_derefs to
v[0] and v[1]
vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */
intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */
vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */)
vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */
intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */
producing two different sends instructions in skl. The combining pass
transform the snippet above into
vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17
intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */
producing a single sends instruction.
v2: Move this from spirv_to_nir into the general optimization pass for
intel compiler. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: (all from Jason)
Reuse existing function for the end of the block combinations.
Check the SSA values are coming from the right place in tests.
Document the case when the store to array_deref is reused.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were not copying the saturate bit from the original instruction
to the new replacement instruction. This caused major misrendering
in DiRT Rally on iris, where comparisons leading to discards failed
due to the missing saturate, causing lots of extra garbage pixels to
be drawn in text rendering, trees, and so on.
This did not show up on i965 because st/nir performs a more aggressive
version of nir_opt_peephole_select, yielding more b32csel operations.
Fixes: 52c7df1643 i965/fs: Merge CMP and SEL into CSEL on Gen8+
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tessellation control shader outputs act as if they have memory backing
them and you can have multiple writes to different components of the
same vector in-flight at the same time. When this happens, the load vec
store pattern that gets used by ir_triop_vector_insert doesn't yield the
correct results. Instead, just emit a sequence of conditional
assignments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
The previous code was completely broken when it came to constructing the
undef values. I'm not sure how it ever worked. For the case of a copy
that reads an undefined value, we can just delete the copy because the
destination is a valid undefined value. This saves us the effort of
trying to construct a value for an arbitrary copy_deref intrinsic.
Fixes: e8a8937a04 "nir: add partial loop unrolling support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: Only reject no-error contexts for too-old GL if we're actually
trying to create a no-error context (Adam Jackson)
v3: Fix share contexts (Adam Jackson)
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
pool->next and pool->free_list were reset before their usage in
anv_descriptor_pool_free_set
Fixes: 775aabdd "anv: destroy descriptor sets when pool gets reset"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from
11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target
devices.
The IO scalarization pass that we run to help with linking end up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector one. To
alleviate this, we now vectorize the I/O once again. This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
instructions in affected programs: 342009 -> 338857 (-0.92%)
helped: 1236
HURT: 443
total spills in shared programs: 23471 -> 23465 (-0.03%)
spills in affected programs: 6 -> 0
helped: 1
HURT: 0
total fills in shared programs: 31770 -> 31766 (-0.01%)
fills in affected programs: 4 -> 0
helped: 1
HURT: 0
Cycles was just a lot of churn do to moves being different places. Most
of the pure churn in instructions was +/- one or two instructions in
fragment shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This ensures Mesa3D build doesn't fail in this case as encountered when
bisecting Scons source code while regression testing
https://bugs.freedesktop.org/show_bug.cgi?id=109443
and when testing 3.0.5.a.2
Technical details:
Scons version string has consistently been in this format:
MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd]
so these formulas should strip alpha/beta flags and return Scons version:
- as string - `'.'.join(SCons.__version__.split('.')[:3])`
- as tuple of integers - `tuple(map(int, SCons.__version__.split('.')[:3]))`
- v2: Fixed Scons version retrieval formulas as string and tuple of integers.
- v3: Fixed Scons version string format description.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 47fc359822.
Reason is that patch did not take in to account situation where we might
have both OpenGL and Vulkan using glsl_types at the same time.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reduces compilation time for my shader-db collection from around 40
seconds to 30, vs. 19 seconds for TGSI. There are still some shaders
that TGSI caches but NIR doesn't, partly because of more aggressive
cross-stage optimizations with NIR.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worse offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The values should match the ones that are emitted.
This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*.
Fixes: f4e499ec79 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes a segfault when we try to access the array using a
-1 when the array wasn't allocated in the first place.
Before 7536af670b we would just access a pre-allocated array
that was also load/stored to/from the shader cache. But now the
cache will no longer allocate these arrays if they are empty.
The change resulted in tests such as the following segfaulting
when run with a warm shader cache.
tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
The fragment_extra structure contains additional fields extending the
MRT framebuffer descriptor, snuck in between the main framebuffer
descriptor and the render targets. Its fields include those related to
transaction elimination and depth/stencil buffers. This patch identifies
the flags field (previously just "unk" with some magic values) as well
as identifying some (but not all) flags set by the driver.
The process of identifying flags brought a bug to light where
transaction elimination (checksumming) could not be enabled unless AFBC
was in-use. This issue is now resolved.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This bit, if set, causes the depth buffer to be copied from GPU tile
memory to the provided depth buffer in main memory. If not set, the GPU
will not access the main memory (saving considerable memory bandwidth if
depth results are not actually used).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This combination has not yet been seen "in the wild" in traces, but to
support linear depth FBOs, ~bruteforce reveals this bit pattern is
necessary. It's not yet clear why the meanings of 0x1 and 0x2 are
essentially flipped (tiled vs linear for colour, linear vs some sort of
tiled for depth).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Previously, linear BOs shared memory with each other to minimize kernel
round-trips / latency, as well as to work around a bug in the free_slab
function. These concerns are invalid now, but continuing to use the slab
allocator for BOs resulted in memory allocation errors. This issue was
aggravated, though not introduced (so not a real regression) in the
previous commit.
v2 (unreviewed): Fix bug in v1 preventing munmaps from working
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Again, these formats are only properly known at the time of fragment job
emit. Rather than hardcoding the format, at least for MFBD we begin to
construct the format bits on-demand. This cleans up the code,
futureproofs for ES3 framebuffer formats, and should fix bugs regarding
FBO colour swizzles.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
In an effort to cleanup framebuffer management code, we delay
colour buffer setup until the FRAGMENT job is actually emitted, allowing
the AFBC and linear codepaths to be unified.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
AFBC, tiled, and linear BO layouts are mutually exclusive; they should
be coupled via a single enum rather than ad hoc checks of booleans.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
I'm not sure why we were checking for these additional criteria (likely
inherited from some other driver); remove the needless checks to cleanup
the code and perhaps fix some bugs down the line.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper. You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.
The hope is that this will fix some intermittent flushing issues as
well as GPU hangs. However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.
Mark Janes noted that this patch helps with some GPU hangs on Icelake.
This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs. The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround. The new code orders them
properly and appears to work.
v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
production hardware.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning. For something as
finnicky as PIPE_CONTROL, the extra safety seems worth it.
We turn PIPE_CONTROL_* into an bitfield of arbitrary flags, and then
pack them appropriately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Add new Introduction and Advanced Usage sections.
Spell out a few more details, like "ninja install".
Improve the layout around example commands.
Fix grammatical errors and tighten up the text.
Explain the --prefix option.
v2: Remove language about 'ninja clean' and move link to Meson
information about separate build directories earlier in the page.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.
Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h
To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.
No functional change.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
To de-clutter st_context.h.
Clean up remaining function prototypes in st_context.h.
The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.
The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h
Reviewed-by: Neha Bhende <bhenden@vmware.com>
As stated in Vulkan spec:
"Resetting a descriptor pool recycles all of the resources from all
of the descriptor sets allocated from the descriptor pool back to
the descriptor pool, and the descriptor sets are implicitly freed."
This fixes dEQP-VK.api.descriptor_pool.*
Fixes: 14f6275c92 "anv/descriptor_set: add reference counting for..."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
loop {
...
vec1 1 ssa_12 = ilt ssa_225, ssa_11
vec1 1 ssa_17 = ilt ssa_226, ssa_1
vec1 1 ssa_18 = iand ssa_12, ssa_17
vec1 1 ssa_19 = inot ssa_18
if ssa_19 {
...
break
} else {
...
}
}
So in order to find the trip count we need to find the inverse of
ilt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
for (int i = 0; i < imin(x, 4); i++)
...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.
The code is written so that we could use partial unrolling
more generally, but for now it's only use when we have guessed
the trip count.
We use partial unrolling for this guessed trip count because its
possible any out of bounds array access doesn't otherwise affect
the shader e.g the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on its possible for nir_opt_dead_cf()
to then remove the loop in some cases.
A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:
GPU duration: 350 -> 325 microseconds
shader-db results radeonsi VEGA (NIR backend):
SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)
shader-db results i965 SKL:
total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21
total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9
vkpipeline-db results VEGA:
Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop, we add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.
Previously, shaders failed with:
SPIR-V parsing FAILED:
In file ../src/compiler/spirv/spirv_to_nir.c:1412
!is_shadow
632 bytes into the SPIR-V binary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.
The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.
So, update pinning if either dirty flag changes. To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.
This avoids the code duplication that caused me to put things in the
wrong place in the previous commit. One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.
Commit d6dd57d43c (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place. I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos. This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.
This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers. i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.
1. If we switch the TCS for one with a different number of output
vertices, then the TES's gl_PatchVerticesIn value will change.
We need to re-upload in this case. For now, re-emit constants
whenever the TCS/TES are swapped out.
2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
the TCS info. Since it's a passthrough, we can just use the
primitive's patch count (like the TCS gl_PatchVerticesIn does).
Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels. This simplifies
the state upload code a bit.
Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.
This is needed if we want to run tgsi_to_nir on iris.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
Save SPIR-V in tu_shader_module. Tranlation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile. Both will be called during pipeline creation.
Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no
longer required to call them (but can still do if they choose to).
We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass. With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.
Asserting (cur < end) in tu_cs_emit catches much less programming
errors comparing to asserting (cur < reserved_end). We should never
write more commands than what we have reserved.
Assert IB is non-empty and sane in tu_cs_emit_ib.
This should be quite complete feature-wise. External fences are
still missing. We probably also want to add a simpler path to
tu_WaitForFences for when fenceCount == 1.
Add tu_pack_clear_value to correctly pack VkClearValue according to
VkFormat. It ignores the component order defined by VkFormat, and
always packs to WZYX order.
A format table is an array of tu_native_format. Table lookup is
done through array indexing.
This commit defines a single format table for core VkFormat. It is
derived from the table in the gallium driver. There might be errors
introduced in the process of the conversion.
When an extension that defines new VkFormat is supported, we need to
add a new table for the extension.
- create tile_load_ib and tile_store_ib at the beginning of each
subpass
- execute the IBs at the end of each subpass
- no DONT_CARE support
- no subpass dependency analysis and subpass merging
- no zs support
- no true VkImageView support
- assume VK_FORMAT_B8G8R8A8_UNORM
- no tiling
- no MSAA
This also removes cur_cs from tu_cmd_buffer.
When in TU_CS_MODE_SUB_STREAM, tu_cs_begin_sub_stream (or
tu_cs_end_sub_stream) should be called instead of tu_cs_begin (or
tu_cs_end). It gives the caller a TU_CS_MODE_EXTERNAL cs to emit
commands to.
Add tu_cs_mode and TU_CS_MODE_EXTERNAL. When in
TU_CS_MODE_EXTERNAL, tu_cs wraps an external buffer and can not
grow.
This also moves tu_cs* up in tu_private.h, such that other structs
can embed tu_cs_entry.
Error checking tu_cs_begin/tu_cs_end is too tedious for the callers.
Move tu_cs_add_bo and tu_cs_reserve_entry to tu_cs_reserve_space
such that tu_cs_begin/tu_cs_end never fails.
We need the current color/depth/stencil attachments and the current
render area to compute the tiling config.
We compute the tiling config at the beginning of each subpass for
the moment. We should change that when the driver can reorder/merge
subpasses.
It is very common that the render area is the entire framebuffer.
We might want to optimize for the case and compute the tiling config
in tu_framebuffer ctor.
Being the first commit that emits meaningful command packets, there
are many things included in this commit
- tu6_emit_xxx are low-level helpers that emit command packets
without boundary checks
- tu6_xxx are high-level helpers that emit command packets with
boundary checks
- cmdbuf->cs is a pointer to the current CS, so that we can use the
helpers above to emit to other CS
- use cmd as the variable name of tu_cmd_buffer
- there is a per-cmdbuf scratch bo for CP_EVENT_WRITE writeback
- there is a per-cmdbuf debug marker, using scratch reg 7 or 6
depending on whether the cmdbuf is primary or secondary
(olv, after rebase) REG_A6XX_SP_UNKNOWN_AB20 is renamed
They are used like
tu_cs_reserve_space(...);
tu_cs_emit(...);
...;
tu_cs_reserve_space_assert();
to make sure we reserved enough space at the beginning.
Build drm_msm_gem_submit_bo array directly in tu_bo_list. We might
change this again, but this is good enough for now.
There are other issues as well, such as not using
VkAllocationCallbacks and sloppy error checking. We should revisit
this in the near future. Same to tu_cs.
This adds a radv-style check_space functions + emit functions.
Also puts them in a header as a bunch of inlines, so
(1) we can use them from meta code.
(2) they are inline for performance as these are common and small.
Did not put them in tu_private.h as a bunch of inlines only
clutters up that huge headerfile.
Precise error propagation for memory allocation failures is still
todo.
This creates a new fd on each queue submit. I do not go with
DRM_IOCTL_MSM_WAIT_FENCE solely because the path is marked legacy.
Otherwise, we can use the fence id rather than requesting a fence
fd until external fences are supported and enabled.
./deqp-vk -n dEQP-VK.info.*
Writing test log into TestResults.qpa
dEQP Core unknown (0xcafebabe) starting..
target implementation = 'Surfaceless'
WARNING: tu is not a conformant vulkan implementation, testing use only.
WARNING: tu is not a conformant vulkan implementation, testing use only.
Test case 'dEQP-VK.info.build'..
Pass (Not validated)
Test case 'dEQP-VK.info.device'..
Pass (Not validated)
Test case 'dEQP-VK.info.platform'..
Pass (Not validated)
Test case 'dEQP-VK.info.memory_limits'..
Pass (Pass)
DONE!
Test run totals:
Passed: 4/4 (100.0%)
Failed: 0/4 (0.0%)
Not supported: 0/4 (0.0%)
Warnings: 0/4 (0.0%)
The Makefile.am doesn't work. I tried fixing it but gave up because
I don't understand Autotools. I strongly suspect the Android.mk also
doesn't work.
Rather than maintain the broken build files, let's delete them and
re-add working build files if-and-when we need them. (Maybe we'll be
lucky and turnip will never need to support Autotools!).
meson files have been updated, autotools and android still need
updating.
Only build tested.
v2 (chadv):
- Rebase onto master.
- Fix build breakage in Python scripts.
- Drop the WSI code. The internal WSI apis have changed recently, and
will likely change again before the driver goes upstream. To avoid
unnecessary rebase work, let's drop the WSI code and re-add it when
we're ready to really use WSI.
(olv, after rebase) do not enable freedreno by default on ARM
The nir_state_slot struct had some padding that was never initialized.
Serializing the individual parts of the struct is more robust and avoids
the overhead of zeroing it at creation, so just do that.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes leaks for each glsl_type generated:
==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18
==32470== at 0x483880B: malloc (vg_replace_malloc.c:309)
==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119)
==32470== by 0x4C44014: rzalloc_size (ralloc.c:151)
==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215)
==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:114)
==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:1146)
==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501)
==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269)
==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018)
==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365)
==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490)
==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173)
v2: move release call to vkDestroyInstance
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes following leak:
==21853== 32 bytes in 1 blocks are definitely lost in loss record 2 of 20
==21853== at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==21853== by 0x4C4DD7F: util_vma_heap_free (vma.c:221)
==21853== by 0x4C4D647: util_vma_heap_init (vma.c:46)
==21853== by 0x4957B9F: anv_CreateDescriptorPool (anv_descriptor_set.c:578)
Fixes: c520f4dec9 ("anv: Add a concept of a descriptor buffer")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Patch maintains a list of sets in the pool and destroys possible
remaining sets when pool is destroyed.
As stated in Vulkan spec:
"When a pool is destroyed, all descriptor sets allocated from
the pool are implicitly freed and become invalid."
This fixes memory leaks spotted with valgrind:
==19622== 96 bytes in 1 blocks are definitely lost in loss record 2 of 3
==19622== at 0x483880B: malloc (vg_replace_malloc.c:309)
==19622== by 0x495B67E: default_alloc_func (anv_device.c:547)
==19622== by 0x4955E05: vk_alloc (vk_alloc.h:36)
==19622== by 0x4956A8F: anv_multialloc_alloc (anv_private.h:538)
==19622== by 0x4956A8F: anv_CreateDescriptorSetLayout (anv_descriptor_set.c:217)
Fixes: 14f6275c92 ("anv/descriptor_set: add reference counting for descriptor set layouts")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This backend interacts with the new DRM driver for Midgard GPUs which is
currently in development.
When using this backend, Panfrost has roughly on-par functionality as
when using the non-DRM driver from Arm.
Alyssa Rosenzweig: To do so, we implement additional routines for
runtime GPU version detection and fencing. We cleanup some duplicate
code interfering with the new driver. We fix a long-standing memory leak
which is aggravated on the new driver. Finally, we implement BO
import/export in a way compatible with the new driver. These changes are
squashed to preserve bisectability given the hard-to-track ABI shifts in
the nondrm module
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Non-zero offset wasn't working, which breaks a bunch of
dEQP-GLES31.functional.texture.border_clamp.formats.* when doing sharded
deqp runs (because order of tests changes, resulting in different
texture state bound.. deqp doesn't really clean up it's gl state between
tests very well)
Previously, if additional textures were bound, due to using too small of
a bcolor_entry size, the last 32bytes of the bcolor_entry would be
overwritten.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In standalone.h, the struct gl_context type is not declared by #includ'ing
mtypes.h:
In file included from src/gallium/drivers/panfrost/midgard/cmdline.c:24:
src/compiler/glsl/standalone.h:46:14: warning: ‘struct gl_context’ declared inside parameter list will not be visible outside of this definition or declaration
struct gl_context *ctx);
^~~~~~~~~~
This causes the following compilation failure:
src/gallium/drivers/panfrost/midgard/cmdline.c: In function ‘compile_shader’:
src/gallium/drivers/panfrost/midgard/cmdline.c:58:61: error: passing argument 4 of ‘standalone_compile_shader’ from incompatible pointer type [-Werror=incompatible-pointer-types]
prog = standalone_compile_shader(&options, 2, argv, &local_ctx);
^~~~~~~~~~
In file included from src/gallium/drivers/panfrost/midgard/cmdline.c:24:
src/compiler/glsl/standalone.h:43:28: note: expected ‘struct gl_context *’ but argument is of type ‘struct gl_context *’
struct gl_shader_program * standalone_compile_shader(
^~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: e67e072637 "panfrost: Implement Midgard shader toolchain"
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Most hw on the native platform advertise these
caps this way.
D3DCAPS_READ_SCANLINE: We don't really have hardware
support for that, but many games don't even check the
flag, and expect GetRasterStatus to work, which is
why we emulated it with a timer (like wine). So we
may as well advertise the cap.
D3DCURSORCAPS_LOWRES: I don't know what is the status
of this on X11, but I don't know of any dx9 game
running at height < 400 either.
D3DPTEXTURECAPS_TEXREPEATNOTSCALEDBYSIZE: The cap should
correspond to what the current generation of hw is doing.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
The former is supported on Matrox cards but no other hw.
The latter isn't supported anywhere.
It is fine to not advertise them as supported,
and it could prevent apps to trigger weird rendering paths.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If D3D_ALWAYS_SOFTWARE is set for debugging purposes,
run on any DRI enabled platform.
Instead of probing for a compatible gallium driver (which might
fail if there's none) always use the KMS DRI software renderer.
Allows to run nine on i915 when D3D_ALWAYS_SOFTWARE=1.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
This broke piles of image load store tests (179 failures on CI,
mesa_master build #15546, previous build right before this landed
was green). I'd rather not leave the tree on fire over the weekend,
so let's revert for now, and we can figure out what happened next week.
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Now that nir_opt_copy_prop_vars can properly handle array derefs on
vectors, it's safe to move UBO and SSBO lowering to late in the
pipeline. This should allow NIR to actually start optimizing SSBO
access.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It doesn't really matter where this pass goes as long as it's after we
call nir_lower_explicit_io and before we go into the back-end. Putting
it brw_postprocess_nir lets us move nir_lower_explicit_io significantly
later in the pipeline.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Which also requires uadd_carry lowering
Until recently this was lowered in glsl ir so it went unnoticed that we
weren't lowering it.
Fixes: 1d8994a63b glsl: [u/i]mulExtended optimization for GLSL
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes: 45271702ec freedreno: fix ir3_cmdline build
Fixes: 7530d4abfc glsl/freedreno/panfrost: pass gl_context to the standalone compiler
Signed-off-by: Rob Clark <robdclark@gmail.com>
With createImage(), the caller was expected to set a SHARED flag if they
needed the ability to get a GEM handle. DRI3, wayland, and gbm all set
it, EGL_MESA_drm_image passes it through, and surfaceless doesn't need it
because there's no way to request a handle.
With the new createImageWithModifiers() DRI method to replace it, the
expectation is that you'll always be able to share the buffer, so the flag
is unnecessary in its arguments. However, we do need to tell gallium
about this expectation.
Without this, kmscube's modifiers path using
gbm_bo_create_with_modifiers(&modifier, 1) instead of
gbm_bo_create(SCANOUT | SHARED) will call the driver's resource_create()
function wtih PIPE_BIND_SHARED unset, so the driver (particularly
renderonly drivers) may allocate in such a way that it can't return an
answer from gbm_bo_get_handle(). I used to have a hack in v3d using
count==1 && modifier==LINEAR to indicate that you wanted SHARED anyway,
but that was dropped recently.
Fixes: 59527a36e9 ("v3d: Restructure RO allocations using
resource_from_handle.")
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This is similar to intel_miptree_map_blit and intel_buffer_object.c's
temporary blits in i965.
Improves performance of DiRT Rally by 20-25% by eliminating stalls.
Breaks piglit's spec/arb_shader_image_load_store/host-mem-barrier,
by using the GPU to do uploads, exposing a st/mesa issue where it
doesn't give us memory_barrier() calls. This is a pre-existing issue
and will be fixed by a later patch (currently out for review).
for fragment programs we already treat fog as a single component value,
but for vp we didn't.
Fixes fog related piglit tests with my out of tree Nouveau nir patches.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Everything else uses `#include "GL/internal/dri_interface.h"` instead,
and this full path was even already used in other parts of GLX.
While at it, nothing uses `inc_gl_internal` anymore so let's remove it
as well.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Since the compiler may not zero-out padding in the object.
Add a couple comments about this to prevent misunderstandings in
the future.
Fixes: 67d96816ff ("st/mesa: move, clean-up shader variant key decls/inits")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
More or less any of this issue pointed out by the compiler is
a coding error. Make sure we flag it and bail loudly.
v2: - apply the change to autotools and scons as well (Emil)
- C++ doesn't need this, it's already an error and the flag
doesn't exist (Gert)
v3: - drop scons, flags are not checked so until someone adds that
functionality we can't have this.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> # v1
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> # v1
[Emil: apply the same change to autotools and scons]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 45d58cd91567b39f51af "gitlab-ci: only build the default (=latest) and oldest llvm versions"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
GLVND already provides these, so distro packagers have been deleting
them all along. Let's save ourselves the trouble and not build them in
the first place.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
GLVND already provides these, so distro packagers have been deleting
them all along. Let's save ourselves the trouble and not build them in
the first place.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This fixes a rendering issue where UBO updates aren't always picked
up by drawing calls. This issue effected the Webots robotics
simulator. VMware bug 2175527.
Testing Done: Webots replay, piglit, misc Linux games
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
The draw_vgpu10() function was huge. Move the code for preparing the
vertex buffers and the index buffer into separate functions.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
And add a comment that we're implicitly converting PIPE_TRANSFER_
flags to PB_USAGE_ flags in one place. And statically assert that
the enum values match.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Use a new enum type instead of 'unsigned' to make things a bit more
understandable.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
With this patch, the svga shader type will be saved in the shader variant,
and there is no need to pass in the shader type to the define/destroy
variant functions.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_enhanced_layouts specification:
"For the property TRANSFORM_FEEDBACK_BUFFER_INDEX, a single integer
identifying the index of the active transform feedback buffer
associated with an active variable is written to <params>. For
variables corresponding to the special names "gl_NextBuffer",
"gl_SkipComponents1", "gl_SkipComponents2", "gl_SkipComponents3",
and "gl_SkipComponents4", -1 is written to <params>."
We were storing the xfb_buffer value, instead of the value
corresponding to GL_TRANSFORM_FEEDBACK_BUFFER_INDEX.
Note that the implementation assumes that varyings would be sorted by
offset and buffer.
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of a custom ARB_gl_spirv xfb gather info pass.
In fact, this is not only about reusing code, but the current custom
code was not handling properly how many varyings are enumerated from
some complex types. So this change is also about fixing some corner
cases.
v2: Use util_bitcount, simplify current stage check (Kenneth)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
On OpenGL, a array of a simple type adds just one varying. So
gl_transform_feedback_varying_info struct defined at mtypes.h includes
the parameters Type (base_type) and Size (number of elements).
This commit checks this when the recursive add_var_xfb_outputs call
handles arrays, to ensure that just one is addded.
We also need to take into account AoA here
v2: use glsl_type_is_leaf from nir_types (Timothy Arceri)
v3: simplified aoa check, without the need ot using glsl_type_is_leaf,
using glsl_types_is_struct (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Right now we are only re-sorting outputs. But it is better to sort too
varyings, as linker expect them to be sorted out (as it was done on
GLSL). For varyings, and to make easier to compute buffer_index, we
sort also by buffer. We could do the same for outputs, but we lack a
reason for that, so we left it as it is (just offset).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In order to be used for OpenGL (right now for ARB_gl_spirv).
This commit adds two new structures:
* nir_xfb_varying_info: that identifies each individual varying. For
each one, we need to know the type, buffer and xfb_offset
* nir_xfb_buffer_info: as now for each buffer, in addition to the
stride, we need to know how many varyings are assigned to it.
For this patch, the only case where num_outputs != num_varyings is
with the case of doubles, that for dvec3/4 could require more than one
output. There are more cases though (like aoa), that will be handled
on following patches.
v2: updated after new nir general XFB support introduced for "anv: Add
support for VK_EXT_transform_feedback"
v3: compute num_varyings beforehand for allocating, instead of relying
on num_outputs as approximate value (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Where component_offset here is the offset when accessing components of
a packed variable. Or in other words, location_frac on
nir.h. Different places of mesa use different names for it.
Technically nir_xfb_info consumer can get the same from the
component_mask, it seems somewhat forced to make it to compute it,
instead of providing it.
v2: rename local location_frac for comp_offset, more similar to the
intended use (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This changes is actually wrong because we have to sync
before doing image layout transitions.
This fixes rendering issues in Batman, Path of Exile and
probably more titles.
This reverts commit 76c17cfd8d.
Fixes: 76c17cfd8d ("radv: execute external subpass barriers after ending subpasses")
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Found out that some base64 data matched the '---' identifier. We can
avoid this by adding the surrounding spaces.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The error state contains several kind of BOs, including the context
image which we will want to write in a later commit. Because it can
come later in the error state than the user buffers and because we
need to write it first in the aub file, we have to first build a list
of BOs and then write them in the appropriate order.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Add the missing PIPE_CAP_CONTEXT_PRIORITY_MASK and parsing of the context
construction flags.
Testcase: piglit/egl-context-priority
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'll want to use this for transfer maps, which already do their own
flushing. This lets us avoid a double flush, and also gives us more
control over the batch which is selected.
We were using batch->contains_draw as a proxy for "are we even using
this engine?" That isn't quite right, because it only counts regular
draws. BLORP operations may have also rendered to a resource, which
needs to trigger flushing. To check for this, we also see if the
render and sometimes depth caches are non-empty.
We can also drop the "but there might already be stale data in the
cache even if we haven't emitted any commands yet" concern in the
comments. The kernel flushes caches between batches.
This may not be great but it's at least better than what was there.
By mistake, this was previously set for all shaders.
It is a vertex shader property so only makes sense to
set it for vertex shaders.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-By: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Unlike most of the cases in which we do this by hand, the new helper
properly handles non-32-bit pointers.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There's no guarantee when build_deref_follower is called that the two
derefs have the same bit size destination. Insert a cast on the array
index in case we have differing bit sizes. While we're here, insert
some asserts in build_deref_array and build_deref_ptr_as_array. The
validator will catch violations here but they're easier to debug if we
catch them while building.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Because we already know the immediate right-hand parameter, we can
potentially save the optimizer a bit of work.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes nearly all of the remaining
dEQP-GLES31.functional.texture.border_clamp.formats.* fails
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We need a version of fd6_tex_swiz() that just returns the composed
swizzle without building part of the TEX_CONST_0 state. So just
refactor the existing function to build more of the TEX_CONST_0 state,
and leave fd6_tex_swiz() simply composing swizzles.
The small IBO state change (to use LINEAR for smaller sizes/levels) is
to match the state in fd6_tex_const_0(). It seems like maybe tiled
actually works at the smaller sizes but not if minification is in play,
so best just to make images match what we do for textures.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This cap is mainly for working around a r600 texture swizzle issue,
but it also controls whether ARB_texture_buffer_object (with legacy
formats) is enabled. I suspect the missing I/L/A/LA faking is why
I had it set in the first place.
Thanks to Ilia for pointing out that I shouldn't be setting this.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For texturing, we map alpha formats to the corresponding red format,
as many alpha formats are outright missing, and red is more efficient
when sampling anyway.
When rendering to A8_UNORM, we use that format directly, so the image
gets the shader output's .a/.w channel, rather than the .r/.x channel.
All other A* formats are non-renderable, so we can't do much and just
mark them as unsupported for rendering. Fortunately, GL only requires
rendering to A8_UNORM, so that works out.
According to Andre Heider and Timur Kristóf, this fixes font rendering
in Witcher 1 (via nine). Andre also reported that it fixes Unigine
Heaven (presumably via nine).
v2: Use the same swizzle for both sampler views and "render targets".
BLORP expects the read swizzle, and will take the inverse when
setting up the destination swizzle (and actually applying it in
the shaders). We ignore the format swizzle when setting up normal
rendering SURFACE_STATEs, which is necessary because it would be
an illegal shader channel select combination. Thanks to Jason
Ekstrand for pointing out that BLORP took an inverse swizzle.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Gallium might call us multiple times to bind subsets of the samplers,
at which point we'd recreate the table a bunch of times. It doesn't
really buy us anything to do it here - even if we defer to draw time,
the dirty tracking ensures we'll only do it on the first draw after a
bind_sampler_states() call.
We now use the number of samplers specified by the shader instead of
the binding count. If this number changes, we flag sampler state as
dirty so we re-upload a table with the right number of entries.
This also fixes a bug where ice->state.need_border_colors was never
unset, so once something needed border colors, the pool would always
be pinned in all future batches.
v2: Explicitly flag sampler states as dirty, rather than assuming that
bind_sampler_states() will be called if the program texture count
changes. While this may be true for st/mesa, it isn't the case for
Gallium HUD.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is necessary for legacy texture buffer object formats, where we'll
need to use a swizzle to fake e.g. luminance.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This variable is now unused, so let's remove it.
Fixes: 9c4930946a (virgl: add encoder functions for new protocol)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This variable is now unused, so let's remove it.
Fixes: db77573d7b (virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This variable is now unused, so let's remove it.
Fixes: c19aedcf1a (virgl: don't mark unclean after a flush)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
These variables are now unused, let's remove them to get rif of a few
warnings.
Fixes: f0e71b1088 (virgl: use transfer queue)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
We allocate GGTT entries and physical addresses are we create engines
rather than having a fixed layout.
Context images now receive a parameter argument which is used to setup
pml4 & ring buffer addresses.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We'll make them more parameterized in a later commit.
As this is just a transitional commit, we allow ourself to leak the
context images allocated in get_context_init(). We'll fix this in the
next commit.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Prepare aub write to deal with multiple engine instances. We don't
pass the instance number yet this could be done in the future by
having a 2 dimensional array of struct engine.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
In the future we'll want error2aub to reuse the context image saved by
i915 instead of the default one we write in intel_dump_gpu.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
IGT has a test to hang the GPU that works by having a batch buffer
jump back into itself, trigger an infinite loop on the command stream.
As our implementation of the decoding is "perfectly" mimicking the
hardware, our decoder also "hangs". This change limits the number of
batch buffer we'll decode before we bail to 100.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
An MI_BATCH_BUFFER_START in the ring buffer acts as a second level
batchbuffer (aka jump back to ring buffer when running into a
MI_BATCH_BUFFER_END).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The Gallium Nine state tracker now works on iris.
Also tested with GALLIUM_HUD and Star Wars: Knights of the Old
Republic on WINE (GL_ATI_fragment_shader).
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes a leak:
==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26
==7576== at 0x4C2EE3B: malloc (vg_replace_malloc.c:309)
==7576== by 0x53EF0E4: ralloc_size (ralloc.c:119)
==7576== by 0x53EF0C2: ralloc_context (ralloc.c:113)
==7576== by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176)
==7576== by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes leaks from anv_device_upload_nir:
==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24
==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308)
==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836)
==7345== by 0x54E0848: grow_to_fit (blob.c:67)
==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166)
==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186)
==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091)
==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Use anv_gem_munmap for unmap when softpin in use, this corresponds to
anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind
errors seen for each pool when softpin is in use:
==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31
==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96)
==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543)
==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477)
==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920)
==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The NIR and TGSI paths are currently intertwined which makes it
not only hard to follow but also makes it hard to take advantage
of the differences in IR.
Here we take the first step to splitting that path apart. With
this we take the opportunity to no longer call the GLSL IR
optimisation passes after the final lowering calls for NIR. We
can instead just use the NIR passes which can produce better code
and should also result in faster compile times.
The speed-up can be measured in some dolphin uber shaders due to
no longer calling lower_if_to_cond_assign() for example
dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53
seconds on my machine.
There are some code changes as a result of not calling
lower_if_to_cond_assign(), this is because it flattens ifs that
contain UBOs where as NIR's peephole select doesn't. This is
were most of the regressions in Max Waves happens with shader-db.
shader-db results (VEGA):
Totals from affected shaders:
SGPRS: 2349056 -> 2349640 (0.02 %)
VGPRS: 1322160 -> 1323300 (0.09 %)
Spilled SGPRs: 21190 -> 21527 (1.59 %)
Spilled VGPRs: 99 -> 99 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 72 -> 72 (0.00 %) dwords per thread
Code Size: 57260904 -> 57270932 (0.02 %) bytes
Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds
LDS: 786 -> 786 (0.00 %) blocks
Max Waves: 391932 -> 391619 (-0.08 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Eric Anholt <eric@anholt.net>
glsl_to_nir() is still missing support for converting certain
functions to NIR, so for those we use the GLSL IR optimisations
to remove the functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
If Vertex Shader uses EdgeFlag the hardware request that it is setup
as the last VERTEX_ELEMENT_STATE. If SGVS are add at draw time we
need to also reconfigure the last 3DSTATE_VF_INSTANCING so its
VertexElementIndex points to the new Vertex Element that contains
the EdgeFlag.
So if draw parameters or edgeflag are not used the CSO generated at
iris_create_vertex_element is sent directly in the batches. But if
edge flag is used we adjust last VERTEX_ELEMENT_STATE and
last 3DSTATE_VF_INSTANCING using their alternative edge flag version
we generate at iris_create_vertex_element and store at the CSO.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
You usually want to go find the highest pressure and figure out why you
couldn't spill or what pattern led to a bunch of pressure leading to that
point.
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later. This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything. This substantially reduces both fp64 shader compile times
and the resulting code size. On the vs-isnan-dvec test from piglit:
Before this commit:
1684.63s user 17.29s system 99% cpu 28:28.24 total
101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
Peak memory usage (according to massif): 1.435 GB
After this commit:
179.64s user 7.75s system 99% cpu 3:07.92 total
57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
Peak memory usage (according to massif): 531.0 MB
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves. This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pulls the guts of function inlining into a builder helper so that
it can be used elsewhere. The rest of the infrastructure is still
needed for most inlining cases to ensure that everything gets inlined
and only ever once. However, there are use-cases where you just want to
inline one little thing. This new helper also has a neat trick where it
can seamlessly inline a function from one nir_shader into another.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't really change anything as the functions will all get
inlined anyway. However it does let us do a bit of the work earlier and
in a common place.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed. We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of looking the devinfo directly, look at the lowering options we
provided to NIR. This is more accurate as it's now checking for "do we
need full software lowering" rather than a hardware bit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The lowering we do for 64-bit instructions can cause a single NIR ALU
instruction to blow up into hundreds or thousands of instructions
potentially with control flow. If loop unrolling isn't aware of this,
it can unroll a loop 20 times which contains a nir_op_fsqrt which we
then lower to a full software implementation based on integer math.
Those 20 invocations suddenly get a lot more expensive than NIR loop
unrolling currently expects. By giving it an approximate estimate
function, we can prevent loop unrolling from going to town when it
shouldn't.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We already have one internally for int64 but we don't have a similar one
for doubles so we'll have to make one.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word
trick only works for even numbered bytes.
This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.
v2: Use a SHR instead of an AND. This saves an instruction compared to
using two moves. Suggested by Jason.
Fixes: 6ac2d16901 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The parameter is never used, and it's not part of a common interface
idiom. Remove it.
src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’:
src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
const struct gen_device_info *devinfo)
^~~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In 61e009d2c4 we changed the number of components in the
vulkan_resource_index intrinsic and forgot the update Radv's code for
it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 61e009d2c4 ("spirv: Use the same types for resource indices as pointers")
Reviewed-by: Samuel Pitoiset samuel.pitoiset@gmail.com
Not complete, mostly just adding things as I encounter them in CTS. But
not getting far enough yet to hit most of the OpenCL.std instructions.
Anyway, this is better than nothing and covers the most common builtins.
v2: add hadd proof from Jason
move some of the lowering into opt_algebraic and create new nir opcodes
simplify nextafter lowering
fix normalize lowering for inf
rework upsample to use nir_pack_bits
add missing files to build systems
v3: split lines of iadd/sub_sat expressions
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: add assert in else clause
make local group intrinsics 32 bit wide
v3: always use 32 bit constant for local_size
v4: add comment by Jason
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
we define it inside 'include/c99_math.h' so it is safe to use.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The idea is that for repeated use of the same uniform, we could avoid
loading it on each consumer. The results look pretty good.
total instructions in shared programs: 6413571 -> 6521464 (1.68%)
total threads in shared programs: 154214 -> 154000 (-0.14%)
total uniforms in shared programs: 2393604 -> 2119629 (-11.45%)
total spills in shared programs: 4960 -> 4984 (0.48%)
total fills in shared programs: 6350 -> 6418 (1.07%)
Once we do scheduling at the NIR level, the register pressure (and thus
also instructions) issues we see here will drop back down.
This feels like the right tradeoff for threads vs uniforms, particularly
given that we often have very short thread segments right now:
total instructions in shared programs: 6411504 -> 6413571 (0.03%)
total threads in shared programs: 153946 -> 154214 (0.17%)
total uniforms in shared programs: 2387665 -> 2393604 (0.25%)
On each iteration of successfully spilling a reg, we'd allocate another
copy of temp_registers, and when decrementing thread conut we'd allocate
another copy of the graph. These all got cleaned up on freeing the
compile.
It is already obvious whether the job is building a container or running
a mesa build, so let's drop that prefix so that we can see more
information on the screen (eg. in the jobs list on a pipeline page).
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, only the driver_location was set for all variables,
but constants need to use the location field instead. This change
is necessary because the nine state tracker can produce non-packed
constants whose location needs to be explicitly set.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This patch extracts the interpolation mode translation
into a separate function called ttn_translate_interp_mode,
adds support for TGSI_INTERPOLATE_COLOR which was missing,
and also sets the proper interpolation mode to output
variables, which were not set previously.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: fix is_shadow, is_array and txq
Some drivers (eg. iris) need the presence of sampler variables and derefs
so that they can count them to determine the number of samplers used.
This change also makes the output NIR closer to what glsl_to_nir outputs.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, FACE was hard-coded as a sysval, but TTN emulated
it incorrectly. Also, POSITION was not supported when it was
a sysval. This patch fixes these by allowing both of them to
be sysvals or inputs, based on driver capabilities. It also
fixes the TGSI FACE emulation based on the TGSI spec.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With this patch, tgsi_to_nir will output NIR that is tailored to
the given pipe, by reading its capabilities and adjusting the NIR code
to those capabilities similarly to how glsl_to_nir works.
It also adds an optimization loop that brings the output NIR in line
with what glsl_to_nir outputs. This is necessary for the same reason
why glsl_to_nir has its own optimization loop: currently not every
driver does these optimizations yet.
For uses which cannot pass a pipe_screen we also keep a variant
called tgsi_to_nir_noscreen which keeps the old behavior.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Acked-By: Eric Anholt <eric@anholt.net>
This patch makes it possible for freedreno to pass a pipe_screen
to tgsi_to_nir. This will be needed when tgsi_to_nir supports reading
pipe capabilities.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, tgsi_to_nir was a single big function, and this patch
intends to make the code easier to understand by splitting it up
to multiple smaller pieces.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-By: Tested-by: Rob Clark <robdclark@gmail.com>
TGSI spec says LIT needs a "greater than" comparison. NIR doesn't have that,
so let's use "less than" and swap the arguments. Previously "greater than or equal"
was used by tgsi_to_nir which is incorrect.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the TGSI spec, ARR needs to do a rounding and then
a float-to-integer conversion which was missing. This patch also
makes the rounding a bit more efficient by using nir_fround_even
instead of the previous nir_ffloor+nir_fadd trick.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This patch adds a shader_info field that tells the driver to use window
space coordinates for a given vertex shader. It also enables this feature
in radeonsi (the only NIR-capable driver that supported it in TGSI),
and makes tgsi_to_nir aware of it.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them. Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.
total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)
Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
We were printing only when the channel was exactly the start channel, so
scalarized loads/stores would be missing the name on the rest.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
With the new deref changes, the old pointer_offset version may not be
the right one to call. Just call the generic one and let it sort it
out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We can't pull it from the variable type because it might be an array of
blocks and not just the one block. While we're here, throw in some
error checking.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us. Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The descriptor set layout code in our driver has undergone many changes
over the years. Some of the fields which were once essential are now
useless or nearly so. The has_dynamic_offsets field was completely
unused accept for the code to set and hash it. The per-stage indices
were only being used to determine if a particular binding had images,
samplers, etc. The fact that it's per-stage also doesn't matter because
that binding should never be accessed by a shader of the wrong stage.
This commit deletes a pile of cruft and replaces it all with a
descriptive bitfield which states what a particular descriptor contains.
This merely describes the data available and doesn't necessarily dictate
how it will be lowered in anv_nir_apply_pipeline_layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is what we're actually storing in the descriptor set and consuming
when we bind surface states. This commit renames image_count to
image_param_count a few places and moves the decision to not count image
params on gen9+ into anv_descriptor_set.c when we build the layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Make them all take a device followed by a set. This is consistent
with how the actual Vulkan entrypoint parameters are laid out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit just puts the free list code together as part of the pool
instead of having it inlined into the descriptor set create code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In our backend, the successor edges from the blocks only point to where
QPU control flow goes, not where the notional control flow goes from a
"break" or "continue" modifying the execution mask to resume writing to
some channels later. As a result, this attempt at restricting live ranges
ended up missing the live range of a value where a conditional
break/continue was present in a loop before the later def of a variable.
The previous commit ended up fixing the problem that the flag tried to
solve.
Fixes glsl-vs-loop-continue.shader_test and/or
glsl-vs-loop-redundant-condition.shader_test based on register allocation
results.
In the backend, we often have condition codes on writes to variables, such
that there's no screening def anywhere and the previous live ranges
algorithm would conclude that the start of the range extends to the start
of the program. However, we do know that the live range can only extend
as early as you can reach from all blocks writing to the variable.
The motivation was that, while we have a couple of hacks to try to promote
conditional writes up to being a def within the block, the exec_mask one
was broken and needed a replacement.
Based on c3c1aa5aeb ("intel/fs: Restrict live intervals to the subset
possibly reachable from any definition.").
Ubuntu Bionic is shipping ninja 1.8.2. Therefore, we do not need to
download v1.6.0 manually any more.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This fixes some CTS crashes with:
dEQP-VK.renderpass2.suballocation.attachment_write_mask.attachment_count_8.start_index_*
Ideally, we should check cmd_buffer->cs->max_dw because there is
likely enough space (the internal clear draws allocate space), but
keep that way for consistency.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This function was never used, and isn't properly guarded by HAVE_LIBDRM,
breaking the build on systems that don't have libdrm.
Let's just remove it.
Fixes: 7552fcb7b9 "egl: add base EGL_EXT_device_base implementation"
Reported-by: Timo Aaltonen <tjaalton@debian.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
We already propagate coord_components correctly and did not have
layer restrictions for ycbcr formats.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This does not seem to fix anything ATM but is the right thing todo.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: f3e91e78a3 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now i965 supports mesa_glthread=true like Gallium drivers do.
According to Markus (degasus), the Citra emulator now runs ~30% faster.
Emmanuel (linkmauve) also reported that the Dolphin emulator improved
by 2.8x on one game. (Both of those still need to be added to drirc.)
An Intel Mesa CI run with mesa_glthread=true appears to be happy.
Bioshock Infinite's benchmark mode seems to be around 15-20% faster
on my Skylake GT4 at 1920x1080.
Tested-by: Markus Wick <markus@selfnet.de>
Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver. We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header. Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.
Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No idea how this fell through the cracks besides the fact that the
sampler bound at 0 almost always works and the CTS isn't amazing. In
any case, this appears to have been broken for almost forever.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32
bits as needed instead of emitting mul and mul_high.
v2: Surround the switch case with curly braces (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Optimize mulExtended to use 32x32->64 multiplication.
Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.
v2: Add missing condition check (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <Matt Turner <mattst88@gmail.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.
Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended
v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
2) Add lower_mul_2x32_64 flag (Matt Turner)
3) Remove associative property as bit size is different (Connor
Abbott)
v3: Fix indentation and variable naming convention (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification
Fixes building errors during rebuils due to missing dependencies
(v2) Fixed a missing $(VULKAN_API_XML) reference
Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4bec ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
MinGW release build complains about a possible out-of-bounds
array access. Test i < 4 to silence it.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
MinGW release builds warns about use of a possbily uninitialized
variable here.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
virtio-gpu fallbacks to software rendering when 3D features
are unavailable since 6c5ab, and kms_swrast is more
feature complete than swrast.
v2: Add comment (Emil)
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Now that the buffer object usage history tracks if it is
being used as vertex buffer object, we can restrict setting
the ST_NEW_VERTEX_ARRAYS bit to dirty on glBufferData calls to
buffers that are potentially used as vertex buffer object.
Also put a note that the same could be done for index arrays
used in indexed draws.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
We already track the usage history for buffer objects
in a lot of aspects. Add GL_ARRAY_BUFFER and
GL_ELEMENT_ARRAY_BUFFER to gl_buffer_object::UsageHistory.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
If a selected unit causes a data hazard, the whole block gets cut short.
So, we preview for data hazards _while_ selecting units.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
smul comes first in the pipeline, before vmul. Until we have a full
instruction scheduler, it's better to have vmul prioritized to maximize
bundle size.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
This special-case was needlessly added and breaks purely offscreen
rendering (when there is no scanout involved)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
Previously, we forced a #0 inline constant tacked on for the lut
instructions to mirror the blob's behaviour, which caused some
suboptimal codegen due to our constant inlining implementation. Instead,
just don't force a constant at all.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com
The operations of gallium->clear() and the hardware callbacks are
fundamentally independent. This routine decouples them by routing shared
information via panfrost_job, allowing the hardware half to be deferred
to the fragment job generation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
At the moment, Panfrost state is ad hoc, which creates issues for FBOs.
This commit imports the skeleton of the v3d_job structure as
panfrost_job, in preparation for refactors to organize this state.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.
Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage
v2: Add comment to the new program_resource_visitor::process function.
(Ilia Mirkin)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Avoids regression on:
KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
that is uncovered by the following patch.
"glsl: fix recording of variables for XFB in TCS shaders"
v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
shaders" to avoid temporal regressions. (Illia Mirkin)
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.
v2: Set ES 3.0 as minimum GLES version as required by the extension
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Not a perfect solution, and the "pressure" target is hard-coded. But it
doesn't really seem to much in the common case, and avoids exploding
register usage in dEQP ssbo tests.
So this should serve as a stop-gap solution until I have time to re-
write the scheduler.
Hurts slightly in instruction count, but gains (reduces) slightly the
register usage in shader-db. Fixes ~150 dEQP-GLES31.functional.ssbo.*
that were failing due to RA fail.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This might enough for iris and possible r600 (when it gets NIR)
This appears to work for iris.
v2:
* change cap return so DOUBLES == 2 means sw emu
v3:
* Refactor using int64/doubles lowering options which were added
into nir options
* Remove DOUBLES == 2 added in v2
[jordan: Remove "2" value on PIPE_CAP_DOUBLES]
[jordan: Use lowering options added to nir options]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d60938 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")
As requested by Ken ;)
v2: Also decode simple batches (Caio)
Fix u_vector usage issues (Lionel)
v3: Make binding/instruction/state/surface available (Lionel)
v4: Going through device pools for simple batches (Lionel)
Centralize search BO callbacks into anv_device.c (Lionel)
v5: Clear decoded batch buffer var after use (Caio)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v3d may be built as part of a set of drivers in a system not requiring
NEON, but we know V3D devices will be paired with CPUs with NEON so we
should be able to use this asm.
Fixes: 0c05198d6b ("v3d: Always enable the NEON utile load/store code.")
I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.
In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?
As far as I can tell, it's just because our scheduler is terrible. In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).
Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader. Once that happens, everything
is different.
I looked at one vertex shader that was hurt (from Goat Simulator). In
that case, both the floor and fract are used. The optimization
eliminates the add, and it should allow better scheduling. In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.
In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions. It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.
total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.
Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.
total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
I have not investigated the result of doing this during code
generation. That should be possible, but it would be a bit more
effort.
All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2
total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2
This change has almost no effect right now. However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:
Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2
total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Other places will need to do this soon to properly handle source
swizzles. The patch looks a little odd, but the change is pretty
straight forward. All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.
I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive. I am open to
suggestions of alternatives.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To allow cmod propagation from a MOV in a sequence like:
and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD
mov.nz.f0(16) null<1>F g31<8,8,1>D
A similar change to the vec4 backend had no effect.
Somewhere between c1ec582059 and 40fc4b5acd (1,094 commits) the
effectiveness of this patch diminished, and as of commit d7e0d47b9d
(nir: Add a bunch of b2[if] optimizations) this optimization no longer
has any effect on any platform.
A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a
logic result on Gen8+," generates some instruction sequences that
require this change in order for cmod propagation to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts the following commits:
71a76a47cc "swr/codegen: fix autotools build"
7763e664ce "meson/swr: replace hard-coded path with current_build_dir()"
773b3ceaca "swr/rast: Fix autotools and scons codegen"
16e10b8c30 "swr/rast: Add general SWTag statistics"
b45a15a39f "swr/rast: Add string handling to AR event framework"
8608a747aa "swr/rast: Add initial SWTag proto definitions"
93cd9905c8 "swr/rast: Cleanup and generalize gen_archrast"
The last one in this list broke all the build systems that can build
this (meson, autotools & scons).
See MR !304 for more details:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/304
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
UBWC requires space for a metadata or flag buffer
that contains compression data. Each 16x4 tile of image
data corresponds to a byte of compression data.
This buffer needs to be stored before (at a lower address)
the image buffer in order to match up with what the
display driver. This allows the display driver to directly
scan-out at UBWC buffer.
Universal bandwidth compression(UBWC) reduces memory bandwidth
by compressing buffers. This compression takes the form of
a full sized image buffer as well as a smaller metadata buffer.
I'm guessing a previous version of this script used an index-based map
of entrypoints, but that's not the case anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Differently than the direct case, the indirect array derefs of vector
are handled like regular derefs, with the exception that we ignore any
vector entry that has SSA values when performing a load. Such SSA
values don't help loading of the indirect unless we emit an if-ladder.
Copy_derefs are supported for indirects.
Also enable two tests that now pass.
v2: Remove unnecessary temporaries. Be clearer when identifying the
case where copy_entry doesn't help when we are dealing with an
indirect array_deref (of a vector). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When looking up an entry to use, always prefer an equal match, as it
more likely to contain reusable SSA or derefs to propagate.
This will be necessary when adding entries with array derefs of
vectors, because we don't want the vector if the equal entry (an array
deref of that vector) is present.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Both on an actual array and on a vector, and an extra test on a vector
mixing direct and indirect access. The vector tests are disabled and
will be enabled by a later commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When direct array deref is used on a vector type (for loads and
stores), copy_prop_vars is now smart to propagate values it knows
about.
Given a 'vec4 v', storing to v[3] will update the copy entry for v and
it is equivalent to a write to v.w. Loading from v[1] will try first
to see if there's a known value for v.y -- and drop the load in that
case.
The copy entries still always refer to the entire vectors, so the
operations happen on the parent deref (the 'vector') and the values
are fixed accordingly.
It might be the case now that certain entries have not only different
SSA defs in each element but also those come from different components
than they are set to, because stores to individual elements always
come from a SSA definition with a single component.
Tests related to these cases are now enabled.
v2: Instead of asserting on invalid indices, "load" an undef and
remove the store. (Jason)
v3: Merge code path for the cases of is_array_deref_of_vector into the
regular code path. Add a base_index parameter to
value_set_from_value. (code changes by Jason)
v4: Removed the get_entry_for_deref helper, now being used only once.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also replace uses of 0xf with the appropriate full mask created from
the number of components.
Note that an increase of MAX might make us change how the data is
stored later on, but for now at least we make sure the pass is not
hardcoded.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The name reflected this function role back when the pass also did dead
write elimination. So rename it to what it does now, which is setting
a value using another value; and narrow the argument list.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A pipe_resource can be shared by all the pipe_context's hanging off the
same pipe_screen.
Changes from v2 -> v3:
- add locking with mtx_*() to resource and screen (Marek)
Changes from v3 -> v4:
- drop rsc->lock, just use screen->lock for the entire serialization (Marek)
- simplify etna_resource_used() flush condition, which also prevents
potentially flushing resources twice (Marek)
- don't remove resouces from screen->used_resources in
etna_cmd_stream_reset_notify(), they may still be used in other
contexts and may need flushing there later on (Marek)
Changes from v4 -> v5:
- Fix coding style issues reported by Guido
Changes from v5 -> v6:
- Add missing locking in etna_transfer_map(..) (Boris)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Changes v1 -> v2:
- Avoid the GPU sampling from the resource that gets mutated by the the
transfer map by setting DRM_ETNA_PREP_WRITE.
Changes v2 -> v3:
- make use of likely(..)
- drop minor optimization regarding rsc->layout == ETNA_LAYOUT_LINEAR
- better documentation why DRM_ETNA_PREP_WRITE is needed
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Saves us from calling etna_bo_map(..) and saves us from doing the
same offset calcs for map() and unmap() operations.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
ETC2 is supported with HALTI0, however that implementation is buggy
in hardware. The blob driver does per-block patching to work around
this. We need to swap colors for t-mode etc2 blocks.
Changes v2 -> v3:
- Drop redundant format check
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so
we no longer need the non-logical opcodes there. Prefix them VEC4 so
it's clear that they're only used by the vec4 back-end.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The unused typed surface read/write support in the vec4 back-end has
been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all
image and buffer ops. There's no reason to keep these opcodes around
anymore.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Since switching to SHADER_OPCODE_SEND for image operations, we no longer
need the non-logical opcode.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes. The only thing that the surface
builder is providing at this point is extra levels of functions to call
through. I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It makes more sense to start at the surface then move on to the address
and then the data. Also, this is a really good test of whether or not
we got all the places that use the sources by explicit integer number.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This change applies the workaround suggested by Bill Deegan on the
affected SCons versions.
It also adds a comment with the URL explaining why we were using
customizing the decider and max_drift in the first place, as I had
forgotten all about it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109443
Tested-by: liviuprodea@yahoo.com
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This DTD can be used to validate the drirc xml:
$ xmllint --noout --valid 00-mesa-defaults.conf
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
VGEM and kms_swrast were introduced to work with one another.
All we do is CPU rendering to dumb buffers. There is no reason to carve
out GPU memory, increasing the memory pressure on a device that could
make a better use of it.
Note:
- The original code did not work out of the box, since the dumb buffer
ioctls are not exposed to render nodes.
- This requires libdrm commit 3df8a7f0 ("xf86drm: fallback to MODALIAS
for OF less platform devices")
- The non-kms, swrast is unaffected by this change.
v2:
- elaborate what and how is/isn't working (Eric)
- simplify driver_name handling (Eric)
v3:
- move node_type outside of the loop (Eric)
- kill no longer needed DRM_RENDER_DEV_NAME define
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Make the code a bit easier to read.
As a bonus point this makes it obvious that we forgot to call
_eglAddDevice() for the device - do so.
v2:
- s/dpy/disp/ (Eric)
- free(driver_name) on dri2_load_driver_swrast() failure (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.
This fixes a nasty case with a loop with a branch that both then-part
and else-part exits the loop:
%1 = OpLabel
OpLoopMerge %2 %3 None
OpBranchConditional %false %2 %2
%3 = OpLabel
OpBranch %1
%2 = OpLabel
[...]
We know that block %1 will branch always to block %2, which is the merge
block for the loop. And thus a break is emitted. If we keep continuing
processing further instructions, we will be processing the branch
conditional and thus emitting the proper NIR conditional, which leads to
instructions after the break.
This fixes dEQP-VK.graphicsfuzz.continue-and-merge.
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Usually the uniforms will be assigned locations and have their slots
counted automatically, but for builtin shaders the location assignment
is manual. So count them too otherwise we get num_uniforms == 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
call XShmDetach to allow X server to free shared memory
Fixes: bcd80be49a "drisw/glx: use XShm if possible"
Signed-off-by: Ray Zhang <zhanglei002@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This helps improve compile times. For example the shader-db dolphin
shader shaders/dolphin/ubershaders/120.shader_test goes from
~1.69 -> ~1.57 seconds on my machine with this change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.
Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.
This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.
Fixes: edded12376 ("mesa: rework ParameterList to allow packing")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The optimization in 4cd1a0be76 introduced a replacement of :
cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D
...
cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D
By :
cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D
...
mov(8) vgrf11.y:D, vgrf15.yyyy:D
The first cmp instruction is storing in x while the second mov is
sourcing from y. We need to take into account where the replacement on
the scan_inst destination is going to store thing so that the
replacement mov can source things from the correct location.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cd1a0be76 ("i965/vec4: Propagate conditional modifiers from more compares to other compares")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that freedreno has create_with_modifiers(), this "hack" is needed to
make some cases work. Copied from vc4.
Fixes: 41ddf1d1
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
In freedreno_gmem.c, gmem_align of 0x8000 is used. Alignment used here
should be the same.
Fixes: 912a9c8d
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
When the output directory was changed, the BUILT_SOURCES and build-rule
target-path was no longer correct, leading to races to generate the
sources and compiling them.
Fix this by updating both sets of paths, so automake see what's going on
here.
Fixes: 773b3ceaca ("swr/rast: Fix autotools and scons codegen")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
This is useful for r600 since there the abs source modifier is not supported
for ops with three sources
v2: Use correct logic to enable lowering to abs source mod (Eric Anhold)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a partial revert of 9d81cd ("virgl: Pass resource size and
transfer offsets").
The adjustments made in the client code means there's various
mismatches when transfering data.
Let's fallback to protocol version 0 and deprecate protocol
version 1. We can still use the protocol version 1 slots for
a shared memory transfer mechanism later.
Fixes:
dEQP-GLES31.functional.copy_image.mixed.viewclass_128_bits_mixed.*_renderbuffer
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Header xmmintrin.h conditionally includes emmintrin.h that defines
_MM_DENORMALS_ZERO_MASK, add ifndef to fix this warning.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Otherwise with VNDK enabled we fail linking:
src/gallium/targets/dri/Android.mk: error: gallium_dri (native:vendor)
should not link to libbacktrace.vendor (native:vndk_private)
Option makes it possible to use libbacktrace only when VNDK is not
enabled.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we were guarded by an #ifdef, which is generally a bad form.
This patch instead guards them behind an environmental variable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Additional VERTEX_ELEMENT_STATE are used to store basevertex and
baseinstance and drawid updating the DWordLength of the
3DSTATE_VERTEX_ELEMENTS command.
This passes all piglit tests for spec.*draw_parameters.* tests
and VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests.
Now we only mark a dirty_update when parameters are changed or
when we have an indirect draw.
We enable PIPE_CAP_DRAW_PARAMETERS on Iris.
There is no edge flag support in the Vertex Elements setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently clover will advertise any device that advertises
PIPE_CAP_COMPUTE, even if they do not support PIPE_SHADER_IR_NATIVE,
which is the IR used internally by clover.
This avoids clover advertising devices as available even though they
actually are not supported.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Program linking options are only valid if the library was created with
the `-enable-link-options` option, which itself is only valid when
creating a library, and only when creating an executable.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
If creating a library, do not allow non-compiled object in it, as
executables are not allowed, and libraries would make it really hard to
enforce the "-enable-link-options" flag.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
From the OpenCL 1.2 Specification, Section 5.6.2 (about clBuildProgram):
> If program is created with clCreateProgramWithBinary, then the
> program binary must be an executable binary (not a compiled binary or
> library).
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* Avoid warnings from references to deprecated CL 1.0, 1.2, 2.0 and 2.1 APIs.
* Avoid warnings from not defining CL_TARGET_OPENCL_VERSION.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Seems like dxvk used integer builtins without setting the flat
interpolation decoration.
I believe in the current spec the app is required to set these,
but in the meantime to avoid breaking things in stable releases
(and so close to release for 19.0), only expand the interpolation
to float16 and struct (which cannot be builtins as our spirv parser
lowers the builtin block).
Fixes: f324784104 "radv: Allow interpolation on non-float types."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Array index should come before sample-id. And exclude all isam variants
(which take integer texel coords) from adding of offset.
Fixes dEQP-GLES31.functional.texture.multisample.samples_1.use_texture_*_2d_array
Signed-off-by: Rob Clark <robdclark@gmail.com>
We also need to put in the output mov. Possibly we could just fixup the
output register to read it directly from the dummy, but that is more
work and I guess dEQP is probably the only time you encounter this.
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment
Signed-off-by: Rob Clark <robdclark@gmail.com>
The indirect offset does not effect the index buffer size. Fixes all of
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_100x100_drawcount_*
with drawcount > 1.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We weren't propagating the array info for cases where result of atomic
is array/reg. This can happen, for example, if result is part of a phi
web lowered to regs.
Fixes dEQP-GLES31.functional.ssbo.atomic.compswap.*
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes a bunch of deqp ssbo tests that use multiple ssbo blocks packed
into a single buffer.
Note the a5xx value seems suspicious, but this is what blob seems to
advertise.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Use the (nopN) encoding for slightly denser shaders.. this lets us fold
nop instructions into the previous alu instruction in certain cases.
Shouldn't change the # of cycles a shader takes to execute, but reduces
the size. (ex: glmark2 refract goes from 168 to 116 instructions)
Currently only enabled for a6xx, but I think we could enable this for
a5xx and possibly a4xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We were overflowing instrlen (which is # of groups of 16 instructions)
in a couple dEQP tests, causing gpu hangs:
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
Signed-off-by: Rob Clark <robdclark@gmail.com>
This fixes a failed assertion in glDeleteLists() for the following
case:
list = glGenLists(1);
glDeleteLists(list, 1);
when those are the first display list commands issued by the
application.
When we generate display lists, we plug in empty lists created with
the make_list() helper. This function uses the OPCODE_END_OF_LIST
opcode but does not call dlist_alloc() which would set the
InstSize[OPCODE_END_OF_LIST] element to non-zero.
When the empty list was deleted, we failed the InstSize[opcode] > 0
assertion.
Typically, display lists are created with glNewList/glEndList so we
set InstSize[OPCODE_END_OF_LIST] = 1 in dlist_alloc(). That's why
this bug wasn't found before.
To fix this failure, simply initialize the InstSize[OPCODE_END_OF_LIST]
element in make_list().
The game oolite was hitting this.
Fixes: https://github.com/OoliteProject/oolite/issues/325
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Calculating the scissor rectangle fields with the y flipped (0 on top)
can generate negative values that will cause assertion failure later on
as the scissor fields are all unsigned. We must clamp the bbox values
again to make sure they don't exceed the fb_height. Also fixed a
calculation error.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108999https://bugs.freedesktop.org/show_bug.cgi?id=109594
v2:
- I initially clamped the values inside the if (Y is flipped) case
and I made a mistake in the calculation: the clamp of the bbox[2] should
be a check if (bbox[2] >= fbheight) bbox[2] = fbheight - 1 instead and I
shouldn't have changed the ScissorRectangleYMax calculation. As the
fixed code is equivalent with using CLAMP instead of MAX2 at the top of
the function when bbox[2] and bbox[3] are calculated, and the 2nd is more
clear, I replaced it. (Nanley Chery)
v3:
- Reversed the CLAMP change in bbox[3] as the API guarantees that the
viewport height is positive. (Nanley Chery)
v4:
- Added nomination for the mesa-stable branch and the link to the second
bugzilla bug (Nanley Chery)
CC: <mesa-stable@lists.freedesktop.org>
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This DTD can be used to validate the output and make sure any parsers
out there can handle it:
$ xmllint --noout --valid driinfo.xml
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The Loader/Validation-Layers repository allow the user to choose where
header files are installed. On my system I choose /usr/include
thinking it was the obvious "base" location, but it turns out the
headers end up being installed right there rather in a vulkan
subdirectory. On Debian/Ubuntu the selected installation path is
/usr/include/vulkan, so just go with that.
Hopefully other distro don't choose another path.
Note that the validation layer doesn't provide a .pc file so we have
no way of querying where the headers are installed.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109739
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
If the first time a fork was created, the job creating the containers was
manually cancelled, this would have left the fork unable to use the CI
(until the next automatic regeneration of the container).
Avoid this by always running the container-generation job, even though
99% of the time it will spin up, see that the container exists and shut
down.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's the current maximum supported by the kernel. Stay consistent with
the rest of Mesa and use the same number.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Some platforms lack O_CLOEXEC. The loader_open_device() handles those
appropriately, so use the helper.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Earlier commit introduced support for haiku yet did not properly
annotate the loader/xmlconfig dependencies.
Thus we ended up adding inc_loader for each !haiku platform - see
659910eda09a96bf0ecdc731508b98ec6cb01e21.
One piece remained though - the wayland platform. Hence the following
would fail:
meson -Dgallium-drivers=etnaviv -Ddri-drivers=''\
-Dtools=etnaviv -Dplatforms=wayland -Dglx=disabled \
build/
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 834d221512 ("meson: Add Haiku platform support v4")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The difference between the three functions is the list of mandatory
driver extensions. Pass that as an argument to the common helper.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes following valgrind warning:
==27561== Conditional jump or move depends on uninitialised value(s)
==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
If we have a MOV of a uniform value available to spill, that's one of our
best choices. We can just not spill the value, and emit a new load of the
uniform as the fill. This saves bothering the TMU and the thrsw, and is
the same cost in uniforms (since the spill offset is a uniform anyway).
This doesn't have a huge impact on shader-db, since there aren't a whole
lot of spills and we usually copy-prop the uniforms at the VIR level such
that the only uniform MOVs are from vir_lower_uniforms:
total instructions in shared programs: 6430292 -> 6430279 (<.01%)
total uniforms in shared programs: 2386023 -> 2385787 (<.01%)
total spills in shared programs: 4961 -> 4960 (-0.02%)
total fills in shared programs: 6352 -> 6350 (-0.03%)
However, I'm interested in dropping the uniforms copy-prop in the backend,
since it would be cheaper to not load repeated uniforms if we have the
registers to spare. This also saves many spills on
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what
motivated a bunch of my recent backend work in the first place:
before: 46 spills, 106 fills, 3062 instructions
after: 0 spills, 0 fills, 2611 instructions
Since using bitmasks we can easily check if we have any
current value that is potentially uploaded on array setup.
So check for any potential vertex program input that is not
already a vao enabled array. Only flag array update if there is
a potential overlap.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Fix the logic for buffer full check on alloc.
This patch just takes the fix Nicolai attached to the bug report
and updates it to work on master.
Fixes: e0f0d3675d ("radeonsi: factor si_query_buffer logic out of si_query_hw")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109561
The nir_builder swizzling improvement to not emit extra MOVs resulted in
nir_lower_tex() trying to rewrite an SSA def to itself, triggering the
assert on all texturing in v3d. There's no work to be done in this case,
so just stop asserting.
Fixes: 743700be1f ("nir/builder: Don't emit no-op swizzles")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If no framebuffer is bound, get the number of samples and the
image format from the render pass.
This fixes new CTS dEQP-VK.geometry.layered.*.secondary_cmd_buffer.
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a common pattern from HLSL->SPIRV translation
and supported in HW by all current NIR backends.
vkpipeline-db results anv (SKL):
total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
instructions in affected programs: 204084 -> 203334 (-0.37%)
helped: 208
HURT: 0
total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
helped: 107
HURT: 86
shader-db results on i965 (KBL):
total instructions in shared programs: 15284592 -> 15284568 (<.01%)
instructions in affected programs: 81683 -> 81659 (-0.03%)
helped: 24
HURT: 0
total cycles in shared programs: 375013622 -> 375013932 (<.01%)
cycles in affected programs: 40169618 -> 40169928 (<.01%)
helped: 13
HURT: 9
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself so just calling
brw_inst_set_eot is insufficient. Moving the EOT handling handling into
the send_indirect_[split]_message helper lets us handle it properly.
The extension NV_depth_clamp is written against OpenGL 1.2.1, and
since GLES 2.0 is based on GL 2.0 there is no reason not to enable
this extension also for GLES >= 2.0.
v2: Use EXT_depth_clamp that has been proposed to Khronos
v3: - Fix check for extension availability (Erik Faya-Lund)
- Also fix the test in is_enabled
v4: - Test both, ARB and EXT extension (Erik)
v5: - Fix white space errors (Erik)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
I was converting them at pipe_surface creation time, but not when
answering queries about whether formats support rendering. This caused
a lot of FBO incomplete errors for formats that ought to be supported.
Fixes "Child of Light", which uses PIPE_FORMAT_R8G8B8X8_UNORM_SRGB.
Also fixes Witcher 1 using wined3d (GL) according to Timur Kristóf.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109738
For texture attachments, 'f' is texImg->_BaseFormat, but for
renderbuffer attachments, 'f' is att->Renderbuffer->InternalFormat.
InternalFormat may be something like GL_RGB8, which causes our
(f == GL_RGB) check to fail. Switch to using a proper _BaseFormat,
which drops the size.
Fixes dEQP-GLES31.functional.draw_buffers_indexed.random.
max_required_draw_buffers.15 on iris when combined with a driver fix.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
apply_implicit_conversion only converts and check base types but we
need actual type equality for function returns, otherwise you can
return a vec2 from a function declared as returning a float.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
On MRT-capable systems, the framebuffer format is encoded as a 64-bit
word in the render target descriptor. Previously, the two 32-bit
words were exposed as opaque hex values. This commit identifies a 12-bit
Mali swizzle and a 2-bit channel counter, removing some of the magic. It
also adds decoding support for the AFBC and MSAA enable bits, which were
already known but otherwise ignored in pandecode.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
These ops were discovered by invoking the correspondingly names GLSL
functions. The rounding ops here behave exact as expected and are mapped
to their corresponding NIR ops where applicable. The ffma behaves as a
LUT instruction and requires some special argument packing (since
Midgard normally only allows for 2 arguments); this quirk will be
addressed in the future, but for now FMA is still lowered.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This flag corresponds to what was MEM_COHERENT_LOCAL in the vendor
driver, which seems to influence the cache policy, necessary for the
varying temporary storage but nothing else.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Potentially, the kernel could optimize these allocations, or perhaps we
can save on mapping costs.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
For reasons that are still unclear (speculation included in the comment
added in this patch), the tiler? metadata has a fast path that we were
not enabling; there looks to be a possible time/memory tradeoff, but the
details remain unclear.
Regardless, this patch improves performance dramatically. Particular
wins are for geometry-heavy scenes. For instance, glmark2-es2's
Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped
from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious
when vsync is disabled, as in glmark2-es2-wayland.
With this patch, on GLES 2.0 samples not involving FBOs, it appears
performance is converging with (and sometimes surpassing) the blob.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The nir_swizzle helper is used some on it's own but it's also called by
nir_channel and nir_channels which are used everywhere. It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle. If it is, we now don't bother emitting the
instruction. Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This header has been unused since f8f2520e88 ("st/mesa: Remove
unnecessary headers"). And in the more than 8 years since, this
hasn't been useful. So let's just get rid of it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
From the bash manual:
string1 == string2
string1 = string2
True if the strings are equal. = should be used with the test
command for POSIX conformance.
Test using array deref on vectors in loads and stores. These are
marked DISABLED_ as this optimization is currently not done.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index). This makes slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about. The cost is to perform more traversals,
but for such tests this is not a problem.
Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.
Also, drop two nir_print_shader leftover calls in a test.
v2: Remove redundant assertions. nir_src_comp_as_uint already
assert what we need. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from. At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size. This prepares the code for array_derefs of vectors, in which
'component[i] != i'.
Also, extract setting all SSA components into a function of its own.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Disabled by default, to be used during development. Adding those
so I don't rewrite some ad-hoc version of them everytime I'm working
with this pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations. For load_derefs,
do nothing. For store_derefs, invalidate whatever the store is
writing to. For copy_derefs, invalidate whatever the copy is writing
to.
These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than only lowering if all srcs are scalarizable we instead
check that at least one src is scalarizable.
We change undef type to return false otherwise it will cause
regressions when it is the only scalarizable src.
total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74
total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131
total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0
total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a bug where SWR will fail to render in cases with large
buffer allocations, e.g. very large meshes whose vertex buffers exceed
2GB
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Note that emit_intrinsic_load_image() already swaps a .3d flag with an
.a flag. I tried doing things the other way around (going back to .3d)
but that didn't work. And treating cube images as 2d array is also what
blob does, so let's just go with that.
Fixes dEQP-GLES31.functional.image_load_store.cube.load_store.*
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when
run after a test that binds textures used in vertex shader.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow
and few other similar tests that do multiple texture fetches into
individual components of a packet output. Mostly works around the
issue mentioned in ra_block_find_definers().
Signed-off-by: Rob Clark <robdclark@gmail.com>
vulkan_core.h defines non-dispatchable handles as (struct object *)
on 64-bit systems, but uint64_t on 32-bit systems. The former can be
implicitly cast to void *, but the latter requires an explicit cast.
While here, %lu is the wrong format specifier for uint64_t on 32-bit
systems, so use PRIu64, fixing a warning.
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is
used to select between 8 bit subpixel precision (value 0) or 4 bit
subpixel precision (value 1). As this value is not set, means it is
taking the value 0, so 8 bit are used.
On the other side, in the Vulkan CTS tests, if the reference rasterizer,
which uses 8 bit precision, as it is used to check what should be the
expected value for the tests, is changed to use 4 bit as ANV was
advertising so far, some of the tests will fail.
So it seems ANV is actually using 8 bits.
v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)
v3: use _8Bit definition as value (Jason)
v4: (by Jason)
anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect
This field was added on gen8 even though there's an identically defined
one in 3DSTATE_SF.
CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
float16 types can have non-flat interpolation so set up the HW
correctly for that.
Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It causes more trouble than it's worth. Now vl tries to create compute
shaders without all the proper checking. Since there's really no
(current) way to use compute on nv50, just mark it disabled.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109742
Fixes: f6ac0b5d71 ("gallium/auxiliary/vl: Add compute shader to support video compositor render")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes the following building error:
including ./external/mesa/Android.mk ...
build/core/base_rules.mk:183: *** external/mesa/src/intel:
MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel.
make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1
ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c
and that source file includes isl_tiled_memcpy.c source
Fixes: 96bb328 ("iris: add Android build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is needed for gen_clflush.h intrinsics to work on 32-bit builds.
i965 and anv both set these, and iris needs to as well.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region. The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.
An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.
This showed up as a regression from my commit cbea91eb57
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication. CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering. Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).
According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185 and then (mistakenly?) reverted as
a31d038208.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four. Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.
Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the execution type calculation will return a bogus value in
cases like:
mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.
Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions. Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reduces the time spent in nir_opt_cse() by almost a half.
The massif tool from callgrind reported no change in peak
memory use with the large doliphin uber shaders I used for
testing.
Reviewed-by: Thomas Helland<thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This currently regresses KHR-GL4x.compute_shader.resource-texture,
but that's a pre-existing bug (https://bugs.freedesktop.org/109113)
which should be fixed up once we have fast clear support.
If we change the aux state for a given resource, we need to re-emit the
binding table pointers for any stage that has such resource bound. Since
we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.
If iris_resource_get_handle() gets called without a context, we can't
resolve the resource. Hopefully it shouldn't be compressed anyway, so
let's just add an assert to ensure it's correct.
CCS_E can fall back to CCS_D with incompatible format views
CCS_D is pretty useless without fast clears and we may as well use NONE,
but we're surely going to hook those up at some point, so may as well
just go ahead and do it now...
We can safely assume that the given resource is depth, depth/stencil,
or stencil already. The stencil-only case is easily detectable with
a single format check, and all other cases are handled identically.
This saves some CPU overhead.
This just moves the code for dealing with pipe_shader_state /
pipe_compute_state / iris_uncompiled_shader to the end of the file.
Now that those do precompiles, they want to call the actual compile
functions. Putting them at the end eliminates the need for a bunch
of prototypes.
Caio noted that this is not necessary on Gen8+:
"Before Gen8, there was a historical configuration control field to
swizzle address bit[6] for in X/Y tiling modes. This was set in
three different places: TILECTL[1:0], ARB_MODE[5:4], and
DISP_ARB_CTL[14:13]. For Gen8 and subsequent generations, the
swizzle fields are all reserved, and the CPU's memory controller
performs all address swizzling modifications."
Since we don't support earlier hardware, we can skip it entirely.
The Vulkan driver only sets this if color writes are disabled, which
is more conservative - but would require us to inspect blend state.
(If color writes are enabled, we don't need to force anything, because
the internal signal is already correct. But it shouldn't hurt to do so.)
I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from
Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs
prog_data->uses_kill.
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads. It may also be necessary for
the compiler in a few cases.
When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0. This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???". This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed. We want to sleep.
Add an effectively infinite timeout so that we sleep.
Instead of allocating 4K BO per query object, we can create a large blob
of memory and split it into pieces as required.
Having one BO for multiple query objects, we don't want to wait on all
of them, instead when we write last snapshot, we create a sync point, and
check syncpoints while waiting on particular object.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
I inherited this from i965. It would be nice to track the state size
so INTEL_DEBUG=color,bat decoding can print the right number of e.g.
binding table entries or blend states, but...without a single point
of entry for state, it's a little tricky to get right. Punt for now,
and drop the dead code in the meantime.
i965 re-emits 3DSTATE_CONSTANT_* on every batch, so there's no point in
restoring the constants from the context. Iris actually re-pins the
constant buffers properly across the batch, and avoids re-emitting the
constant packets unless it's necessary. So, we don't want ISP_DIS.
Previously I had a hack in st/mesa to make it stop remapping
VARYING_SLOT_* into the naively compacted slots, which aren't
what we want. But that wasn't very feasible, as we'd have to
update all drivers, or add capability bits, and it gets messy fast.
It turns out that I can map back to VARYING_SLOT_* in about 5 LOC,
so let's just do that. It removes the need for hacks, and is easy.
This also fixes KHR-GL46.enhanced_layouts.xfb_capture_struct, which
apparently with my hack was still getting the wrong slot info.
1. Set a render condition. We emit it immediately on the render
engine, and stash q->bo as ice->state.compute_predicate in case
the compute engine needs it.
2. Clear the render condition. We were incorrectly leaving a stale
compute_predicate kicking around...
3. Dispatch compute. We would then read the stale compute predicate,
and try to load it into MI_PREDICATE_DATA. But q->bo may have been
freed altogether, causing us to try and use garbage memory as a BO,
adding it to the validation list, failing asserts, and tripping
EINVALs in execbuf.
Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure
down to a list of 48 tests I could easily run to reproduce it. Huge
thanks to the Valgrind authors for the memcheck tool that immediately
pinpointed the problem.
In st_nir_lower_uniforms_to_ubo() all UBO access in the shader have
its index incremented to open room for uniforms in constbuf0. So if
we use UBOs, we always need to include the extra binding entry in the
table.
To avoid doing this checks both when compiling the shader and when
assigning binding tables, store the num_cbufs in iris_compiled_shader.
Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use
uniforms or system values. Note that some tests fitting this criteria
were passing because the UBOs were moved to be push
constants (avoiding the problem).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
iris_bufmgr allocates addresses across the entire screen, since buffers
may be shared between multiple contexts. There used to be a single
special address, IRIS_BINDER_ADDRESS, that was per-context - and all
contexts used the same address. When I moved to the multi-binder
system, I made a separate memory zone for them. I wanted there to be
2-3 binders per context, so we could cycle them to avoid the stalls
inherent in pinning two buffers to the same address in back-to-back
batches. But I figured I'd allow 100 binders just to be wildly
excessive/cautious.
What I didn't realize was that we need 2-3 binders per *context*,
and what I did was allocate 100 binders per *screen*. Web browsers,
for example, might have 1-2 contexts per tab, leading to hundreds of
contexts, and thus binders.
To fix this, we stop allocating VMA for binders in bufmgr, and let
the binder handle it itself. Binders are per-context, and they can
assign context-local addresses for the buffers by simply doing a
ringbuffer style approach. We only hold on to one binder BO at a
time, so we won't ever have a conflicting address.
This fixes dEQP-EGL.functional.multicontext.non_shared_clear.
Huge thanks to Tapani Pälli for debugging this whole mess and
figuring out what was going wrong.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We use > for IRIS_MEMZONE_DYNAMIC because IRIS_BORDER_COLOR_POOL_ADDRESS
lives at the very start of that zone. However, IRIS_MEMZONE_SURFACE and
IRIS_MEMZONE_BINDER are normal zones. They used to be a single zone
(surface) with a single binder BO at the beginning, similar to the
border color pool. But when I moved us to multiple binders, I made them
have a real zone (if a small one). So both zones should use >=.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
INTEL_DEBUG=reemit was breaking streamout tests, by re-emitting
3DSTATE_SO_BUFFER commands that tell the HW to zero the SO write
offsets. We would need to alter them to use 0xFFFFFFFF for the offset.
Also, have each upload function only flag bits relevant to its own
pipeline.
I think this was an attempt to work around various sample mask bugs I
had early on. It's not correct. A sample mask of 0 is legal and means
to disable all samples.
Fixes dEQP-GLES31.functional.texture.multisample.*.*sample_mask*
I had a hack in place earlier to pass the query type as q->index
for the regular statistics query, but we ended up adjusting the
interface and adding a new query type. Use that instead, fixing
pipeline statistics queries since the rebase.
We were dividing by 4 in calculate_result_on_gpu(), and also in
iris_get_query_result(). We should stop doing the latter, and instead
divide by 4 in calculate_result_on_cpu() as well.
Otherwise, if snapshots were available, and you hit the
calculate_result_on_cpu() path, but requested it be written to a QBO,
you'd fail to get a divide.
This is an awkward corner case. We create batches in order, each of
which creates and pins a BO. The other batches may not be set up yet,
so it may not be safe to ask whether they reference a BO.
Just avoid this for now. We could avoid it for other context-local BOs
too, but we currently don't have a flag for that (and I'm not certain
whether it's worth it).
Various places in the transfer code need to know whether they must
read the existing resource's values. Rather than checking both flags
everywhere, just make PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE also flag
PIPE_TRANSFER_DISCARD_RANGE - if we can discard everything, we can
discard a subrange, too.
Obviously, we can do better for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE,
but eventually u_threaded_context should handle swapping out buffers
for new idle buffers, anyway. In the meantime, this is at least better.
BLORP uses the render engine to write to buffers, and we need to flush
that data out to the actual surface (finishing the write). Then, the
rest of this function invalidates any caches that might have stale data
which needs to be refetched.
I don't know if this is required - surprisingly, I haven't seen it
matter - but I'd like to use it for multi-slice transfer maps. We may
as well do the right thing.
There are a variety of ways to fix this, many of which are simple, but
I could use some advice on which ones other people prefer, and so we'll
punt until after the holidays.
We were relying on CSE/GVN/etc to coalesce all intrinsics that load the
same value, but that's a bad idea. We might have a couple intrinsics
that reload the same value. If so, we only want to set up the uniform
on the first one we see.
System values are built-in uniforms. We set them up as UBO values, and
might pull or push them. UBO push analysis will take care of that. We
only want to enable push constants if there's an actual range being
pushed. Otherwise, we might get into a scenario where 3DSTATE_PS
enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything.
This fixes GPU hangs in Broadwell image load store tests which have
unused image param system values but no other uniforms. (We shouldn't
be making those anyway, but that's a separate fix...)
cso_fb->layers is only valid for no-attachment framebuffers. Use the
helper function to get the real value, then stash it so we don't have
to call the helper function on the old value for comparison, or at draw
time for Force Zero RTA Index setting.
This fixes Force Zero RTA Index being set even when attempting layered
rendering.
Both 'z' and 'depth' are counted in slices, according to the Gallium
docs (context.rst). In our temporary memory, we allocate `box.depth`
slices, so we need to rebase the starting slice (box.z) down to 0,
and back again when writing on unmap.
There's nothing strange about cubes here.
Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up. Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.
Fixes tests/spec/arb_enhanced_layouts/execution/component-layout/
sso-vs-gs-fs-array-interleave
I was using the Gallium API wrong. set_* functions with start_slot
and count parameters are supposed to update a subrange of the items.
I had been trashing all bound vertex buffers and starting over.
This should hopefully also make it easier to slot in additional
VERTEX_BUFFER_STATEs at draw time, say, for shader draw parameters.
Note that at least following additional libs/components require changes
since they refer to BOARD_GPU_DRIVERS variable which is used to select
the driver:
- mixins
- minigbm
- libdrm
- drm_gralloc
v2: (feedback by Gustaw Smolarczyk) Fix trailing \ in a few cases
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
(cleaned up by Ken - make sure a bunch of things were more obviously
not using res->surf, do allow checking res->surf.tiling == LINEAR,
drop format cpp checks that aren't needed, drop memzone handling for
images, assume buffers / non-buffers in a few places...)
Dave fixed it to consider whether the sampler view is a cube.
With that, there's no point (possibly harm) in looking if the original
resource was a cube...if it's an array view, we don't want to treat it
as a cube anymore...
This is supposed to exclude single address zones. We were getting
too many VMA allocators but failing to set them up, which worked out
because we also forgot to destroy them...
We were being very picky about things being Y tiled. But, not
everything can be - for example, > 16382 surfaces on SKL GT1-3
have to fall back to linear.
Instead, give ISL options and let it pick.
dead since my program cache API rework. we could still use it for one
function, but it's so trivial to pass the size, that it's probably not
worth the extra code
This exposes iris_upload_shader() without having to bind it, which will
be useful for precompiles. It also lets us examine the old programs and
flag dirty bits at a higher level, rather than cramming all that
knowledge into the cache layer.
When we blit, transfer, or copy_resource to a buffer, we need to flush
to ensure any stale data for that buffer is invalidated in the caches.
bind_history will inform us which caches need to be flushed.
Also, for any push constant buffers, we need to flag those dirty so
that we re-emit 3DSTATE_CONSTANT_*, causing the data to be re-pushed.
Size can be too large for a surf, blorp_buffer_copy chops things up
into segments we can actually handle
Fixes map_buffer_range_test and copy_buffer_coherency
When flushing a batch due to a data dependency, we need to not only
kick off the other batch's work, but stall our execution until it
completes. Just wait on last_syncpt after flushing it.
(adjusted by Ken to make the signalling sync object immediately on
batch reset, rather than batch finish time. this will work better
with deferred flushes...)
x should be in bytes, not cpp units
This generally worked out because PIPE_BUFFER is supposedly required
to be R8_UINT or R8_UNORM. I hear some state trackers pass
PIPE_FORMAT_NONE instead, however, which would make this break.
Just do the right thing directly, to be defensive and clear.
it doesn't do anything, we have no params. I guess I thought there
would be some, but they all get dead code eliminated even if we try
to make them exist in the first place.
The backend compiler used to do this for us, but after a rebase, it's
now the driver's responsibility. This lets us alter it for say, clip
vertex lowering, at the global level rather than the per-variant level.
we borrow the approach from anv rather than i965, as it works better
with pre-baked state that needs to contain scratch BO addresses
fixes a bunch of varying packing tests
The intention is that render and compute use their own contexts,
and each is PIPELINE_SELECT'd to the right pipeline. But we hadn't
actually made them, so we got the fd-default context.
Thanks to Chris Wilson for catching this!
This makes e.g. the render batch aware of the compute batch, so it can
ask questions like "is this BO referenced by some other batch?" and do
something about that.
3DSTATE_CONSTANT_* looks at prog_data->ubo_ranges. We were getting
saved by iris_set_constant_buffers() usually happening when changing
programs (as they usually change uniforms too), but with the clear
shader that doesn't use uniforms, we weren't getting one and were
leaving push constants enabled, screwing things up.
Also clean up a bit of a mess left by the hacks - we were missing
bindings in the VS/FS/CS case, among other issues...
It may be in the dynamic state buffer but the fact that we have a
resource takes care of that. We don't need to add in the address of
the dynamic state buffer again.
If the state tracker never gave us the framebuffer dimensions via
a set_framebuffer_state() call, just fall back to the unbound texture
null surface, which is 1x1x1. Otherwise we'd use a NULL resource
(no pun intended).
we don't want to emit piles of pipe controls to a compute batch if
it isn't necessary...
prevents double-batch-wraps in cs-op-selection-bool-bvec4-bvec4
(but it's still kinda a big ol' hack...)
now we only upload a new grid when it's actually changed, which saves us
from having to emit a new binding table every time it changes.
this also moves a bunch of non-gen-specific stuff out of iris_state.c
We only can push constants for compute shaders from one range.
Gallium glsl-to-nir (src/mesa/state_tracker/st_glsl_to_nir.cpp) lowers
all uniform accesses to a ubo.
Unfortunately we also load the subgroup-id as a uniform in the
compiler. Since we use the 1 push range for this subgroup-id, we then
lose the ability to actually push the ubo with all the normal user
uniform values.
In other words, there is lots of room for performance improvement, but
at least retrieving the uniforms as pull-constants is functional for
now.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
This can happen when faking Z32_S8X24 and setting StencilSampling = true
I guess we'll just turn it into S8_UINT...
Fixes KHR-GL45.texture_swizzle.functional
We were accidentally using the ISL_FORMAT_R32_FLOAT_X8X24_TYPELESS
format, which is NOT what we use. We just store R32_FLOAT depth.
fixes Piglit's texwrap GL_ARB_depth_buffer_float
chaining to a new batch reuses create_batch(), but we don't need to do
the work of pinning BOs we inherit from a previous batch...when that is
actually part of the same execbuf invocation.
instead, just flag it when setting primary_batch_size = 0, in
iris_batch_reset
this is more obviously correct. I think the two end up being the same
in practice, since this is in the alloc_from_cache case, and presumably
bo from the bucket has bo->size == bucket->size, and bo_size also is
bucket->size...
still. better to do the obvious thing.
brw_bufmgr already does it this way.
otherwise, get results may check q->map->snapshots_landed...before our
commands to initialize it to false have actually executed...so it'd get
some random garbage from the BO...
the passthrough shader doesn't need a real program string ID - that's
basically used for ARB programs indicating total program source code
changes, or other pre-baked uniform changes, etc...none of which a
passthrough shader has...so we don't need a unique identifier to
distinguish them. We want to use a consistent value so we find
existing passthrough shaders in the cache.
If no TCS is provided, create a "passthrough" TCS that will take the
default values set in the API as constants and pass to the TES, along
with any other inputs it expects. The code to create the NIR shader
is the same as in i965.
Tested with
./piglit run -t 'tess' quick_shader r
and fixed a dozen crashes from that list.
not sure why this is labeled const, I'm pretty sure we are taking the
reference and owning this, so there's no particular reason we can't
change it. it certainly seems to be working for non-compute. and,
freedreno's ir3_shader.c seems to do this as well. still...gross :/
The backend compiler expects the gl_TessLevel* variables to be mapped
as inputs instead of system values. Use the new PIPE_CAP to get this
behavior from GLSL compiler.
Tested with:
tests/spec/arb_tessellation_shader/execution/vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
some of them had typos, didn't say 'authors or copyright holders',
or other mistakes. This is now https://opensource.org/licenses/MIT
text, formatted consistently.
This fixes some cases in fbo-none* and framebuffer_no_attachments.
I'm not sure this is correct otherwise, the tests don't all pass yet
No idea if this is in any way the correct answer
0xffffffff does not mean 1, it means enable as many as there actually
are. we don't get set_sample_mask() calls until some masking is
actually applied...i.e. it doesn't get updated based on # of samples
in the FBO changing.
looking at the freedreno code, this is totally unnecessary! we can just
store the NIR and be happy, and not have any vestiges of TGSI.
plus we can reuse this structure for compute shaders, without needing a
pipe_compute_state base.
create_surface happens before st_validate_attachment, which actually
does the "hey, this is a render target now, is that OK?" check
Fixes asserts in ./bin/arb_texture_view-rendering-formats, allowing the
rest of the tests to run.
st/nir offsets SSBO indexes by MaxABOs. This is not what we want,
as it bloats the binding tables. We'll need to adjust it to use
info->num_abos as the offset and buffer base instead. For now,
just use the inefficient format to get us rolling. We can add a
PIPE_CAP later.
not sure how useful this really is...
./bin/ext_transform_feedback-tessellation triangles flat_first
is hitting a case where we rebind the same VS program, but with
different streamout info...which isn't in the key...but is in the
cache...so we don't rebuild it...
It just does blits between layers, which is all we'd do anyway,
and it already should use BLORP because of iris_blit(). Plus it
handles 3D, which our code in i965 doesn't.
We used to not reset the batch, and just keep appending to it, so you'd
get the same invalid contents over and over.
I'd also really like to know about this, so aborting seems wise for now,
if not for the long term
seeing
set_viewport_state 0 1
set_viewport_state 1 15
which gives us a total of 16 viewports, updated incrementally
so keep old values around and update them...
we need proper batch chaining. without relocations, we can't grow,
since we've only allocated so much VMA for the batch, and the mechanism
only works if we can pin it at the old address
there's some bug here as Jason's patches for only emitting 3DS_DR once
got reverted by Mark later on, apparently they regressed MSAA tests.
need to sort that out.
It's useless to allocate SAMPLER_STATEs in GPU memory on creation like
we do for SURFACE_STATES, because they need to be organized into a
contiguous block of memory. But we can do that at bind time, rather
than draw time.
tomorrow, fix the build system to avoid symbol clashes somehow...
we're getting gen9 functions because they happen to be listed before 10
in the link list.
This commit introduces a new Gallium driver for Intel Gen8+ GPUs,
named 'iris_dri.so' after the hardware.
Developed by:
- Kenneth Graunke (overall driver)
- Dave Airlie (shaders, conditional render, overflow query, Gen8 port)
- Chris Wilson (fencing, pinned memory, ...)
- Jordan Justen (compute shaders)
- Jason Ekstrand (image load store)
- Caio Marcelo de Oliveira Filho (tessellation control passthrough)
- Rafael Antognolli (auxiliary buffer fixes)
- The rest of the i965 contributors and the Mesa community
Turns out we can write to tiled images as well as read. This avoids
having to linearize or do the tiling in the shader.
Signed-off-by: Rob Clark <robdclark@gmail.com>
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.
This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.
FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.
This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.
Fixes: e68871f6a ("spirv: Handle constants and types before execution
modes")
v2: (Jason)
* glsl_to_nir: get the info before glsl_to_nir, while all the rest
of the info gathering is happening
* prog_to_nir: gather the info on a general info-gathering pass,
not on variable setup.
v3: (Jason)
* Squash with the patch that removes that info from ir variable
* anv: assert that OriginUpperLeft is true. It should be already
set by spirv_to_nir.
* blorp: set origin_upper_left on its core "compile fragment
shader", not just on some specific places (for this we added an
helper on a previous patch).
* prog_to_nir: no need to gather specifically this fragcoord modes
as the full gl_program shader_info is copied.
* spirv_to_nir: assert that we are a fragment shader when handling
this execution modes.
v4: (reported by failing gitlab pipeline #18750)
* state_tracker: update too due changes on ir.h/gl_program
v5:
* blorp: minor change after change on previous patch
* radeonsi: update due this change.
v6: (Timothy Arceri)
* prog_to_nir: remove extra whitespace
* shader_info: don't use :1 on origin_upper_left
* glsl: program.fs.origin_upper_left/pixel_center_integer can be
move out of the shader list loop
This initializes the nir shader that will be used by blorp. Right now
it doesn't do too much beyond calling nir_builder_init_simple_shader,
and setting a name. More stuff will be added on following patches.
v2: there is a case were it is used a VERTEX_SHADER (Alejandro)
The condition code in extended branches is repeated 8 times for unclear
reasons; accordingly, the code would be disassembled as "unknown5555",
"unknownAAAA", etc. This patch correctly masks off the lower two bits to
find the true code to print, verifying that the code is repeated as
believed to be necessary (providing some assurance for compiler quality
and an assert trip in case we encounter a shader in the wild that breaks
the convention).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
discard and discard_if are both implemented with the branching pipeline
on Midgard; essentially, we branch to the end of the fragment shader in
a special "discard" mode, setting the condition as necessary.
Previously, we hardcoded the form of this instruction, which worked for
very simple shaders but was incorrect for anything remotely interesting.
This patch instead emits logical branches in the IR, which are flattened
to real discard ops the same way other branches are, allowing targets to
be computed correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Previously, we only emitted compact branches; however, the offset range
of these branches is too small for many real world shaders. This patch
implements support for emitting extended branches and switches to always
using them for control flow. This incurs a code size and possibly
performance penalty, but expands the range of working shaders and
provides opportunity for further optimization.
Support for emitting compact branches is retained but this code path is
presently unused. In the future, we'll want to heuristically determine
which type of branch should be emitted for optimal codegen.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Midgard features "compact branches" and "extended branches", i.e.
corresponds to short jumps and far jumps. The form of the extended
branch was previously incorrect in the ISA headers; this patch corrects
it and updates the disassembler (simultaneous to preserve
bisectability).
Additionally, we fix some a corner case in the disassembly of extended
branches, and we now prefix extended branches with "brx", to visually
differentiate from compact branches prefixed with "br".
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
An if-else statement is compiled to a conditional branch (from the start
to the second block) and an unconditional branch (from the end of the
first block to the end of the else). We previously incorrectly computed
the block index of the unconditional branch to be exactly one after that
of the conditional branch, valid for a single if-else statement but
nothing fancier. This patch correctly computes the unconditional branch
target, fixing more complex if-else chains.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Each Midgard instruction is scheduled to a particular instruction type
("tag"). Presumably the hardware prefetches memory based on tag, so it
is required to report out the first tag to the command stream and the
next tag of a branch target. This procedure was implemented in two
separate parts of the compiler (one time with a slight bug relating to
empty blocks); this patch refactors to unite the two routines and solve
the bug when branching to empty blocks.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Historically, Panfrost debugging entailed the use of the LD_PRELOADable
`panwrap` tool. This setup is a tad fragile; Panfrost can be traced
directly without the intermediate layer. pantrace implements the
quivalent functionality of panwrap into Panfrost proper, allowing dumps
to work regardless of the kernel layer in use.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The `panwrap` utility can be LD_PRELOAD'd into a GLES app, intercepting
communication between the driver and the kernel. Modern panwrap versions
do no processing of their own; instead, they create a trace directory.
This directory contains the following files:
- control.log: a line-by-line plain text file, denoting important
syscalls (mmaps and job submits) along with their arguments
- memory_*.bin, shader_*.bin: binary dumps of mapped memory
Together, these files contain enough information to reconstruct the
command stream and shaders of (at minimum) a single frame.
The `pandecode` utility takes this directory structure as input,
reconstructing the mapped memory and using the job submit command as an
entrypoint. It then walks the descriptors as the hardware would, parsing
and pretty-printing. Its final output is the pretty-printed command
stream interleaved with the disassembled shaders, suitable for driver
debugging. For instance, the behaviour of two driver versions (one
working, one broken) can be compared by diff'ing their decoded logs.
pandecode/decode.c was originally a part of `panwrap`; it is the oldest
living code in the project. Its history is generally not worth
preserving.
panwrap itself will continue to live downstream for the foreseeable
future, as it is specifically written for the vendor kernel. It is
possible, however, to produce equivalent traces directly from Panfrost,
bypassing the intermediate wrapping layer for well-behaved drivers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This is not yet functional, but it resolves a crash in various apps and
provides a framework for further work.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
v2: use tc.stream_uploader in si buffer_transfer_map if not called from
the driver thread
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This makes us properly handle gl_ClipDistance and gl_CullDistance.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We needed to better handle cases where a chunk of a variable starts at
some non-zero location_frac and rolls over into the next slot but may
not be more than 4 dwords. For example, if gl_CullDistance is an array
of 3 things and has location_frac = 2, it will span across two vec4s but
is not, itself, bigger than a vec4. If you ignore the clip/cull special
case, it's not allowed to happen for anything else because the only
things that can span more than one slot is dvec3 and dvec4 and they're
both bigger than a vec4. The current code uses this attrib_slot thing
where we count attribute slots and iterate over them. However, that
doesn't work in the case above because gl_CullDistance will have an
attrib_slot count of 1 even though it does span two slots. We could fix
this by adjusting attrib_slot but we already have comp_mask and it's
easier to just handle it that way.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Instead of going to all the work of to combine them into one array, just
make two arrays and use location_frac to colocate them within CLIP0.
Then the back-end can sort things out and stack them on top of each
other. Thanks to ef99f4c8, we also don't need to set compact anymore.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use the 'UNK31' bit (which should probably be called 'BUFFER') for
samplerBuffer case, which increases the size of supported buffer
texture beyond 2^15 elements.
Also need to fix the 2nd coord injected to handle the tex instructions
that take integer coords.
Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071
and similar
Signed-off-by: Rob Clark <robdclark@gmail.com>
... instead of isam. It seems like when using isam, plus atomics, we
can have the problem of old data being in the texture cache. Plus this
way we don't have to load a component at a time.
Note that blob still seems to use isam in some cases. I suppose it might
be preferable in the case of loading a single component, when atomics
are not in the picture (or that the ssbo does not need to otherwise be
coherent).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Resync disasm and instr header from envytools, and add ldib encoding.
This replaces an opcode from a3xx which was never seen in practice,
since that seemed easier than dealing with the same opc # meaning a
different thing on a6xx. (Not really sure if 'sti' was actually a
real thing, I think it was only seen in fuzzing.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes
dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read
regression.
Fixes: c1a27ba9ba freedreno/ir3: HIGH reg w/a for a6xx
Signed-off-by: Rob Clark <robdclark@gmail.com>
The variant will be NULL if RA failed. Which isn't ideal, but at least
lets not segfault and bring down the rest of the dEQP run with us.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The wrmask is handled in regmask_get()/regmask_set(), but it wasn't
being propagated from SSA src to dst. So for example, an SSBO read
value that is passed in as src2.y component to atomic op, wasn't
getting the (sy) flag set. Causing lots of fail.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Even in a very basic shader this reduces the time spent in
nir_copy_prop() by ~17%.
No shader-db changes for radeonsi NIR or i965.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While I haven't tested them all, given that they're all using the same
allocation paths and modifiers in the kernel they should be fine to use in
the same way.
v2: Rebase on other kmsro changes.
v3: Skip repeated '[with_gallium_kmsro,' in the meson build.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Commit 08bfd710a2. (nir/dead_cf: Stop
relying on liveness analysis) introduced a new check that iterated
through a SSA def's uses, to see if it's used. But it only checked
normal uses, and not uses which are part of an 'if' condition. This
led to it thinking more nodes were dead than possible.
Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test
(and related tests) with the out-of-tree Iris driver.
Fixes: 08bfd710a2 nir/dead_cf: Stop relying on liveness analysis
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This gets stencil and depth resolves working properly.
Fixes:
dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Blitter does support it after all. Previous attempt to use R8_UINT
failed because we overwrote the a6xx format in emit_blit_texture(),
but some of the later setup still looked at the gallium format.
If we overwrite it in the pipe_blit_info before we even call into
emit_blit_texture() it works properly.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Fullscreening and unfullscreening a totem window while playing a video
sometimes results in the video subsurface not changing size along. This
is also reproducible with epiphany.
If a surface gets resized while we have an active back buffer for it, the
resized dimensions won't get neither immediately applied on the resize
callback, nor correctly synchronized on update_buffers(), as the
(now stale) surface size and currently attached buffer size still do match.
There's actually 2 things to synchronize here, first the surface query
size might not be updated yet to the wl_egl_window's (i.e. resize_callback
happened while there is a back buffer), and second the wayland buffers
would need dropping if new surface size differs with the currently attached
buffer. These are done in separate steps now.
https://bugzilla.redhat.com/show_bug.cgi?id=1650929https://bugs.freedesktop.org/show_bug.cgi?id=109594
Fixes: a9fb331ea7 ("wayland/egl: update surface size on window resize")
Signed-off-by: Carlos Garnacho <carlosg@gnome.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Bastien Nocera <hadess@hadess.net>
Tested-by: Denys Kostin <denys.kostin@globallogic.com>
The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.
Fixes CTS's CL#3500 test:
dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This uses prog_to_nir to translate ARB assembly programs to NIR.
Co-authored by Tim Arceri, Dave Airlie, and Ken Graunke:
- [Tim Arceri]: original patch
- [Dave Airlie]: fix crashes with parameter names
- [Ken Graunke]:
- Rebase on SCALAR_ISA cap, lower wpos_ytransform too.
- Rebase on streamout fixes.
- Lower system values for fragcoord support.
- Don't try to use prog_to_nir for ATI_fragment_shader programs.
- Create TGSI for fixed-function or ARB vertex shaders even if the
driver prefers NIR, so we can create draw module shaders for
feedback/select emulation, which rely on TGSI.
Tested on:
- iris (Intel Skylake/Kabylake): Piglit & GL CTS - Ken Graunke
- radeonsi (AMD Vega 64): Piglit - Ken Graunke
- vc4/v3d - Piglit - Eric Anholt
- freedreno - dEQP - Kristian Høgsberg
Fixes lit_degenerate_case on vc4 and v3d, and vp-address-01,
vp-arl-constant-array-huge-offset-neg, and vp-arl-neg-array on v3d.
No Piglit regressions on radeonsi; no dEQP regressions on freedreno.
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Even if the driver wants to use NIR shaders, we may need to have TGSI
tokens for creating draw module vertex shaders for the feedback/select
render modes.
So...if the st_vertex_program has any TGSI...copy it to the variant.
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL
leaves it undefined). Performing fpow lowering in NIR would break this
behavior, preventing us from using prog_to_nir.
According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common
expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>,
which presumably does a zero-wins multiply.
Lowering in NIR results in a non-legacy multiply, where:
pow(0, 0) = 2^(log2(0) * 0)
= 2^(-INF * 0)
= 2^(-NaN)
= -NaN
which isn't the desired result.
This reverts:
- commit d6b7539206
(ac/nir: remove emission of nir_op_fpow)
- commit 22430224fe
(radeonsi/nir: enable lowering of fpow)
and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir
after enabling prog_to_nir in st/mesa later in this series.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The nine state tracker can produce NIR uniform variables
whose location is explicitly set. radeonsi did not take that
into account when calculating const_file_max, resulting in
rendering glitches. This patch fixes that.
Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
In the new Intel Iris driver, I am using Tim's new packed uniform
storage system. It works great, with one caveat: our scalar compiler
backend assumes that uniform offsets will be aligned to the underlying
data type. For example, doubles must be 64-bit aligned, floats 32-bit,
half-floats 16-bit, and so on. It does not need any other padding.
Currently, _mesa_add_parameter aligns everything to 32-bit offsets,
creating doubles that have an unaligned offset. This patch alters
that code to align doubles to 64-bit offsets.
This may be slightly less optimal for drivers which can support full
packing, and allow reads from unaligned offsets at no penalty. We could
make this extra alignment optional. However, it only comes into play
when intermixing double and single precision uniforms. Doubles are
already not too common, and intermixed values (floats then doubles)
is probably even less common. At most, we burn a single 32-bit slot
to the alignment, which is not that expensive. So, it doesn't seem
worthwhile to add the extra complexity.
Eventually, we'll likely want to update this code to allow half-float
values to be packed tighter than 32-bit offsets. At that point, we'll
probably want to revisit what drivers ultimately want, and add options.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
I'd like to use this in the prog_parameter.c code, so I need to move it
into C, make it non-static, and so on. This probably isn't the ideal
place for it, but I couldn't think of a better one.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Last commit limited the CI to master and MRs, but to avoid having to
manually trigger CI runs, let's add a 3rd, automatic way: by pushing to
a branch named `ci/*` (or `ci-*` or just `ci`) (which you can delete
afterwards, the pipeline results will remain).
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Runs on random other branches (stables RCs, personal forks) can still be
triggered manually via the web interface, or an app using the API.
This should massively help with the current voracious state of our CI.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
So that the signature is correct and consistent, the inputs to a export
intrinsic should always be 32-bit floats.
This and the previous commit fixes a large amount crashes from
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_*
tests
Fixes: b722b29f10 ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
16-bit outputs are stored as 16-bit floats in the outputs array, so they
have to be bitcast.
Fixes: b722b29f10 ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need
to do VPM setup when the load_inputs are out of order. For V3D 4.x, we
can reduce register pressure by delaying our loads until they're actually
needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump.
total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
total threads in shared programs: 153864 -> 153906 (0.03%)
The execute.file check used to be good enough, until I stopped setting up
the execute mask for uniform ifs.
No known tests fixed, noticed while doing a refactor.
Fixes: 0805060573 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
Now that we don't have the vir_PF() magic, it's obvious that we were doing
the wrong thing for f2b32 by allowing -0.0 to produce true instead of
false.
You were allowed to pass in any old temp so that you could hopefully fold
the PF up into the def of the temp. If we couldn't find one, it
implicitly generated a MOV(nop, reg). However, that PF could have
different behavior depending on whether the def being folded into was a
float or int opcode, which the caller doesn't necessarily control.
Due to the fragility of the function, just switch all callers over to
vir_set_pf(). This also encourages the callers to use a _dest call for
the inst they're putting the PF on, eliminating a bunch of temps in the
pre-optimization VIR.
shader-db says the change is in the noise:
total instructions in shared programs: 6226247 -> 6227184 (0.02%)
instructions in affected programs: 851068 -> 852005 (0.11%)
Both were doing the same thing to try to get a condition to predicate on.
Noticed when I wanted to do this for discard_if as well.
No change in shader-db.
The NIR lowering works fine, though it causes some slight noise due to
what looks like choices about propagating constants up multiply chains
changing.
total instructions in shared programs: 6229671 -> 6229820 (<.01%)
total uniforms in shared programs: 2312171 -> 2312324 (<.01%)
Fixes some stalls in 3DMMES's main vertex shader.
total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
instructions in affected programs: 2935050 -> 2865569 (-2.37%)
Apparently we need disable-EZ flagged, not just "does Z writes".
Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 051a41d3d5 ("v3d: Add support for the early_fragment_tests flag.")
Fixes intermittent fails in
dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices
and others (particularly when run as part of a CTS run)
Otherwise, we might have pages accessible that shouldn't be and miss out
on errors. This is unlikely for most tests since v3d_hw_get_mem() is big
enough that it'll be a freshly zeroed mmap, but if screens are destroyed
and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
From the table in isl_format.c, it appears that all generations
support blending on 32-bit float surfaces.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If EXT_float_blend is not supported, error out on blending of FP32
attachments in an ES2 context.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The new encoding returns a value via the 2nd src. The legalize pass
needs to be aware of this to set the correct needs_sy flag, otherwise we
can, in cases where the atomic dst is not used, overwrite the register
that hardware will asynchronously load result into without (sy) flag, so
it gets clobbered by the atomic result.
This fixes a whole lot of rando ssbo+atomic fails, like
dEQP-GLES31.functional.ssbo.layout.single_basic_type.packed.highp_vec4.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Since gl_HelperInvocation is lowered to:
!((1 << sample_id) & sample_mask_in))
Not setting these enable bits was causing it be broken. (And probably a
bunch of other stuff too.)
Fixes dEQP-GLES31.functional.shaders.helper_invocation.*
Signed-off-by: Rob Clark <robdclark@gmail.com>
This fixes invalid access to Attachment array which would occur if caller
would exceed MaxColorAttachments. In practice this should not ever happen
because DiscardFramebufferEXT specifies only GL_COLOR_ATTACHMENT0 to be
valid and InvalidateFramebuffer will error out before but this should
make coverity happy.
v2: const, remove _EXT (Ian)
CID: 1442559
Fixes: 0c42b5f3cb "mesa: wire up InvalidateFramebuffer"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes issues where polygons that should be culled (due to negative
w, for instance) may not be.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The idea here is to reassociate a * (b * c) into (a * c) * b, when
b is a non-constant value, but a and c are constants, allowing them
to be combined.
But nothing was enforcing that 'b' must be non-constant, which meant
that running opt_algebraic in a loop would never terminate if the IR
contained non-folded constant expressions like 256 * 0.5 * 2. Normally,
we call constant folding in such a loop too, but IMO it's better for
nir_opt_algebraic to be robust and not rely on that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581
Fixes: 32e266a9a5 i965: Compile fp64 funcs only if we do not have 64-bit hardware support
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Object handles are local to the device fd, so double check we are not
mixing together objects from multiple screens on execbuf submission.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Soon we'll need this logic to deal w/ image/SSBO case, so split out a
helper rather than duplicate the logic.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It seems like some instructions (noticed this w/ cat3), cannot read HIGH
regs.. cat1 (mov/cov) can, and possibly some/all of cat2.
The blob seems to stick w/ an extra mov into low regs. So lets do the
same.
This fixes WGID on a6xx, which unsurprisingly is related to a lot of
deqp compute fails.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Images and SSBOs don't map directly to the hw. They end up being part
texture and part something else. Starting with a6xx, the hack used for
a5xx to smash the image tex state into hw texture state starting from
MAX counting down won't work, because we start using tex state also for
SSBO read.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Note that image/ssbo support is currently only implemented for a5xx.
But the instruction encoding is the same for a4xx.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We probably need to rethink how we detect which instruction first
defines higher register classes. But for now, this at least fixes
the symptom.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The elements added into a vector should have the same type as the
first one, otherwise this hits an assertion in LLVM.
Fixes: 4b3549c084 ("radv: reduce the number of loaded channels for vertex input fetches")
reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
After the previous changes to emulate the ETC/EAC formats using the
secondary shadow miptree, the etc_format field of the intel_mipmap_tree
struct became redundant and the remaining check that used it has been
replaced. (Nanley Chery)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
OES_copy_image extension was disabled on Gen7 due to the lack of support
for ETC2 images. Enabled it back. (Kenneth Graunke)
v2:
- Removed the blank lines in the comments above OES_copy_image and
OES_texture_view extensions in intel_extensions.c (Nanley Chery)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
For CopyImageSubData to copy the data during the 1st draw call, we need
to update the shadow tree right before the rendering.
v2:
- Added assertion that the miptree doesn't need update at the time we
update the texture surface. (Nanley Chery)
v3:
- As we now update the tree before the rendering we don't need to copy
the data during the unmap anymore. Removed the unnecessary update from
the intel_miptree_unmap in intel_mipmap_tree.c (Nanley Chery)
v4:
- Fixed unrelated empty line removal (Nanley Chery)
- As now the intel_upate_etc_shadow of intel_mipmap_tree.c is only
called inside its following function, we don't need to declare it at
the top of the file anymore. (Nanley Chery)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
GPUs Gen < 8 cannot sample ETC2 formats. So far, they converted the
compressed EAC/ETC2 images to non-compressed RGBA images. When
GetCompressed* functions were called, the pixels were returned in this
RGBA format and not the compressed format that was expected.
Trying to fix this problem, we use a secondary shadow miptree to store the
decompressed data for the rendering and the main miptree to store the
compressed for the Get functions to work. Each time that the main miptree
is written with compressed data, we decompress them to RGB and update the
shadow. Then we use the shadow for rendering.
v2:
- Fixes in the commit message (Nanley Chery)
- Reversed the changes in brw_get_texture_swizzle and swapped the b, g
values at the time that we decompress the data in the function:
intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery)
- Simplified the format checks in the miptree_create function of the
intel_mipmap_tree.c and reserved the call of the
intel_lower_compressed_format for the case that we are faking the ETC
support (Nanley Chery)
- Removed the check for the auxiliary usage for the shadow miptree at
creation (miptree_create of intel_mipmap_tree.c) as we won't use
auxiliary buffers with these types of trees (Nanley Chery)
- Set the etc_format of the non-ETC miptrees to MESA_FORMAT_NONE and
removed the unecessary checks (Nanley Chery)
- Fixed an unrelated indentation change (Nanley Chery)
- Modified the function intel_miptree_finish_write to set the
mt->shadow_needs_update to true to catch all the cases when we need to
update the miptree (Nanley Chery)
- In order to update the shadow miptree during the unmap of the
main and always map the main (Nanley Chery) the following change was
necessary: Splitted the previous update function that was updating all
the mipmap levels and use two functions instead: one that updates one
level and one that updates all of them. Used the first during unmap
and the second before the rendering.
- Removed the BRW_MAP_ETC_BIT flag and the mechanism to decide which
miptree should be mapped each time and reversed all the changes in the
higher level texture functions that upload data to textures as they
aren't needed anymore.
- Replaced the boolean needs_fake_etc with an inline function that
checks when we need to fake the ETC compression (Nanley Chery)
- Removed the initialization of the strides in the update function as
the values will be overwritten by the intel_miptree_map call (Nanley
Chery)
- Used minify instead of division in the new update function
intel_miptree_update_etc_shadow_levels in intel_mipmap_tree.c (Nanley
Chery)
- Removed the depth from the calculation of the number of slices in
the new update function (intel_miptree_update_etc_shadow_levels of
intel_mipmap_tree.c) as we don't need to support 3D ETC images.
(Nanley Chery)
v3:
- Renamed the rgba_fmt in function miptree_create
(intel_mipmap_tree.c) to decomp_format as the format is not always in
rgba order. (Nanley Chery)
- Documented the new usage for the shadow miptree in the comment above
the field in the intel_miptree struct in intel_mipmap_tree.h (Nanley
Chery)
- Removed the redundant flags from the mapping of the miptrees in
intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery)
- Fixed the switch from surface's logical level to physical level in
the intel_miptree_update_etc_shadow_levels of intel_mipmap_tree.c
(Nanley Chery)
- Excluded the Baytrail GPUs from the check for the ETC emulation as
they support the ETC formats natively. (Nanley Chery)
- Simplified the check if the format is BGRA in
intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery)
v4:
- Removed the functions intel_miptree_(map|unmap)_etc and the check if
we need to call them as with the new changes, they became unreachable.
(Nanley Chery)
- We'd rather calculate the level width and height using the shadow
miptree instead of the main in intel_miptree_update_etc_shadow_levels of
intel_mipmap_tree.c (Nanley Chery)
- Fixed the format in the mt_surface_usage, set at the miptree creation,
in miptree_create of intel_mipmap_tree.c (Nanley Chery)
v5:
- Fixed the levels calculations in intel_mipmap_tree.c (Nanley Chery)
- Update the flag shadow_needs_update outside the function
intel_miptree_update_etc_shadow (Nanley Chery)
- Fixed indentation error (Nanley Chery)
v6:
- Fixed typo in commit message (Nanley Chery)
- Simplified the assignment of the mt_fmt in the miptree_create of the
intel_mipmap_tree.c (Nanley Chery)
- Combined declarations and assignments where it was possible in the
intel_miptree_update_etc_shadow and
intel_miptree_update_etc_shadow_levels of the intel_mipmap_tree.c
(Nanley Chery)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81843
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104272
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This was probably useful when it was first written, however it
looks to be no longer necessary.
As far as I can tell these days dce is smart enough to remove useless
instructions from if branches. Once this is done
nir_opt_peephole_select() will end up removing the empty if.
Removing this support reduces the dolphin uber shader compilation
time spent in nir_opt_dead_cf() by a little over 7x.
No shader-db changes on i965 or radeonsi.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
- Ensure all threads have optimal floating-point control state
- Disable auto-generation of fused FP ops for VERTEX shader stage
- Disable "fast" FP ops for VERTEX shader stage
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The intrinsic returns the number of leading zeros, not the bit number of
the first nonzero, so just flip it based on the mask size
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
normalized and scaled formats also return floats.
Fixes: 4b3549c084 ("radv: reduce the number of loaded channels for vertex input fetches")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: Fix silly bug in logic. s/||/&&/
All but one of the affected shaders is in an Unreal4 demo. The other is
in Tomb Raider. All of the cases that Ian investigated appear to be
sequences like the following
if (int(uint(some_float)) < 0) /* other relations too */
...
At least in Tomb Raider, it's not obvious that this sequence came from
the original shader.
In some of the Unreal demos, the shader contains code like
if (int(uint(textureLod(...))) > 0)
...
which explicitly generates the offending sequence.
All Gen6+ platforms had similar results (Skylake shown):
total instructions in shared programs: 15437170 -> 15437187 (<.01%)
instructions in affected programs: 4492 -> 4509 (0.38%)
helped: 0
HURT: 17
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.57% 0.75%
Instructions are HURT.
total cycles in shared programs: 383007996 -> 383007992 (<.01%)
cycles in affected programs: 20542 -> 20538 (-0.02%)
helped: 6
HURT: 7
helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for cycles value: -3.30 2.69
95% mean confidence interval for cycles %-change: -0.19% 0.19%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: nagrigoriadis@gmail.com
Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
We emit an FBL instruction which only exists since Gen7. This prevents
the test from segfaulting when run with TEST_DEBUG=1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add compute shader initilization, assign and cleanup in vl_compositor API.
Set video compositor compute shader render as default when pipe support it.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Rename csc_matrix to shader_params, and increase shader_params size
to store more constants for compute shader,
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Split vl_compositor graphic shaders from vl_compositor API in order to share
vl_compositor API with vl_compositor compute shader later.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
In opt_peel_initial_if optimization, when moving the continue list to
end of the continue block, before the jump, could happen that the
continue list itself also ends with a jump.
This would mean that we would have two jump instructions in a row: the
first one from the continue list and the second one from the contine
block.
As inserting an instruction after a jump is not allowed (and it does not
make sense, as it will not be executed), remove the jump from the
continue block and keep the one from continue list, as it will be
executed first.
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
opt_split_alu_of_phi moves ALU instruction to the end of continue block.
But if the continue block ends with a jump instruction (an explicit
"continue" instruction) then the ALU must be inserted before the jump,
as it is illegal to add instructions after the jump.
CC: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Instead of generating a GL_INVALID_ENUM error when the type or format
is incorrect while using glClear{Named}Buffer{Sub}Data, generate
GL_INVALID_VALUE.
From page 72 (page 94 of the PDF) of the OpenGL 4.6 spec:
" An INVALID_VALUE error is generated if type is not one of the
types in table 8.2.
An INVALID_VALUE error is generated if format is not one of the
formats in table 8.3."
Fixes the following test:
KHR-GL45.direct_state_access.buffers_errors
v2: correct the doxygen documentation.
Cc: Pi Tabred <servuswiegehtz@yahoo.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We've noticed the Team Fortress 2 engine seems to do many small
calls to glSubData(..). Let's pick our heuristic based on the
resource base width, not the size of a particular upload.
This will cause transfers to be batched together in the transfer
queue.
Revelant glbench microbenchmark --
Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec
After: buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
This improves Unigine Valley benchmark by 3 to 10 fps (depending
on the scene).
It also improves the Team Fortress 2 benchmark from 6 fps to 13
fps (host: 20 fps).
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Transfers will be placed here at unmap time instead of incurring
a VM exit. There's an attempt to deduplicate intersecting 1D transfers,
which are surprisingly common.
This can also help with mipmapped texture upload and smaller
textures, where the majority of the time is spent in the guest
kernel / QEMU -- not virglrenderer. This is shown by the GLbench
texture upload benchmark:
Before:
texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec
After:
texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec
v2: Split up list iteration functions (@gerddie)
v3: Support for optimizing glBufferSubData
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
The idea is to have two command buffers:
1) One for transfers
2) One for commands, which can include transfers
At flush time, (2) will be filled. Otherwise, (1) will be
used to submit transfers if there are enough of them.
v2: Pass size directly to cmd_buf_create (@gerddie)
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Much of our logic is based around the idea the upper 16 bits
of a command dword can encode the length of the command.
Now that the command buffer >= 2^16 - 1, we should check for
this.
v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Let's define a helper function and use it.
This commit also allows resources to be emitted into different command
buffers.
Like the ioctls, send 0 for layer_stride and stride. If we actually
send the real values, there are various assumptions in virglrenderer
for non-1D buffers that may need to be modified.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE. However, this
uses the resource's already attached iovecs rather than the command
buffer to transfer the data.
v2: Used (1 << 16) not (1 << 15) [@gerddie]
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Since we're just uploading to guest memory, let's just align to dword
size.
Fixes: e0f932 ("u_upload_mgr: pass alignment to u_upload_data manually")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
We have cases where we would not like to expose these.
v2: call the option allow_rgb565_configs for consistency
with existing allow_rgb10_configs (Eric, Jason)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are a few differenes between Mali T860 (Panfrost's primary
reference target) and the older Midgard generations (T600/T700):
- Miscellaneous different magic numbers. It's not clear what these
numbers mean on either the old or new configurations yet.
- Errata fixes. T800 is the final Midgard generation and presumably the
least buggy. Older Midgard has some extra hardware errata we have to
workaround.
- SFBD vs MFBD split. Essentially, older Midgard use a Single
FrameBuffer Descriptor (SFBD), which corresponds to single
render-target rendering. Newer Midgard (T760+) use a Multiple
FrameBuffer Descriptor (MFBD), allowing multiple RTs. On ES 2.0, these
descriptors serve the same function, but we implement both, depending on
the version of the hardware.
- CPU bitness. 32-bit systems generally use 32-bit GPU descriptors, and
vice versa for 64-bit. Our target T760 systems are 32-bit whereas our
target T860 systems are 64-bit. More work is needed in this area.
This patch fixes support in these areas for supporting older Midgard
hardware. It is tested on Mali T760 and Mali T860.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The liveness analysis pass is fairly expensive because it has to build
large bit-sets and run a fix-point algorithm on them. Instead of
requiring liveness for detecting if values escape a CF node, just take
advantage of the structured nature of NIR and use block indices instead.
This only requires the block index metadata which is the fastest we have
metadata to generate.
No shader-db changes on Kaby Lake
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We want to handle live SSA values differently and it's going to involve
walking the instructions. We can make it a single instruction walk if
we combine it with cf_node_has_side_effects.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This fixes a bug in runscape where we were optimizing x >> 16 to an
extract and then negating and converting to float. The NIR to fs pass
was dropping the negate on the floor breaking a geometry shader and
causing it to render nothing.
Fixes: 1f862e923c "i965/fs: Optimize float conversions of byte/word..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Unfortunately swr was missed in the original commit. The number of
varyings should generally match up to what's reported as the shader
caps for fragment inputs.
Fixes: 6010d7b8e8 (gallium: add PIPE_CAP_MAX_VARYINGS)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alok Hota <alok.hota@intel.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Just a little higher up in the function we assert that the aspect masks
are actually equal so there's no reason for the weaker check. Also, the
temporary variables were causing compiler warnings in release builds.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
[28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
^~~~~~~~~~
[36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
instr_each_src_and_dest_is_ssa(nir_instr *instr)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
spirv_to_nir can generate input/output variables which are illegal
for the current shader stage, which would cause nir_validate_shader
to balk. After my recent commit to start decorating arrays as compact,
dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started
hitting validation errors due to outputs in a TCS (not intended for the
TCS at all) not being per-vertex arrays.
Thanks to Jason Ekstrand for suggesting this approach.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
Fixes: ef99f4c8d1 compiler: Mark clip/cull distance arrays as compact before lowering.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
My patch to switch from struct-based MOCS to numeric MOCS accidentally
divided all MOCS entries by 2 in the Vulkan driver.
MOCS on Gen9+ is just an array index into a table. But in the hardware
packets, the index starts at bit 1. So we need to shift it.
Fixes: 0b44644ca6 (genxml: Consistently use a numeric "MOCS" field)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
These headers are used by a lot more than just the intel drivers nowadays.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We should check that num_channels is 4, otherwise that breaks
the world. Sorry for the short breakage.
Fixes: 4b3549c084 ("radv: reduce the number of loaded channels for vertex input fetches")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
I think this will save an instruction and hopefully not increase any other
costs (possibly the immediate -1 and 1?), but I haven't actually tested.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Drops one instruction from fs-sign-int.shader_test. No change in
shader-db due to it having 0 instances of sign(genIType). This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".
v2: Update commit message about shader-db.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Everything should be in ssa form when we call this. This is a
hotpath so replace the check with an assert.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Everthing should be in ssa form when this is called. Checking
for it here is expensive so turn this into an assert instead.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There is no need to hash the instruction twice, especially as we
end up adding it in the majority of cases.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the Intel "anvil" driver races with the generation of genxml
files, while i965 has an explicit dependency. This patch adds the same
dependency to anvil.
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Allows drivers using `u_pipe_screen_get_param_defaults` to use a
fallback value for the new pipe cap. Default value of 8 based on GL 2.1
MAX_VARYING_FLOATS
Reviewed-by: Eric Anholt <eric@anholt.net>
Use ir3_next_varying() for iterating through varyings and unset the
global point coord invert bit.
Fixes:
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We need to set UNK3 in GRAS_CNTL and RB_RENDER_CONTROL0 for the value
to be reliably delivered.
Fixes:
dEQP-GLES3.functional.shaders.builtin_variable.frontfacing
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This pulls in changes for compute shaders and a6xx ssbo/image support.
FACENESS bit moved from position 1 to 2 and there's a global invert
bit for point coord.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Because none of them have been picked up for 19.0 due to this bug
being reintroduced.
v2: - Fix fixes tags
Fixes: e6b3a3b201
("bin/get-pick-list.sh: handle "typod" usecase.")
Fixes: fac10169bb
("bin/get-pick-list.sh: prefix output with "[stable] "")
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
I tried bumping the limit on make and scons instead, but that just
thrashed the runners, so let's not do that (sorry @daniels :]).
Instead, remove the automatic thread management from ninja and limit it
to 4 instead, in line with make and scons.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
if we have something like this:
loop {
...
if x {
break;
} else {
continue;
}
}
opt_if_loop_last_continue returns true marking progress allthough nothing
changes.
Fixes: 5921a19d4b "nir: add if opt opt_if_loop_last_continue()"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Stop using 12.12 quantization for viewports that are not contained in
the lower 4k corner of the render target as the hardware needs to keep
both absolute and relative coordinates representable.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
This extension simply drops a draw time restriction:
"Furthermore, an INVALID_OPERATION error is generated by
DrawArrays and the other drawing commands defined in section
2.8.3 (10.5 in ES 3.1) if blending is enabled (see below) and
any draw buffer has 32-bit floating-point format components."
We never correctly enforced this restriction anyway, so we were
basically already implementing it. We just need to advertise it
for our behavior to be correct.
The extension requires EXT_color_buffer_float, but we already enable
that via dummy_true. So we can dummy_true this one as well.
Found while debugging WebGL conformance tests. Does not fix any.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Without using this function, we fail the -Wswitch flag when compiling
the default debugoptimized mode in Meson
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This can happen when we record a VkCmdDraw in a secondary buffer that
was created inheriting from the primary buffer, but with the framebuffer
set to NULL in the VkCommandBufferInheritanceInfo.
Vulkan 1.1.81 spec says that "the application must ensure (using scissor
if neccesary) that all rendering is contained in the render area [...]
[which] must be contained within the framebuffer dimesions".
While this should be done by the application, commit 465e5a86 added the
clamp to the framebuffer size, in case of application does not do it.
But this requires to know the framebuffer dimensions.
If we do not have a framebuffer at that moment, the best compromise we
can do is to just apply the scissor as it is, and let the application to
ensure the rendering is contained in the render area.
v2: do not clamp to framebuffer if there isn't a framebuffer
v3 (Jason):
- clamp earlier in the conditional
- clamp to render area if command buffer is primary
v4: clamp also x and y to render area (Jason)
v5: rename used variables (Jason)
Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary")
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This removes some scalar loads from shaders, but it increases
the number of SET_SH_REG packets. This is currently basic but
it could be improved if needed. Inlining dynamic offsets might
also help.
Original idea from Dave Airlie.
29077 shaders in 15096 tests
Totals:
SGPRS: 1321325 -> 1357101 (2.71 %)
VGPRS: 936000 -> 932576 (-0.37 %)
Spilled SGPRs: 24804 -> 24791 (-0.05 %)
Code Size: 49827960 -> 49642232 (-0.37 %) bytes
Max Waves: 242007 -> 242700 (0.29 %)
Totals from affected shaders:
SGPRS: 290989 -> 326765 (12.29 %)
VGPRS: 244680 -> 241256 (-1.40 %)
Spilled SGPRs: 1442 -> 1429 (-0.90 %)
Code Size: 8126688 -> 7940960 (-2.29 %) bytes
Max Waves: 80952 -> 81645 (0.86 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is needed in order to inline some push constants when possible.
This also adds a new helper for initializing the pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
"The C standard says that compound literals which occur inside of
the body of a function have automatic storage duration associated
with the enclosing block. Older GCC releases were putting such
compound literals into the scope of the whole function, so their
lifetime actually ended at the end of containing function. This
has been fixed in GCC 9. Code that relied on this extended lifetime
needs to be fixed, move the compound literals to whatever scope
they need to accessible in."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Patch adds nir_lower_tex_options as parameter to sample_plane so that
we don't need to extend nir_tex_instr for this.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
prog->SamplersUsed is set by the linker when validating resource limits,
while info->textures_used is gathered after NIR optimizations, which may
have eliminated some unused surfaces.
This may let us skip some work.
Reviewed-by: Eric Anholt <eric@anholt.net>
textures_used_by_txf is a subset of textures_used which is a subset
of prog->SamplerUnits. This should do nothing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric and I would like a bitmask of which samplers are used, similar to
prog->SamplersUsed, but available in NIR. The linker uses SamplersUsed
for resource limit checking, but later optimizations may eliminate more
samplers. So instead of propagating it through, we gather a new one.
While there, we also gather the existing textures_used_by_txf bitmask.
Gathering these bitfields in nir_shader_gather_info is awkward at best.
The main reason is that it introduces an ordering dependency between the
two passes. If gathering runs before lower_samplers_as_deref, it can't
look at var->data.binding. If the driver doesn't use the full lowering
to texture_index/texture_array_size (like radeonsi), then the gathering
can't use those fields. Gathering might be run early /and/ late, first
to get varying info, and later to update it after variant lowering. At
this point, should gathering work on pre-lowered or post-lowered code?
Pre-lowered is also harder due to the presence of structure types.
Just doing the gathering when we do the lowering alleviates these
ordering problems. This fixes ordering issues in i965 and makes the
txf info gathering work for radeonsi (though they don't use it).
Reviewed-by: Eric Anholt <eric@anholt.net>
Until now, prog_to_nir has been setting texture_index and sampler_index
directly. This is different than GLSL shaders, which create variable
dereferences and rely on lowering passes to reach this final form.
radeonsi uses variable dereferences for samplers rather than
texture_index and sampler_index, so it doesn't even make sense to set
them there. By moving to derefs, we ensure that both GLSL and ARB
programs produce the same final form that the driver desires.
Reviewed-by: Eric Anholt <eric@anholt.net>
An upcoming patch will start building derefs in prog_to_nir, at which
point we'll need to lower them to indexes.
This gets both GLSL and non-GLSL shaders using the same paths.
Reviewed-by: Eric Anholt <eric@anholt.net>
Passes like nir_lower_drawpixels add additional sampler variables,
and set an explicit binding which never changes. These extra samplers
don't have proper uniform storage associated with them, and there is no
way to update bindings via the API. So, for any 'hidden' variables,
just trust that there's an explicit binding set.
Reviewed-by: Eric Anholt <eric@anholt.net>
I would like to be able to run gl_nir_lower_samplers() to turn texture
and sampler variable dereferences into indexes and offsets, even for
ARB programs, and built-in shaders. This would make sampler handling
more consistent across the various types of shaders.
For GLSL programs, the gl_nir_lower_samplers_as_deref() pass looks up
the variable bindings in the shader program's uniform storage. But
ARB programs and built-in shaders don't have a gl_shader_program, and
uniform storage doesn't exist. In this case, we simply skip that
lookup, and trust var->data.binding to be set correctly by whoever
created the shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Piglit's vp-max-array test creates a vertex program containing a uniform
array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB. Mesa
will then add additional state-var parameters for things like the MVP
matrix.
radeonsi currently exposes a value of 4096, derived from constant buffer
upload size. This means the array will have 4096 elements, and the
extra MVP state-vars would get a prog_src_register::Index of over 4096.
Unfortunately, prog_src_register::Index is a signed 13-bit integer, so
values beyond 4096 end up turning into negative numbers. Negative
source indexes are only valid for relative addressing, so this ends up
generating illegal IR.
In prog_to_nir, this would cause an out of bounds array access.
st_mesa_to_tgsi checks for a negative value, assumes it's bogus,
and remaps it to parameter 0 in order to get something in-range.
This isn't right - instead of reading the MVP matrix, it would read
the first element of the vertex program's large array. But the test
only checks that the program compiles, so we never noticed that it
was broken.
This patch limits the size of the program limits, with the understanding
that we may need to generate additional state-vars internally. i965 has
exposed 1024 for this limit for years, so I don't expect lowering it to
2048 will cause any practical problems for radeonsi or other drivers.
Fixes vp-max-array with prog_to_nir.c.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a rather astonishing problem that came up while debugging
an issue in the Vulkan CTS. Apparently the Vulkan CTS framework has
the tendency to create multiple VkDevices, each one with a separate
DRM device FD and therefore a disjoint GEM buffer object handle space.
Because the intel_dump_gpu tool wasn't making any distinction between
buffers from the different handle spaces, it was confusing the
instruction state pools from both devices, which happened to have the
exact same GEM handle and PPGTT virtual address, but completely
different shader contents. This was causing the simulator to believe
that the vertex pipeline was executing a fragment shader, which didn't
end up well.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The blitter doesn't seem to have a write mask, so for depth only and
stencil only blits to Z24S8 we cast the Z24S8 buffer to an RGBA UNORM8
buffer and fall back to pipeline blits with corresponding write mask.
Fixes
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_stencil_only
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_depth
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_depth
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth
dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8
dEQP-GLES3.functional.fbo.msaa.4_samples.stencil_index8
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We need to allow overriding the format with that of the image or
sampler view, so we can't take it from the resource in fd6_tex_swiz().
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
The src coordinates are s24.8. For an inverted blit that ends at y=0
we need to program -1 for sy2, so we need to handle negative values
correctly.
Fixes
dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y
dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y
dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y
dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_color
dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_color
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We can rewrite almost all depth stencil blits to various red-only
blits. The exception is depth-only or stencil-only blits into z24s8
combined depth stencil buffer. We can fall back for depth-only, but
stencil-only remains broken.
Fixes
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_basic
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_scale
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale
dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
The explanation for the compressed format check is broken across two
comments:
/* We can blit if both or neither formats are compressed formats... */
/* ... but only if they're the same compression format. */
but the ok_format() checks were inserted between, breaking up the flow
of the sentence.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Call ctx->blit() and let it reject blits it can't do instead of giving
up on stencil blits and blits u_blitter can't do.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We already check earlier in the call chain in fd_blit().
glBlitFramebuffer always sets render_condition_enable and thus we
would never try the blitter path for that.
Now that we get all of dEQP-GLES3.functional.fbo.blit.conversion.*
down this path, it turs out that the
fail_if(info->mask != util_format_get_mask(info->src.format));
fail_if(info->mask != util_format_get_mask(info->dst.format));
conditions weren't accurate. util_format_get_mask() returns
PIPE_MASK_RGBA for any format with any color channels, while
info->mask is the exact set of channels to blit. So we reject things
we could blit - for example, PIPE_FORMAT_R16G16_FLOAT where info->mask
is RG while util_format_get_mask() returns RGBA - and accept things we
can't. It turns out that the blitter is happy to blit different
number of channels, but fails to blit formats with different numerical
formats and srgb formats.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Update for a6xx.xml.h to incorporate a few new bits and changes to
blit src rect coordinate types.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Based on VA Spec,DeriveImage() returns VA_STATUS_ERROR_OPERATION_FAILED if driver
dont have support for internal surface formats.Currently vaDeriveImage()
failed for non-contiguous planes and operation failed error string is
required to support indirect manner i.e. vaCreateImage()+vaPutImage()
incase vaDeriveImage() failed with VA_STATUS_ERROR_OPERATION_FAILED.
This patch will notify to the client as operation failed with proper
error sting,so that client will fallback to vaCreateImage()+vaPutImage().
v2: updated commit message based on VA spec.
Signed-off-by: suresh guttula <suresh.guttula@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that. However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work. The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.
The easy solution here is to just rematerialize deref sources of deref
instructions as well. This may potentially lead to a few more deref
instructions being created by the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2 "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It's more clear and means we don't have to update the array every time
we add an optional texture instruction argument
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of generating it from scratch in each forked repo. This should
save time, energy and storage. (The xserver & xf86-video-amdgpu CI
scripts do basically the same)
v2:
* Hardcode "mesa" instead of using $CI_PROJECT_NAME, to avoid breakage
if the project name is changed after forking (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
For some reason we don't use view volume clipping by default, and use
scissors instead. These scissors were set to an 8k max fb size, while
the driver advertises 16k-sized framebuffers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Midgard has native support for QUADS and POLYGONS; Bifrost seemingly
does not. Thus, Midgard generally skips prim_convert whereas Bifrost
needs the pass; this patch allows the setting of allowed primitives to
occur on a per-context basis (for runtime hardware selection).
v2: Use (POLYGONS + 1) instead of LINES_ADJACENCY.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
Various methods relating to resource management were previously marked
as kernel-specific, forcing them to stay downstream in the vendor
overlay and eventually be duplicated for DRM code. This patch adds back
this code in kernel-neutral space, allowing for code sharing and
minimising the diff to downstream.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Most Midgard instructions take two-arguments logically; there are always
two arguments at the assembly level. For the few instructions that take
only a single argument, generally the second argument slot is unused,
with a zero inline constant occupying the space. fmov/imov are the
exception, where the first argument is filled with r24 and the logical
argument is in the second slot.
Previously, these constraints were handled by a delicate, buggy series
of hacks. This commit removes these hacks. Instead, we look at the
logical number of arguments (from NIR), switching between two argument
and one-argument-one-zero style. We then introduce a quirk for the
flipped style, which applies to fmov/imov.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Nouveau apparently uses the u_screen helper but prints a warning in the
default case, so running any GL program would start grumbling.
Fixes: 8fa54bc549 gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
The sampler will be ignored since the underlying 'ld_mcs' operation
won't use it, so just fill the field with 0 instead of the texture to
make it clearer that's the case.
This will also avoid is_high_sampler() to kick in unnecessarily, in
case we are using the operation for a texture with index >= 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
anv and radv both happened to already return 2^14 for these, but
querying the ICD is safer and will help if vdreno (or whatever it's
called) doesn't have the same max.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Remove the original ALU instruciton after all of its readers are
modified to read the new ALU instruction.
v3: Fix an issue where a bcsel that may not be executed on a loop
iteration due to a break statement is converted to a phi (and therefore
incorrectly "executed"). Noticed by Tim.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A single shader in Unigine Superposition is affected by this change.
A single iadd is moved to the end of a loop. This iadd is involved in
a complex set of logic to terminate the loop, and an extra mov
instruction is inserted. This shader really needs the optimization
suggested by bugzilla #94747, and I expect that to make this tiny
regression go away.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15047543 -> 15047545 (<.01%)
instructions in affected programs: 565 -> 567 (0.35%)
helped: 0
HURT: 2
total cycles in shared programs: 369977253 -> 369978253 (<.01%)
cycles in affected programs: 127910 -> 128910 (0.78%)
helped: 0
HURT: 2
v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid
infinite optimization loops. Remove the original ALU instruciton after
all of its readers are modified to read the new ALU instruction.
v3: Extend to the more general case. The if the prev-block value from
the phi is not undef, this means the ALU instruction has to be
duplicated in both the prev-block and the continue-block.
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This will be used in a couple more places soon.
The function name is... horribly long. Neither Matt nor I could think
of any thing that was shorter and still more descriptive than
"is_phi_foo". I'm willing to entertain suggestions.
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For some reason, this warning only occurs for me in release builds.
In file included from src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:25:0:
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c: In function ‘brw_nir_lower_mem_access_bit_sizes’:
src/compiler/nir/nir_builder.h:501:26: warning: ‘src_swiz[2]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
alu_src.swizzle[i] = swiz[i];
~~~~~~~~~~~~~~~~~~~^~~~~~~~~
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:225:16: note: ‘src_swiz[2]’ was declared here
unsigned src_swiz[4];
^~~~~~~~
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are a number of reasons for the rewrite.
1. Adding support for packing tess patch varyings in a sane way.
2. Making use of qsort allowing the code to be much easier to
follow.
3. Fixes a bug where different interp types caused component
packing to be skipped for all varyings in some scenarios.
4. Allows us to add a crude live range analysis for deciding
which components should be packed together. This support can
optionally be added in a future patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used in the following patches to determine if we
support packing the components of a varying.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support needed for marking the varyings as used but we
don't actually support packing patches in this patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the driver does not support rendering to these formats but does
support texturing, we can end up in incompatibilities between textures
and renderbuffers that are then copied to.
Fixes KHR-GL45.copy_image.functional on nvc0
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Some NVIDIA hardware can accept 128 fragment shader input components,
but only have up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
Fixes KHR-GL45.limits.max_fragment_input_components
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Regardless of whether the build uses kmsro, kmsro is the default driver
descriptor when the static loader is used. Thus, in an edge case where
the static loader is used, no static targets are loaded, and kmsro is
not compiled, a spurious warning is printed. There's no harm in
executing the stub function in this case, but it's not "an error" to not
have kmsro in the build; the driver missing warning should not printed
kmsro.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This partially reverts a change from b7a93cbded ("radv: Handle
VK_ATTACHMENT_UNUSED in CmdClearAttachment") which fixed actual issues
but also started to accept invalid values for the colorAttachment
field.
This change asserts that the field is valid for the current pass.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b7a93cbded ("radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reverts commit d76e777988.
Let's make this obvious that there is an application issue if it tries
to access an attachment that doesn't exist in the current pass.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d76e777988 ("anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This variable was removed in commit 087af992a2 "travis: remove
unused linux code path" because it looked like it was only used by the
Linux build. Turns out I was wrong, so let's restore it.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In addition to the DRM interface in active development, for legacy
kernels Panfrost has a small, optional, out-of-tree glue repository. For
various reasons, this legacy code should not be included in Mesa proper,
but this commit allows it to coexist peacefully with upstream Panfrost.
If the nondrm repo is cloned/symlinked to the directory
`src/gallium/drivers/panfrost/nondrm`, legacy functionality will be
built. Otherwise, the driver will build normally, though a runtime error
message will be printed if a legacy kernel is detected.
This workaround is icky, but it allows a nearly-upstream Panfrost to
work on real hardware, today. Ideally, this patch will be reverted when
the Panfrost kernel module is mature and we drop legacy support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This patch includes the command stream portion of the driver,
complementing the earlier compiler. It provides a base for future work,
though it does not integrate with any particular winsys.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Switching to the defaults function cleans up pan_screen.h markedly and
futureproofs for when new PIPE_CAPs are added.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Eric Anholt <eric@anholt.net>
As kmsro allows an essentially mix-and-match hodgepodge of display
drivers and renderonly GPUs, it doesn't make sense to couple the display
driver entrypoint definition with the driver. Instead, we move *all*
kmsro entrypoints to a shared kmsro block at the end (avoiding clutter
and distraction since this list may snowball in the future).
v2: Alphabetize driver list.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
The strategy is to keep a CPU-side counter of the direct invocations,
and a GPU-side counter of the indirect invocations, and then add them
together for queries.
The specific technique is a macro which multiplies a list of integers
together and accumulates the product into SCRATCH registers held inside
of the context. Another macro will read those values out and add them to
the passed-in cpu-side counter to be stored in a query buffer the same
way that all the other statistics are stored.
Original implementation by Rhys Perry, redone by Ilia Mirkin to use the
SCRATCH temporaries.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Not quite perfect, but at least we don't end up with random values in
the query buffer.
Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
For the NO_WAIT variants, we would jump into the ALWAYS case for both
nested and inverted occlusion queries. However if the query had
previously completed, the application could reasonably expect that the
render condition would follow that result.
To resolve this, we remove the nesting distinction which unnecessarily
created an imbalance between the regular and inverted cases (since
there's no "zero" condition mode). We also use the proper comparison if
we know that the query has completed (which could happen as a result of
an earlier get_query_result call).
Fixes KHR-GL45.conditional_render_inverted.functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d
tiling, they just need the correct inputs. Supply them.
We also have to deal with the case where a 2d "layer" of a 3d image is
bound. In this case, we supply the z coordinate separately to the
shader, which has to optionally treat every 2d case as if it could be a
slice of a 3d texture.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
We used to pre-set a bunch of extra arguments to a texture instruction
in order to force the RA to allocate a register at the boundary of 4.
However with the levelZero optimization, which removes a LOD argument
when it's uniformly equal to zero, we undid that logic by removing an
extra argument. As a result, we could end up with insufficient alignment
on the second wide texture argument.
Instead we switch to a different method of achieving the same result.
The logic runs during the constraint analysis of the RA, and adds unset
sources as necessary right before being merged into a wide argument.
Fixes MISALIGNED_REG errors in Hitman when run with bindless textures
enabled on a GK208.
Fixes: 9145873b15 ("nvc0/ir: use levelZero flag when the lod is set to 0")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Atomic operations don't update the local cache, which means that we
would have to issue CCTL operations in order to get the updated values.
When we know that a buffer is primarily used for atomic operations, it's
easier to just avoid the caching at that level entirely.
The same issue persists for non-atomic buffers, which will have to be
fixed separately.
Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
The hardware does not natively support FIXED and DOUBLE formats. If
those are used in an indirect draw, they have to be converted. Our
conversion tries to be clever about only converting the data that's
needed. However for indirect, that won't work.
Given that DOUBLE or FIXED are highly unlikely to ever be used with
indirect draws, read the indirect buffer on the CPU and issue draws
directly.
Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
We used to restrict this to just PIPE_BIND_SAMPLER_VIEW resources, but
most resources benefit from being tiled.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We're writing to the bo and the kernel needs to know for
fd_bo_cpu_prep() to work.
Fixes: f93e431272 ("freedreno/a6xx: Enable blitter")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Needed for VK_EXT_buffer_device_address.
The pointers are implmemented as i8*, since I could not figure
out how to emulate setting struct offsets in LLVM based on the
SPIR-V offsets (and more weird stuff like row major matrices).
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We use a straight glsl->llvm type conversion so types should already be right.
Also even though the writemasks were changed we we not actually doing 32-bit
things, so this fails miserably.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For example with VK_EXT_buffer_device_address or
VK_KHR_variable_pointers.
Fixes: a2b5cc3c39 "radv: enable variable pointers"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For the implicit casts inherent in nir.
This should probably have been done for shared memory for
VK_KHR_variable_pointers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
`EGLDisplay` variables (the opaque Khronos type) have mostly been
consistently called `dpy`, as this is the name used in the Khronos
specs.
However, `_EGLDisplay` variables (our internal struct) have been
randomly called `dpy` when there was no local variable clash with
`EGLDisplay`s, and `disp` otherwise.
Let's be consistent and use `dpy` for the Khronos type, and `disp`
for our struct.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Anholt <eric@anholt.net>
Until the kernel side matures and the full driver is upstreamed, to
avoid end-user surprises, Panfrost should only be built for the
adventurous.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
I had a single function for "does this do float input unpacking" with two
major flaws: It was missing the most common thing to try to copy propagate
a f32 input nunpack to (the VFPACK to an FP16 render target) along with
several other ALU ops, and also would try to propagate an f32 unpack into
a VFMUL which only does f16 unpacks.
instructions in affected programs: 659232 -> 655895 (-0.51%)
uniforms in affected programs: 132613 -> 135336 (2.05%)
and a couple of programs increase their thread counts.
The uniforms hit appears to be a pattern in generated code of doing (-a >=
a) comparisons, which when a is abs(b) can result in the abs instruction
being copy propagated once but not fully DCEed.
If you only bound rt 1+, we'd still emit a write to the rt0 that isn't
present (noticed while debugging an
ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero
regression in another change).
Iris would like to use compact arrays for tesslevels and clip/cull
distances. radeonsi will likely want to switch to these at some point,
since it'll be necessary for GL_ARB_gl_spirv support, but it's not ready
for them just yet.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Today, st always sets LowerCombinedClipCullDistance, causing the GLSL IR
lowering to run, giving us vec4[2] arrays. I would like to disable this
and instead run the NIR lowering so that we get compact float[] arrays
instead.
Calling the new pass is a noop if the GLSL IR pass has already run, so
it's safe to call the pass unconditionally.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Compact arrays are used for special variables like clip and cull
distances, or tessellation levels. Drivers using compact arrays
assume that these values will always be actual arrays. We don't
want to turn a float[1] gl_CullDistance into a single float; that
would confuse drivers.
Today, i965 uses compact arrays, and Gallium drivers use
nir_lower_io_arrays_to_elements, so we haven't had any overlap
that would demonstrate the issue. Iris will use both.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A couple places in st/nir assume that cull distances have been lowered
away, so it will need to call this lowering pass for drivers which opt
out of the GLSL IR lowering. The Intel backend also calls this pass,
for i965 and anv. We need to only do it once.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We have a GLSL IR pass to convert clip/cull distance float[] arrays
into vec4[2] arrays. In ff281e6204, we attempted to skip this pass
if the GLSL IR lowering had already run. But, that code was not quite
right, as we forgot to strip away the per-vertex IO array layer for
geometry and tessellation shader varyings.
If the GLSL IR pass has run, the variables will not be marked as
"compact". So we can simply check that and bail.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
nir_lower_clip_cull_distance_arrays() marks the combined clip/cull
distance array as compact. However, when translating in from GLSL
or SPIR-V, we were not marking the original float[] arrays as compact.
We should do so. That way, we can detect these corner cases properly.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
radeonsi uses a system value for gl_FragCoord rather than an input var.
These get translated into load_frag_coord NIR intrinsics, which lose the
pixel_center_integer and origin_upper_left decorations. To cope with
this, Tim added a shader_info field for pixel_center_integer, and made
glsl_to_nir set it accordingly.
prog_to_nir also needs to handle these fragcoord conventions. Instead
of duplicating the logic to set the info field, just move it to
nir_lower_system_values so it'll happen regardless of who makes the NIR.
(For what it's worth, we don't need an info flag for origin_upper_left,
because radeonsi lowers origin conventions in nir_lower_wpos_ytransform
before nir_lower_system_values destroys the variable and qualifiers.)
Reviewed-by: Eric Anholt <eric@anholt.net>
Some drivers, such as radeonsi, use a system value for gl_FragCoord
rather than an input variable. In this case, our Mesa IR will have
a PROGRAM_SYSTEM_VALUE register, which we need to translate.
This makes prog_to_nir work for Gallium drivers which expose the
PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL capability bit.
Reviewed-by: Eric Anholt <eric@anholt.net>
We implement the basic VS and FS, as well as the VS that does layered
clears by writing gl_Layer from the vertex shader. Drivers which need
a geometry shader for writing layer continue falling back to TGSI, as
I didn't need this and so didn't bother implementing it. (We certainly
could, however, if people want to add it in the future.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
This provides a native NIR version of the DrawPixels/Bitmap passthrough
vertex shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
The state tracker generates several built-in shaders in order to
perform scissored clears, upload/download PBOs, and so on. These
are currently constructed using TGSI, using ureg and u_simple_shader.
I want to have NIR versions of these shaders, for my Gallium driver
that has a NIR backend but no TGSI support. To that end, we'll want
a few helpers to help construct simple shaders.
This patch adds two new helpers:
- st_nir_finish_builtin_shader() takes a manually constructed NIR
shader, applies lowering passes (like st_link_nir would do for GLSL),
and constructs the pipe_shader_state.
- st_nir_make_passthrough_shader() makes a simple passthrough shader,
which copies inputs to outputs. This is similar to u_simple_shaders.
v2: Set info->fs.untyped_color_outputs for vc4/v3d (thanks Eric!).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
Ken's rework of mesa/st builtins to NIR means that we'll have more NIR
shaders with color output types that are mismatched with the render target
types. Since this is behavior that GLSL doesn't require, add it as a
shader_info option so the driver can know that it needs to ignore the FS
output's base type in favor of the actual render target's. This prevents
needing additional variants in several mesa/st paths (clear, pbo upload,
pbo download), given that the driver already has to handle the variants
for any TGSI being passed to it (from u_blitter, for example).
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gives me an performance boost of 0.2% in pixmark_piano on my gk106, gm204 and
gp107.
reduces the amount of generated convert instructions by roughly 30% in
shader-db.
v2: only for 32 bit operations
move some common code out of the switch
handle OP_SAT with modifiers
v3: only for registers and const memory
rework if clauses
merge isCvt into this patch
v4: merge isCvt into its use
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
When Mesa is compiled for gallium-xlib using e.g.
./configure --enable-glx=gallium-xlib --disable-dri --disable-gbm
-disable-egl
and is used by an X server (usually remotely via SSH X11 forwarding)
that does not support MIT-SHM such as XMing or MobaXterm, OpenGL
clients report error messages such as
Xlib: extension "MIT-SHM" missing on display "localhost:11.0".
ad infinitum.
The reason is that the code in src/gallium/winsys/sw/xlib uses
MIT-SHM without checking for its existence, unlike the code
in src/glx/drisw_glx.c and src/mesa/drivers/x11/xm_api.c.
I copied the same check using XQueryExtension, and tested with
glxgears on MobaXterm.
This issue was reported before here:
https://lists.freedesktop.org/archives/mesa-users/2016-July/001183.html
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
sizeof counts the terminating null character as well, so that also
contributed to the ID computed for the X11 atom. But the convention is
for only the non-null characters to contribute to the atom ID.
Fixes: 2e12fe425f "loader/dri3: Enable adaptive_sync via
_VARIABLE_REFRESH property"
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When a texture is still bound as an image and the context it was bound in
is destroyed but not the texture, then the texture will still hold the
resource and will not be freed when it is finally destroyed. Hence, release
these references when the context is destroyed.
This leak was triggered by virglrenderer:
https://gitlab.freedesktop.org/virgl/virglrenderer/issues/86
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ureg_get_tokens clears the reference to the tokens, and create_compute_state makes
a copy, hence the tokens must be explicitely released.
Fixes: Direct leak of 256 byte(s) in 1 object(s) allocated from:
#0 0x7ff729cf3c60 in realloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdbc60)
#1 0x7ff721b1240c in tokens_expand ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:234
#2 0x7ff721b1c9c0 in get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:257
#3 0x7ff721b1c9c0 in copy_instructions ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2040
#4 0x7ff721b1c9c0 in ureg_finalize ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2090
#5 0x7ff721b1e919 in ureg_get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2167
#6 0x7ff721f8b35a in si_create_dma_compute_shader ../../samba/mesa/src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c:219
#7 0x7ff722043ed9 in si_compute_do_clear_or_copy ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:156
#8 0x7ff7220448d3 in si_clear_buffer ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:247
#9 0x7ff7220350e8 in vi_dcc_clear_level ../../samba/mesa/src/gallium/drivers/radeonsi/si_clear.c:274
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It is always false on Gen8+. Also, move the variable definition near
its use.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All things being equal is better to keep the original order. Since
the new block is empty, push the phis in order to tail.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
This patch implements the free Midgard shader toolchain: the assembler,
the disassembler, and the NIR-based compiler. The assembler is a
standalone inaccessible Python script for reference purposes. The
disassembler and the compiler are implemented in C, accessible via the
standalone `midgard_compiler` binary. Later patches will use these
interfaces from the driver for online compilation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This patch adds an initial stub for the Gallium driver, containing
simple screen functions and the majority of the driver headers but no
actual functionality. It further adds the winsys glue for linking in
this stub driver via kmsro on Rockchip/Amlogic boards.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Commit 8b626a22b2 introduced a new
pipe_image_view::shader_access field, indicating the access mode
specified in the shader. st/mesa's built-in PBO download shader
creates a write-only image buffer, so we should flag it as such.
Nobody uses this field yet (Iris will), so we don't need to backport
this fix to stable branches.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Transform feedback did not set correct SO_DECL.ComponentMask for
varyings packed in VARYING_SLOT_PSIZ:
gl_Layer - VARYING_SLOT_LAYER in VARYING_SLOT_PSIZ.y
gl_ViewportIndex - VARYING_SLOT_VIEWPORT in VARYING_SLOT_PSIZ.z
gl_PointSize - VARYING_SLOT_PSIZ in VARYING_SLOT_PSIZ.w
Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From the Vulkan 1.0.98 spec for vkCmdClearAttachments:
"If any attachment to be cleared in the current subpass is VK_ATTACHMENT_UNUSED,
then the clear has no effect on that attachment."
"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that
element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED,
or must be a valid color attachment."
"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_DEPTH_BIT, then the current subpass' depth/stencil attachment
must either be VK_ATTACHMENT_UNUSED, or must have a depth component"
"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_STENCIL_BIT, then the current subpass' depth/stencil attachment
must either be VK_ATTACHMENT_UNUSED, or must have a stencil component"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From the Vulkan 1.0.98 spec for vkCmdClearAttachments:
"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that
element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED,
or must be a valid color attachment."
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The Vulkan spec says:
"If pResolveAttachments is not NULL, for each resolve attachment
that does not have the value VK_ATTACHMENT_UNUSED, the
corresponding color attachment must not have the value
VK_ATTACHMENT_UNUSED."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Outgoing dependencies (ie. external) should happen after the subpass.
This doesn't change anything for subpass resolves as we already
make sure that attachments are shader readable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The different masks should be accumulated. For example if two
subpasses declare an outgoing dependency (ie. dst ==
VK_SUBPASS_EXTERNAL).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
radv_render_pass_compile() is common to vkCreateRenderPass()
and vkCreateRenderPass2().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
That shouldn't change anything as we check if the last
subpass id is the final subpass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reworks how the depth stencil attachment is used for
simplicity. This also introduces radv_render_pass_compile()
helper that will be used for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
To unify some code in BeginRenderPass() and NextSubpass().
Based on Intel ANV driver.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions. If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
By just assigning dst.type to src[i].type, we ensure that the offset at
the end of the loop actually offsets it by the right number of
registers. Otherwise, we'll get into a case where we copy with a Q type
and then offset with a D type and things get out of sync.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we tried to combine all cases where the instruction being
CSE'd writes to more than one MOV worth of registers into one case with
a bit of special casing for LOAD_PAYLOAD. This commit splits things so
that LOAD_PAYLOAD is entirely it's own case. This makes tweaking the
LOAD_PAYLOAD case simpler in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Missing check for shader stage in the fs_visitor would corrupt the
cs_prog_data.push information and trigger crashes / corruption later
when uploading the CS state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We already defer handling the actual execution modes until after we've
created the shader. This just moves it a tiny bit further so we
actually have constants and types and can handle OpExecutionModeId.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of handling it as part of the handling of constant instructions,
just stash the vtn_value when we see the decoration and handle it
explicitly later. This will let us re-order handling of constant
instructions without breaking the Vulkan SPIR-V requirement that
decorating a specialization constant as the WorkgroupSize built-in
overrides the workgroup size set as an execution mode.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The uint version is less typing, supports different bit sizes, and is
probably a bit more safe because we're actually verifying that the
SPIR-V value is an integer scalar constant.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Required for the following test:
bin/compressedteximage GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT
pass when emulating GL on GLES.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Without this we do not end up with a deterministic NIR because
temporary register variables are added in random order. NIR must
be deterministic because we use it to produce a sha for the
radeonsi backends disk cache.
This fixes the shader cache for a bunch of shaders.
Another positive is that this results in a large reduction in the
size of the NIR that the state tracker stores to the disk cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
for meson all C++ code is already compiled as C++11, so it's
unnecessary. It's also the wrong way to do this, if we really needed
this the correct way is to set:
```meson
executable(
...
override_options : ['cpp_std=c++11'],
)
```
Which ensures not only that the correct syntax for the current
compiler is used, but also that meson doesn't create arguments like
`-std=c++14 ... -std=c++11`
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The `:` in options should always have one space before and after `foo
: bar`, and lists do not get spaces around the braces: `[foo]` not `[
foo ]`
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Which is and has always been the default. This is largely an artifact
of how the building of these tools was controlled when the meson build
was originally created.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We need to initialize all fields in rs->prim explicitly while
creating new rastpos stage.
Fixes: bac8534267 ("st/mesa: allow glDrawElements to work with GL_SELECT
feedback")
v2: Initializing all fields in rs->prim as per Ilia.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Android.mk and autotools disagree about where generated files should
go, which wasn't a problem until we wanted to build a dist
tarball. This corrects the problem by changing the output and include
paths to be the same on android and autotools (meson already has the
correct include path).
Fixes: 7d7b30835c
("automake: Fix path to generated source")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This was copy-and-paste fail, that oddly showed up in the CTS's
reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i
to rgba8i or reinterprets to other signed int formats.
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
One of the CTS cases tries to invalidate just stencil of packed
depth/stencil, and we incorrectly lost the depth contents.
Fixes dEQP-GLES3.functional.fbo.invalidate.whole.unbind_read_stencil
Fixes: 0c42b5f3cb ("mesa: wire up InvalidateFramebuffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As of Nov/30/2018 the extension is also valid for OpenGL >= 1.2, so
enable it accordingly and also add the required view class entry.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes serious stuttering in Shadow Of The Tomb Raider.
Fixes: 50fd253bd6 ("radv/winsys: Add priority handling during submit.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reported by Coverity: in the case of unsupported modifier request, the
code does not jump to the “fail” label to destroy the acquired resource.
CID: 1435704
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
Reported by Coverity: in the case where there exist hardware and
non-hardware queries, the code does not jump to err_free_query and leaks
the query.
CID: 1430194
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 9ea90ffb98 ("broadcom/vc4: Add support for HW perfmon")
I can't imagine the new HW block being paired with a v6 CPU, so don't
bother with the CPU detection that vc4 had to do.
Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
Earlier commit addressed 7 of the 8 instances available.
v2: Rebase patch back to master (by anholt)
Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 300d3ae8b1 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Use the trick of adding and then subtracting 2**52 (52 is the number of
explicit mantissa bits a double-precision floating-point value has) to
implement round-to-even.
Cuts the number of instructions on SKL of the piglit test
fs-roundEven-double.shader_test from 109 to 21.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This batch->cleared value is only used to decide to use sysmem rendering
or not, so it should include any buffers that are affected by a clear.
This is required because the a2xx fast clear doesn't work with sysmem
rendering. The a22x "normal" clear path doesn't work with sysmem either.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Depth can be used even when there is no restore/resolve of depth. This
happens when the depth buffer is invalidated after rendering to avoid
the resolve operation.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Set dirty bits on invalidate to trigger invalidate logic in fd_draw_vbo.
Also, resource_written for color needs to be after the invalidate logic.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
And before someone actually starts implementing DiscardFramebuffer()
lets rework the interface to something that is actually usable.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows freedreno to be aware of the depth invalidate when flushing
batches on flush_resource.
AFAIK, the only other driver which might care about this change is vc4,
where I think it should help by allowing the depth invalidate to work with
GALLIUM_HUD.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Check if a pixel format is supported by the Wayland servers gpu driver
before exposing it to the client via wl_drm, so we avoid reporting formats
to the client which the server gpu can't handle.
Restrict this reporting to the new color depth 30 formats for now, as the
ARGB/XRGB8888 and RGB565 formats are probably supported by every gpu under
the sun.
Atm. this is mostly useful to allow proper PRIME renderoffload for depth
30 formats on the typical Intel iGPU + NVidia dGPU "NVidia Optimus" laptop
combo.
Tested on Intel, AMD, NVidia with single-gpu setup and on a Intel + NVidia
Optimus setup.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Support PRIME render offload between a Wayland server gpu and a Wayland
client gpu with different channel ordering for their color formats,
e.g., between Intel drivers which currently only support ARGB2101010
and XRGB2101010 import/display and nouveau which only supports ABGR2101010
rendering and display on nv-50 and later.
In the wl_visuals table, we also store for each format an alternate
sibling format which stores colors at the same precision, but with
different channel ordering, e.g., ARGB2101010 <-> ABGR2101010.
If a given client-gpu renderable format is not supported by the server
for import, but the alternate format is supported by the server, expose
the client-gpu renderable format as a valid EGLConfig to the client. At
eglSwapBuffers time, during the blitImage() detiling blit from the client
backbuffer to the linear buffer, the client format is converted to the
server supported format. As we have to do a copy for PRIME anyway,
this channel swizzling conversion comes essentially for free.
Note that even if a server gpu in principle does support sampling
from the clients native format, this conversion will be a performance
advantage if it allows to convert to the servers preferred format
for direct scanout, as the Wayland compositor may then be able to
directly page-flip a fullscreen client wl_buffer onto the primary
plane, or onto a hardware overlay plane, avoiding an extra data copy
for desktop composition.
Tested so far under Weston with: nouveau single-gpu, Intel single-gpu,
AMD single-gpu, "Optimus" Intel server iGPU for display + NVidia
client dGPU for rendering.
v2: Implement minor review comments by Eric Engestrom: Add some
comment and assert, and some style fixes for clarity.
No functional change.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Surface reads don't need them because they just have the one address
payload. With surface writes, on the other hand, we can put the address
and the data in the different halves and avoid building the payload all
together.
The decrease in register pressure and added freedom in register
allocation resulting from this change reduces spilling enough to improve
the performance of one customer benchmark by about 2x.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're about to add some more if cases so let's have the giant re-indent
in it's own patch to make review easier.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
These have clearly never seen any use.... On gen8, the bottom 4 bits are
missing so we need to shift them off before we call set_bits and shift
again when we get the bits. Found by inspection.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of fetching the information out of the instruction directly,
fetch the descriptor and then pluck the information out of the
descriptor. The current scheme works ok for SEND but with SENDS, it all
falls to pieces because the descriptor is completely shuffled around.
This commit doesn't actually convert everything. One notable exception
is URB messages which don't even use descriptors in emit_urb_WRITE yet.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to be able to extract data from descriptors as well as unify a
bit of the descriptor construction.
One of the unifications we do is to unify the read/write and dataport
descriptors. On gen4-5, read/write are substantially different and the
read descriptors change between gen4 and gen4.x. On gen6, they unified
layouts between read, write, and dataport. Then, on gen8, they added
one bit to the message type field but left it reserved MBZ for
read/write messages. This commit chooses to treat that as if they
expanded the field everywhere and just didn't have enough enum values
for read/write to bother with the extra bit.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This commit pulls the surface descriptor helpers out into brw_eu.h and
makes them no longer depend on the codegen infrastructure. This should
allow us to use them directly from the IR code instead of the generator.
This change is unfortunately less mechanical than perhaps one would like
but it should be fairly straightforward.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of magically falling back to SIMD8 for atomics and typed
messages on Ivy Bridge, explicitly figure out the exec size and pass
that into brw_surface_payload_size.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Like all the other sends, it's just mlen * REG_SIZE.
Fixes: 3cbc02e469 "intel: Use TXS for image_size when we have..."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If you pass a bool in as the value to set, the C standard says that it
gets converted to an int prior to shifting. If you try to set a bool to
bit 31, this lands you in undefined behavior. It's better just to add
the explicit cast and let the compiler delete it for us.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
There are piles of fields that it doesn't check so using it is a lie.
The only reason why it's not causing problem is because it has exactly
one user which only uses it for MOV instructions (which aren't very
interesting) and only on Sandy Bridge and earlier hardware. Just get
rid of it and inline it in the one place that it's actually used.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously we tried to normalize nr_samples to MAX2(1, nr_samples) to
avoid having to deal with 0 vs 1 everywhere. But this causes problems
in mesa/st, for example st_finalize_texture() will think there is a
nr_samples mismatch and recreate the texture. Somehow this manifests
as corrupt x11 font rendering on generations that do not support MSAA
(but apparently works fine on a5xx and a6xx which do support MSAA.)
Fixes: cf0c7258ee freedreno/a5xx: MSAA
Signed-off-by: Rob Clark <robdclark@gmail.com>
Switched to the raw bo list api to avoid having to use 2 arrays for
everything.
This was introduced in libdrm 2.4.97 which we already depend upon.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This has been disabled some months ago because it introduced
rendering issues with Shadow Of Warrier II (DXVK). This game is
no longer affected, I wonder if 824cfc1ee5 ("radv: rework the
TC-compat HTILE hardware bug with COND_EXEC") fixed the problem.
I checked The Forest on my Polaris, and it renders fine too.
According to Phillip, this gives +5.5% with Rise Of The Tomb
Raider and DXVK. This is because DXVK uses 16-bit depth surfaces
while the native port from Feral uses 32-bit depth surfaces.
Unfortunately, Shadow Of The Tomb Raider isn't affected because
it clears each layer of a D16 array texture individually. So it
doesn't hit the fast clear path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson
cross-builds because of the need to run host binaries on the build system.
vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my
cross-builds.
Fixes: ebcb4c2156 ("meson: Enable VC4's NEON assembly support.")
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified. Fixes
crashes on texture upload/download with current gcc.
We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.
(commit message by anholt)
Fixes: 4d30024238 ("vc4: Use NEON to speed up utile loads on Pi2.")
This fixes the depth/stencil clear on a20x, and adds a fast clear path.
The fast clear path is only used for a20x, needs performance tests on a22x.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
This allows us to avoid expensive string compares since we already have
a map to the pointers.
These compares were taking ~30 seconds for a single shader compile
in Godot due to it using 64,000+ uniforms.
Fixes: c4cff5f402 ("glsl: add basic support for resource list to shader cache")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109229
Enable using etnaviv for KMS renderonly. This still needs KMS driver
name mapping to kmsro to be used automatically.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
This allows vc4 to initialize on the Adafruit PiTFT 3.5" touchscreen with
the hx8357d tinydrm driver
v2: Whitespace fix noted by Eric Engestrom, update commit message for the
driver being merged.
v3: Rebase on Rob Herring's pipe-loader changes.
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Emil Velikov <emil.velikov@collabora.com> (v1)
If we can't find a driver matching by name, then use the kmsro driver.
This removes the need for needing a driver descriptor for every possible
KMS driver.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
The vc4 driver can do prime sharing to many different KMS-only devices,
such as the various tinydrm drivers for SPI-attached displays. Rename the
driver away from "pl111" to represent what it will actually support:
various sorts of KMS displays with the renderonly layer used to attach a
GPU.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Instead of using this useless array_params_mask variable.
This should set these two attributes to streamout buffers too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use EXT_framebuffer_sRGB to expose EXT_sRGB_write_control on GLES. Remove
the checks for desktion GL in the enable calls, since EXT_framebuffer_sRGB
now also indicates support for switching the linear-sRGB color
space conversion on GLES.
Thanks to Ilia Mirkin for all the helpful discussions that helped to rework
this series.
v2: Fix alphabetical listing of extensions (Tapani Pälli)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
GLES 3.0 does not actually require support for EXT_framebuffer_sRGB, it
only needs support for sRGB attachments to framebuffers and framebuffer
objects as defined in ARB_framebuffer_objects.
v2: Clarify that ARB_framebuffer_objects is needed.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All drivers that support EXT_framebuffer_sRGB also support EXT_sRGB, but
in order to keep this commit minial, and not to break any drivers both
flags are checked.
v2: - Use only EXT_sRGB (Ilia Mirkin)
- Move adding the flag EXT_sRGB to gl_extensions to a separate patch
v3: use _mesa_has_EXT_framebuffer_sRGB instead of extension flag
The _mesa_has function also checks for the correct versions and
should be preferred over using the flags directly (Erik)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For GLES sRGB framebuffer attachemnt support is provided in two steps:
sRGB attachments like described in EXT_sRGB (and GLES 3.0) that enable
linear to sRGB color space transformation automatically, and the ability
to switch formats of the render target surface between sRGB and linear
that introduces full support for EXT_framebuffer_sRGB.
Set the according flags to reflect these two levels of sRGB support.
As a difference between desktopm GL and GLES, on desktop GL for a sRGB
framebuffer attachment the linear-sRGB conversion is turned off by default,
and for GLES it is turned on. This needs to be taken into account when
initally creating a surface, i.e. on desktop GL creation of a sRGB surface
is preferred, but on GLES sRGB surfaces are only created when explicitely
requested.
v2: - Use the new CAPS name
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_sRGB is an (incomplete) GLES extension that provides support for sRGB
framebuffer attachments, hence it can be used to check for this support
as an alternative to EXT_framebuffer_sRGB that provies the same
functionality but also sRGB write control support.
However, since EXT_sRGB is incomplete and superseted by GLES 3.0 it will
not be exposed as an extension.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: - Use the renamed CAPS
- add assetions to make sure that mesa doesn't try to switch
destination surface formats when it is not supported. (Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Add a new cap that indicates whether the drivers supports
enabling/disabling the conversion from linear space to sRGB
for a framebuffer attachment. In Driver terms that this CAP indicates
whether the driver can switcht between a linear and and a sRGB surface
format for draw destinations witout changing the sourface itself.
v2: rename CAP to DEST_SURFACE_SRGB_CONTROL to reflect its
purpouse better (pointed out by Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Under Vulkan, the double vertex attributes take up the same size
regardless of whether they are vertex inputs or any other stage
interface.
Under OpenGL (ARB_gl_spirv), from GLSL 4.60 spec, section 4.3.9
Interface Blocks:
"It is a compile-time error to have an input block in a vertex
shader or an output block in a fragment shader. These uses are
reserved for future use."
So we also don't need to check if it is an vertex input or not, and
use false in any case.
v2: (changes made by Alejandro Piñeiro)
* Update required after "spirv: Handle location decorations on
block interface members" own updates (original patch was sent
several months ago)
* After Neil suggesting it, confirm that this change can be also
done for OpenGL (ARB_gl_spirv). Expand commit message.
v3: update after changing name of main method on a previous patch
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
glsl_count_attribute_slots takes a parameter to specify whether the
type is being used as a vertex input because on GL double attributes
only take up one slot. Vulkan doesn’t make this distinction so this
patch renames the argument to is_gl_vertex_input in order to make it
more clear that it should always be false on Vulkan.
v2: minor variable renaming (s/member/member_type) (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously the code was taking any location decoration on the block
and using that to calculate the member locations for all of the
members. I think this was assuming that there would only be one
location decoration for the entire block. According to the Vulkan spec
it is possible to add location decorations to individual members:
“If the structure type is a Block but without a Location, then each
of its members must have a Location decoration. If it is a Block
with a Location decoration, then its members are assigned
consecutive locations in declaration order, starting from the
first member which is initially the Block. Any member with its own
Location decoration is assigned that location. Each remaining
member is assigned the location after the immediately preceding
member in declaration order.”
This patch makes it instead keep track of which members have been
assigned an explicit location. It also has a space to store the
location for the struct as a whole. Once all the decorations have been
processed it iterates over each member to fill in the missing
locations using the rules described above.
So, this commit is needed to get working a case like this, on both
Vulkan and OpenGL using SPIR-V (ARB_gl_spirv):
out block {
layout(location = 2) vec4 c;
layout(location = 3) vec4 d;
layout(location = 0) vec4 a;
layout(location = 1) vec4 b;
} name;
v2: (changes made by Alejandro Piñeiro)
* Update after introducing struct member splitting (See commit b0c643d)
* Update after only exposing interface_type for blocks, not to any struct
* Update after last changes done for xfb support
v3: use "assign" instead of "add" on the new method added (Tapani)
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Defines how sampler (and pixel pipes) needs to access the data
represented with a resource. The used default is mode is
ETNA_ADDRESSING_MODE_TILED.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The text segment is shared among multiple contexts, while each one has
its own bufctx. So when reallocating the text segment, some contexts may
end up with stale values in their bufctx's. Instead limit the exposure
to the bufctx to within a single draw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The sampler border color is encoded in the TMU's blending format (half
floats, 32-bit floats, or integers) and must be clamped to the format's
range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't
know about how we're abusing the swizzle to support BGRA, A, and LA, so we
have to pre-swizzle the border color for those.
We don't really want to spend half a kb on sampler states in most cases,
so skip generating the variants when the border color is unused or is
0,0,0,0.
When the sampler view is in sample-stencil mode, we need to return uint
stencil values. To do that, fill in the format table to return R8I, and
have the sampler view point at the separate stencil buffer.
Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d
Fixes OOMs in the CTS's packed_pixels.varied_rectangle.* tests -- the
series of texture uploads at the start before texturing occurred would end
up all sitting around as cached jobs for reuse. By flushing immediately,
peak active BO usage goes from 150M to 40M.
We could maybe put some limits on how many jobs we keep around, but blits
seem particularly unlikely to get reused for other drawing.
This is the GLES 3.2 minmax, and also what the closed source driver does.
Avoids hitting OOMs in the CTS's
dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
From the vulkan spec 33.3 "Extension Dependencies":
"Any device extension that has an instance extension dependency that is
not enabled by vkCreateInstance is considered to be unsupported, hence
it must not be returned by vkEnumerateDeviceExtensionProperties for any
VkPhysicalDevice child of the instance."
Therefore we need to check whether the instance-level extensions are
actually enabled when deciding to support a device-level extension or
not.
Furthermore, we need to do this for all instance-level extensions of any
(transitive) device-level extension dependency, due to the following
paragraph:
"If an extension is supported (as queried by
vkEnumerateInstanceExtensionProperties or
vkEnumerateDeviceExtensionProperties), then required extensions of that
extension must also be supported for the same instance or physical
device."
Finally, because some of these vulkan extensions may be implicitly
promoted to future vulkan core API versions, we can also satisfy the
dependency if the vulkan API version is high enough.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From the vulkan spec 3.2 "Instances":
"Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing an
apiVersion of 0 is equivalent to providing an apiVersion of
VK_MAKE_VERSION(1,0,0)."
Fixes: ffa15861ef "radv: UseEnumerateInstanceVersion for the default version."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Section 7.6.2.2 (Standard Uniform Block Layout) of the GL spec says:
The base offset of the first member of a structure is taken from the
aligned offset of the structure itself. The base offset of all other
structure members is derived by taking the offset of the last basic
machine unit consumed by the previous member and adding one.
The current code does not reflect this last sentence - it effectively
instead aligns up the next offset up to the alignment of the previous
member. This causes an issue in exactly one case:
layout(std140) uniform block {
layout(offset=0) vec3 var1;
layout(offset=12) float var2;
};
As per section 7.6.2.1 (Uniform Buffer Object Storage) and elsewhere, a
vec3 consumes 3 floats, i.e. 12 basic machine units. Therefore, `var1`
in the example above consumes units 0-11, with 12 being the first
available offset afterwards. However, before this commit, mesa
incorrectly assumes `var2` must start at offset=16 when using explicit
offsets, which results in a compile-time error. Without explicit
offsets, the shaders actually work fine, indicating that mesa is already
correctly aligning these fields internally. (Just not in the code that
handles explicit buffer offset parsing)
This patch should fix piglit tests:
ssbo-explicit-offset-vec3.vert
ubo-explicit-offset-vec3.vert
Signed-off-by: Niklas Haas <git@haasn.xyz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This only implements the actual opcodes and does not implement support
for using them with specialization constants.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We handle forward declarations by creating the pointer type with it's
storage type based on storage class and just waiting to fill out the
actual deref type until we get the OpTypePointer. Because any
composites using the forward declared type only care about the storage
type (i.e. uint64_t, uvec2, etc.) when creating their glsl_type, this
works fine and we can defer the actual deref_type as far as we need.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This was valid back when the only valid types of pointers were uint32
and uvec2. Now that we're allowing more variety, it could be just about
anything so we'll just drop the assert.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
These correspond roughly to reading/writing OpenCL global pointers. The
idea is that they just take a bare address and load/store from it. Of
course, exactly what this address means is driver-dependent.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We want to have debug info as well if using
meson's debugoptimized when ndebug is off.
v2: use u_debug functions that do something
even if DEBUG is not set.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Fixes regression caused by
42d672fa6a
st/nine: Bind src not dst in nine_context_box_upload
Before that patch, for user provided textures,
when the texture was destroyed, the safety
check for pending uploads, which according to
the code "Following condition cannot happen currently",
was flushing the queue and thus triggering the upload.
After the patch, the texture destruction was delayed after
the upload. However the user frees the texture buffer,
as it thinks the texture released.
Instead of reverting the faulty patch,
this patch instead flushes the csmt queue right away
after queuing the upload for this type of textures.
This is more future-proof, as we may want to bind the
surface for other reasons in the future.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Compilation of user-specified shaders with software fp64 works by
compiling on demand an "fp64-funcs" shader implementing various fp64
operations and then linking it into the "user shader".
In
commit 64b8c86d37
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date: Thu Jan 17 17:16:29 2019 +1100
glsl: be much more aggressive when skipping shader compilation
we changed the behavior of the shader cache to skip compilation earlier
when we get a cache hit.
After the aforementioned commit, compiling a user program using fp64
would store into the cache an entry for the fp64-funcs shader.
Subsequent compilations of uncached user shaders using fp64 would fail
in compile_fp64_funcs() after finding a cache entry for the fp64-funcs,
but being unprepared to read from the cache.
It's unclear to me how to retrieve the cached NIR of the fp64-funcs (if
it even is cached), so just call _mesa_glsl_compile_shader() with
force_recompile=true in order to ensure we generate the fp64-funcs
successfully.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows creating a fd_screen with a renderonly object which will be
used to allocated scanout resources.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
[slight tweak to fix uninitialized 'prsc' in debug print]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes the following piglit test on my VEGA and matches the behaviour in the
tgsi backend.
tests/spec/glsl-1.10/execution/samplers/glsl-fs-shadow2D-clamp-z.shader_test
Fixes: 625dcbbc45 ("amd/common: pass address components individually to ac_build_image_intrinsic")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The util helpers were looking for a non-void channels in a non-mixed
format and returning its snorm/unorm state. However, compressed formats
don't have non-void channels, so they always returned false. V3D wants to
use util_format_is_[su]norm for its border color clamping workarounds, so
fix the functions to return the right answer for these.
This now means that we ignore .is_mixed. I could retain the is_mixed
check, but it doesn't seem like a useful feature -- the only code I could
find that might care is freedreno's blit, which has some notes about how
things are wonky in this area anyway.
Reviewed-by: <Roland Scheidegger sroland@vmware.com>
These tests don't need swrast, so we can always enable them when
build_tests is set. Most of them run to successful completion quickly
(.9s on my SKL).
Reviewed-by: <Roland Scheidegger sroland@vmware.com>
Earlier commit aimed to remove unneeded function declarations. Namely
OpenGL entrypoints which are not applicable for OpenGLES*
Although it did not consider the shared glapi which needs all,
including hidden ones. Resulting in warning/errors like the following
../build/src/mapi/shared-glapi/glapi_mapi_tmp.h:26014:15:
error: no previous prototype for ‘shared_dispatch_stub_1414’ [-Werror=missing-prototypes]
This patch addressed that.
Cc: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reported-by: Eric Anholt <eric@anholt.net>
Fixes: 6148cce388 ("mapi: drop unneeded gl_dispatch_stub declarations")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
1ce5d757d0 dropped this limit.. which is probably the right thing to
do. But it results in an extra tiled->linear blit for glReadPixels()
(ie. dEQP/piglit) which is hitting some intermittent corruption (looks
like cache) on a6xx, causing a lot of spurious fails.
Since we are getting close to 19.0 branchpoint, re-instate this limit
for now, until the blitter problems are resolved.
Fixes: 1ce5d757d0 freedreno: core buffer modifier support
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reverts commits 00ad77b9f6 and
5334dafee2.
This avoids Appveyor build breakage due to Cygwin, but more importantly,
there are several problems with these patches, as highlighted to my
recent mesa-dev mail. So better to revert for now, and pursue Cygwin
support after these have been address.
It should be incremented by one according to
how it is calculated by 'emit_vertex_buffer_state':
"\#if GEN_GEN < 8
.BufferAccessType = step_rate ? INSTANCEDATA : VERTEXDATA,
.InstanceDataStepRate = step_rate,
\#if GEN_GEN >= 5
.EndAddress = ro_bo(bo, end_offset - 1),
\#endif
\#endif"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This mirrors what autotools does in src/gallium/state_trackers/vdpau/Makefile.am
and src/gallium/targets/vdpau/Makefile.am:
VDPAU_MAJOR = 1
VDPAU_MINOR = 0
libvdpau_gallium_la_LDFLAGS = -version-number $(VDPAU_MAJOR):$(VDPAU_MINOR)
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Fixes: 68076b8747 "meson: build gallium vdpau state tracker"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Accessing bo->map and then pool->center_bo_offset without a lock is
racy. One way of avoiding such race condition is to store the bo->map +
center_bo_offset into pool->map at the time the block pool is growing,
which happens within a lock.
v2: Only set pool->map if not using softpin (Jason).
v3: Move things around and only update center_bo_offset if not using
softpin too (Jason).
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Ian Romanick <idr@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
Fixes: fc3f588320
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ICC tries to be helpful by not erroring when it sees something that it
doesn't understand, which is completely the opposite of helpful. Meson
0.49.0 does much better at handling this by really trying to make ICC
error, but there are some things in mesa that still get ignored until
0.49.1
v2: - Fix id check, which is 'intel' not 'icc'
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
This is a bit fragile, as the way this "fixes" the check is to move the
one that we know is correct before the one that is incorrectly reported
as working. In meson 0.49.1 (which isn't out yet) this is fixed that the
incorrect check is reported as a failure.
Fixes: e0b037d697
("meson: Build SWR driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109129
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
LLVM uses the single instruction "FRINTI" to implement llvm.nearbyint.
Fixes the rounding tests of lp_test_arit.
Bug: https://bugs.gentoo.org/665570
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
NEON (now called ASIMD) is available on all aarch64 CPUs. Our code was
missing an aarch64 path, leading to util_cpu_caps.has_neon always being
false on aarch64.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the drisw paths to use the new shm2 interface, so that
we don't trigger the X server overflow checks when the x offset is non-zero.
This just hides the versioning in drisw, and either passes the src_x
or adds the offset fixup for the fallback path.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This adds a new interface to the swrast interface to fix an shm put image bug.
The current code adds the x,y src offsets into the offset parameters,
however if the x offset is > 0, and the put image copies up to the height
of the image, this can trigger an X server validation check to fail and
the renderering to get BadMatch.
This patch fixes it to pass the x offset coord in as a src x.
We cannot pass the Y coordinate due to the horrible code mangling the
image w/h vs stride in swrastXPutImage.
v2: drop srcx,y from api
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
With the previous scripts API from the following was incorrectly
exported. Drop them from the list, since they're no longer around.
GL_EXT_blend_func_extended
GL_EXT_texture_integer
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Now we use the upstream XML file and a cleaner generator. Thus the
symbols are no longer exported and we can drop them from this list.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
As some point in the past we fixed the scripts so, these are no longer
exported. Drop them from the list.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This reverts commit a1f5d9412cf7cacb3534635f6c2409fafbe6574e.
We no longer needed to sort - it was meant only to ease compare against
the old generated files.
The output produced functionally identical, with the following changes:
- A cosmetic: swapped ABI compatible types [ GLclampf -> GLfloat, etc ]
- B cosmetic: renamed parameters [ zNear -> n, etc ]
- C dropped extension entrypoints - invalid/incorrect
To make things easier to validate, normalise both old/new headers run
the sed patterns A, B and C to both sets.
A
s/\<GLclampf\>/GLfloat/g; s/\<GLclampx\>/GLfixed/g;
s/\<GLvoid\>/void/g;
B
s/\ \* / */g; s/\<texture\>/target/g;
s/\<plane\>/p/g; s/\<depth\>/d/g; s/\<modeAlpha\>/modeA/g;
s/\<shader\>/program/g; s/\<obj\>/shaders/g; s/\<equation\>/eqn/g;
s/\<param\>/data/g; s/\<params\>/data/g; s/\<buffers\>/buffer/g;
s/\<src\>/mode/g; s/\<count\>/n/g; s/\<zNear\>/n/g; s/\<zFar\>/f/g;
s/\<zfail\>/dpfail/g; s/\<zpass\>/dppass/g; s/\<buf\>/index/g;
s/\<value\>/target/g; s/\<cap\>/target/g; s/\<maskNumber\>/index/g;
s/\<srcRGB\>/sfactorRGB/g; s/\<dstRGB\>/dfactorRGB/g;
s/\<srcAlpha\>/sfactorAlpha/g; s/\<dstAlpha\>/dfactorAlpha/g;
s/\<primitiveMode\>/mode/g; s/\<primcount\>/instancecount/g;
s/\<top\>/t/g; s/\<bottom\>/b/g; s/\<left\>/l/g; s/\<right\>/r/g;
s/\<x\>/v0/g; s/\<y\>/v1/g; s/\<z\>/v2/g; s/\<w\>/v3/g;
s/\<sfactor\>/mode/g; s/\<dfactor\>/dst/g; s/\<attribindex\>/bindingindex/g;
s/\<internalFormat\>/internalformat/g; s/\<bufSize\>/bufsize/g;
C
glMultiDrawArraysEXT
glMultiDrawElementsEXT
glBindFragDataLocationEXT
glGetTexParameterIivEXT
glGetTexParameterIuivEXT
glTexParameterIivEXT
glTexParameterIuivEXT
v2:
- gl_dispatch_stub declarations are addressed with previous patch
- the public_entries table is no longer generated
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
We already do it a few lines above - drop the duplicate.
Note that for consistency sake, we keep the substitution since the GL
API is a mixed bad - some use GLvoid while others a normal void.
We might want to merge this back in GLVND.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This way we can reuse the latter, which is already present in the
headers that we use. Thus we can drop the manual typedef we generate.
We might want to merge this back in GLVND.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
There is no need for the noop functions, the public_stubs and
public_entries table or table size defines. Remove those.
Pretty much all of this is applicable to GLVND, although it
requires preparatory work.
v2:
- python style fixes (Dylan)
- use "gldispatch" instead of not "glesv1" "glesv2"
- remove the public_entries table/array (Erik)
v3:
- use if == "gldispatch", instead of "in" (Kyle)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v2)
The only instance that requires the public_entries table is the
dispatch library - split that into another function.
We have to be careful with when undefining the guard, so split it out.
We might want to merge this back in GLVND.
Minor GLVND cleanup will be needed first.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Strictly speaking we can rework the rest of the code so we do not need
those. That said, this will require a series on it's own so let's carry
this local quirk for now.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Otherwise the incorrect ones will be used, effectively breaking the ABI.
Note: some entries in static_data.py list a suffixed API, while (for ES*
at least) we expect the one w/o suffix.
v2:
- rework path handling (Dylan)
- use else if chain (Erik)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Currently we have over 20 scripts that generate the libGL* dispatch and
various other functionality. More importantly we're using local XML
files instead of the Khronos provides one(s). Resulting in an
increasing complexity of writing, maintaining and bugfixing.
One fairly annoying bug is handling of statically exported symbols.
Today, if we enable a GL extension for GLES1/2, we add a special tag to
the xml. Thus the ES dispatch gets generated, but also since we have no
separate notion of GL/ES1/ES2 static functions it also gets exported
statically.
This commit adds step one towards clearing and simplifying our setup.
It imports the mapi generator from GLVND.
012fe39 ("Remove a couple of duplicate typedefs.")
v2: use local genCommon.py
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
The helper will also be used by the new Khronos gl.xml aware generator.
v2: Move existing one, instead of duplicating it.
v3: Correct genCommon.py references in meson [Erik]
v4: Drop the file from the EGL EXTRA_DIST [Erik]
Suggested-by: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently various parts of mesa use the glapi_table differently.
Some use _glapi_get_proc_offset() to get the offset, while others
directly reference the specific offset via _gloffset_Function.
Add all static entries, to ensure things don't break as we flip to the
upstream XML + new mapi generator.
Note: the offsets are also used for the alias remap table, thus we need
to ensure we honour the correct offsets range or it will break.
Currently this is done via MAX_OFFSETS constant, although a better
solution is in the works.
v2: add FramebufferTexture2DMultisampleEXT
v3: add MAX_OFFSETS guard
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit f1998e15ff.
This changes the ABI, such that glGetnTexImageARB entry-point from the
GLAPI gets removed. Thus accessing many functions by offset (as we do)
will result in getting the wrong one.
Follow-up work will swap the by-offset handling, but for now revert
this patch.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
These declarations are not used anywhere - be that generated code or
otherwise.
[Emil: format the hunk from Erik into a patch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With Windows in mind, using forward slash isn't the right thing to do.
Even if it just works, we might want to fix it.
As here, use __file__ instead of argv[0] and sys.path.insert over
sys.path.append. With the path tweak being reportedly faster.
Suggested-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Cheating a tiny bit as these headers aren't in the Khronos repo yet, but
I expect them to be within a couple days.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
flag_subreg set, so that the IR knows which flag is accessed. However
the flag is only used on Gen7 in Align1 mode.
To avoid setting unnecessary bits in the instruction words, get the
information we need and reset the default flag register. This allows
round-tripping through the assembler/disassembler.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit ff621a5055.
with default warnings configuration, this commit generates:
../src/egl/main/eglapi.c:2654:1: error: no previous prototype for
‘eglGetDisplayDriverConfig’ [-Werror=missing-prototypes]
When this functionality was added, the PRIMITIVES_GENERATED query was
accidentally omitted. This causes issues for drivers that support
transform feedback."
Fixes: d644698b44 ("gallium: Add the ability to query a single
pipeline statistics counter")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sending MRs from the main Mesa repository increase clutter in the
repository, and decrease visibility of project-wide branches. So it's
better if MRs are sent from forks instead.
Let's add a note about this, in case its not obvious to everyone.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes an assert we start hitting with kms/gbm:
#0 0x0000007fbf3d6e3c in raise () from /lib64/libc.so.6
#1 0x0000007fbf3c4a68 in abort () from /lib64/libc.so.6
#2 0x0000007fbf3d04e8 in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000007fbf3d0550 in __assert_fail () from /lib64/libc.so.6
#4 0x0000007fbf5a73c4 in gbm_dri_bo_create (gbm=0x5820f0, width=2160, height=1440, format=875713112, usage=0, modifiers=0x695e00, count=1) at ../src/gbm/backends/dri/gbm_dri.c:1150
#5 0x0000007fbf5a49c4 in gbm_bo_create_with_modifiers (gbm=0x5820f0, width=2160, height=1440, format=875713112, modifiers=0x695e00, count=1) at ../src/gbm/main/gbm.c:491
#6 0x0000007fbbac3d64 in get_back_bo (dri2_surf=0x6f4cc0) at ../src/egl/drivers/dri2/platform_drm.c:258
#7 0x0000007fbbac4318 in dri2_drm_image_get_buffers (driDrawable=0x704490, format=4098, stamp=0x6fc730, loaderPrivate=0x6f4cc0, buffer_mask=1, buffers=0x7fffffe210) at ../src/egl/drivers/dri2/platform_drm.c:409
#8 0x0000007fbf5a5318 in image_get_buffers (driDrawable=0x704490, format=4098, stamp=0x6fc730, loaderPrivate=0x70e150, buffer_mask=1, buffers=0x7fffffe210) at ../src/gbm/backends/dri/gbm_dri.c:135
#9 0x0000007fbe4308c4 in dri_image_drawable_get_buffers (drawable=0x6fc730, images=0x7fffffe210, statts=0x6f2660, statts_count=1) at ../src/gallium/state_trackers/dri/dri2.c:339
#10 0x0000007fbe430c44 in dri2_allocate_textures (ctx=0x614b30, drawable=0x6fc730, statts=0x6f2660, statts_count=1) at ../src/gallium/state_trackers/dri/dri2.c:466
#11 0x0000007fbe435580 in dri_st_framebuffer_validate (stctx=0x714160, stfbi=0x6fc730, statts=0x6f2660, count=1, out=0x7fffffe3b8) at ../src/gallium/state_trackers/dri/dri_drawable.c:85
#12 0x0000007fbe7b2c84 in st_framebuffer_validate (stfb=0x6f2190, st=0x714160) at ../src/mesa/state_tracker/st_manager.c:222
#13 0x0000007fbe7b4884 in st_api_make_current (stapi=0x7fbf0430d8 <st_gl_api>, stctxi=0x714160, stdrawi=0x6fc730, streadi=0x6fc730) at ../src/mesa/state_tracker/st_manager.c:1074
#14 0x0000007fbe434f44 in dri_make_current (cPriv=0x703c20, driDrawPriv=0x704490, driReadPriv=0x704490) at ../src/gallium/state_trackers/dri/dri_context.c:301
#15 0x0000007fbe42c910 in driBindContext (pcp=0x703c20, pdp=0x704490, prp=0x704490) at ../src/mesa/drivers/dri/common/dri_util.c:579
#16 0x0000007fbbabab40 in dri2_make_current (drv=0x69d170, disp=0x69c6e0, dsurf=0x6f4cc0, rsurf=0x6f4cc0, ctx=0x70cb40) at ../src/egl/drivers/dri2/egl_dri2.c:1456
#17 0x0000007fbbaa8ef4 in eglMakeCurrent (dpy=0x69c6e0, draw=0x6f4cc0, read=0x6f4cc0, ctx=0x70cb40) at ../src/egl/main/eglapi.c:862
#18 0x0000007fbf5736ac in InternalMakeCurrentVendor (dpy=dpy@entry=0x614fb0, draw=draw@entry=0x6f4cc0, read=read@entry=0x6f4cc0, context=context@entry=0x70cb40, apiState=apiState@entry=0x6fc940, vendor=0x6975f0) at libegl.c:861
#19 0x0000007fbf573764 in InternalMakeCurrentDispatch (dpy=0x614fb0, draw=0x6f4cc0, read=0x6f4cc0, context=0x70cb40, vendor=0x6975f0) at libegl.c:630
#20 0x0000000000403640 in init_egl (egl=0x5805a8 <gl>, gbm=0x580528 <gbm>, samples=0) at ../common.c:263
#21 0x0000000000403c1c in init_cube_smooth (gbm=0x580528 <gbm>, samples=0) at ../cube-smooth.c:225
#22 0x0000000000408618 in main (argc=1, argv=0x7fffffe8d8) at ../kmscube.c:145
Fixes: 1ce5d757d0 freedreno: core buffer modifier support
Signed-off-by: Rob Clark <robdclark@gmail.com>
It's invalid to emit a ZPASS_DONE event on the compute queue, and
the fence BO is unused on the compute queue (ie. we don't flush
CB or DB caches).
This saves some space in the upload BO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This parameter is actually useless as the immediate value
can always be zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For example, if a pipeline has two stages VS and FS. And if only
the fragment stage needs dynamic bindings, we shouldn't allocate
an extra user SGPR for the vertex stage. Of course, if the vertex
stage loads constants, it needs an user SGPR.
This should reduce the number of SET_SH_REG packets that are emitted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the Intel backend, it makes the most sense to treat gl_TessLevelInner
and gl_TessLevelOuter as ordinary shader inputs. For Radeon, it makes
more sense to treat them as system values which get special handling.
We already have a compiler option for this, but the Iris driver will
need a capability bit so we can set it appropriately.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We may have to flush the cache if there are any textures presently bound
that refer to the outgoing framebuffer. This is only checked at
validation time.
Fixes a number of dEQP-GLES3.functional.fbo.color.repeated_clear.sample.*
tests, which would bind a texture, then clear it while the binding was
in effect, and then render to a different texture. This seems legal
under the "no feedback loops" rule.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Normally modifiers take precendence over use flags, as they are more
explicit. But if the driver supports modifiers, but the xserver does
not, then we should fallback to the old mechanism of allocating a buffer
using 'use' flags.
Fixes: 069fdd5f9f
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Fixes a static assertion which broke the build.
Fixes: 3ee240890 "gallium: add SINT formats to have exact counterparts to SNORM formats"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Neha Bhende<bhenden@vmware.com>
Annoyingly, this requires that we implement integer division on the
command streamer. Fortunately, we're only ever dividing by constants so
we can use the mulh+add+shift trick and it's not as bad as it sounds.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In order to allow nir_gather_xfb_info to be used on OpenGL,
specifically ARB_gl_spirv.
So, from OpenGL 4.6 spec, section 11.1.2.1, "Output Variables":
"outputs specifying both an *XfbBuffer* and an *Offset* are
captured, while outputs not specifying both of these are not
captured. Values are captured each time the shader writes to such
a decorated object."
This implies that are captured if both are present, and not if one of
those are lacking. Technically, it doesn't explicitly point that
having just one or the other is a mistake. In some cases, glslang is
adding some extra XfbBuffer without XfbOffset around, and mentioning
that technically that is not a bug (see issue#1526)
And for the case of Vulkan, as the same glslang issue mentions, it is
not clear if that should be a mistake or not. But even if it is a
mistake, it is not really needed to be checked on the driver, and we
can let the validation layers to check that.
v2: simplify explicit_xfb_buffer and explicit_offset checks (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before, we were double-counting the component slots when we had a dvec3
or dvec4. Instead, just add them in once and manually offset the
recorded output offset.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we have a transform feedback output like:
float[2] x2_out (VARYING_SLOT_VAR1.x, 0, 0)
which is lowered by nir_lower_io_arrays_to_elements to,
float x2_out (VARYING_SLOT_VAR1.x, 0, 0)
float x2_out@5 (VARYING_SLOT_VAR2.x, 0, 0)
We have to update the destination offset to avoid overwriting
the same value.
v2 (Jason Ekstrand):
- Compute the correct offsets for arrays of vectors and/or doubles
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When a xfb buffer is explicitely declared on a varying
variable, we shouldn't remove it at link time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of setting interface_type to whatever the per-vertex type is, we
only set it on blocks. This allows later passes to tell the difference
between variables that are in blocks and those that aren't.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Instead of splitting every per-vertex struct, just split the ones that
are actually blocks. The reason for the split is so that we have
separate variables for separate locations, qualifiers, and builtin
decorations. The vulkan spec only allows these on members of blocks.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This seems to make the simulator happier. The early return wasn't
really protecting anything and the code that follows will happily
initialize the dummy element to STORE_0 and emit it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Issue was hit with this configuration:
--disable-{egl,gbm} --with-platform=drm
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 3208fd2e46 ("configure: move platform handling further up")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Two cases:
* replacing srcs which refer to MOV instructions
* replacing MOVs used to write to exports
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
If we want to use a scalar instruction with two sources, both sources have
to be in the same register. This covers a common case by inserting a scalar
MOV into a previous instruction with only a vector alu instruction.
A better method would be to have the sources end up in the same register in
the first place, but when one source is a constant this is the only way.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
This patch replaces the a2xx TGSI compiler with a NIR compiler.
It also adds several new features:
-gl_FrontFacing, gl_FragCoord, gl_PointCoord, gl_PointSize
-control flow (including loops)
-texture related features (LOD/bias, cubemaps)
-filling scalar ALU slot when possible
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Take away const qualifier from return type of these functions as
-Wignored-qualifiers points out it is ignored for these cases.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
vtn supports these, so don't squalk if user is happy with enabling
these.
v2: add new members sorted
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Both the Intel and RADV people have been really bad about adding things
to the release notes. We should start actually paying attention.
Acked-by: Tapani Pälli <tapani.palli@intel.com>
It's common in some applications to bind a new graphics pipeline without
ending up changing any context registers.
This has a pipline have two command buffers: one for setting context
registers and one for everything else. The context register command buffer
is only emitted if it differs from the previous pipeline's.
v2: ensure late scissor emission is done when radv_emit_rbplus_state() is
called
v2: make use of cmd_buffer->state.workaround_scissor_bug
v3: rename "workaround_scissor_bug" to
"context_roll_without_scissor_emitted"
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On a20x, set VGT_VERTEX_REUSE_BLOCK_CNTL to 2 and don't change it. Small
rearrangement on a220 to reduce the size of draw commands.
Only set DEALLOC_CNTL on a20x because the correct a220 value is not known.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Fixes cases where previous viewport values might case gmem2mem to fail.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Doesn't change much, but reduces the size of fd2_emit_state
gmem2mem does not need to change the value: no Z clipping on resolve
mem2gmem now needs to restore the common value after rendering
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Only 3 vertices are used so we can drop the data for vertex 4
It doesn't make sense to have 1.1 for some coordinates, use 1.0 instead
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
v2: add assert to verify we have at least one valid bit_size
v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With OpenCL some system values match the address bits, but in GLSL we also
have some system values being 64 bit like subgroup masks.
With this it is possible to adjust the builder functions so that depending
on the bit_sizes the correct bit_size is used or an additional argument is
added in case of multiple possible values.
v2: validate dest bit_size
v3: generate hex values in python code
remove useless imports
rename and move bit_sizes
v4: add 1 to legal bit_sizes for front_face
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
fixes a couple of deqp tests (on nvc0 and potential other drivers):
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A little bit of explanation regarding how vkCmdPipelineBarrier()
works.
v2: Avoid referring to data port cache when it's actually sampler
caches (Jason)
Complete explanation for indirect draws (Jason)
v3: s/samplers/sampler/ (Jason)
s/UBOs/data port/
Add documentation for VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT_EXT (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
In commit 9a7b319903 ("anv/query: flush render target before
copying results") we tracked all the render target writes to apply a
flushes in the vkCopyQueryResults(). But we can narrow this down to
only when we write a buffer (which is the only input of
vkCopyQueryResults).
v2: Drop newer render target write flags introduce by 1952fd8d2c
("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Currently we only add a cache key for a shader once it is linked.
However games like Team Fortress 2 compile a whole bunch of shaders
which are never actually linked. These compiled shaders can take
up a bunch of memory.
This patch changes things so that we add the key for the shader to
the cache as soon as it is compiled. This means on a warm cache we
can avoid the wasted memory from these shaders. Worst case scenario
is we need to compile the shaders at link time but this can happen
anyway if the shader has been evicted from the cache.
Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
warm cache from start up to the game menu.
V2: only add key to cache when compilation is successful.
Acked-by: Marek Olšák <marek.olsak@amd.com>
The docs are fairly incomplete and inconsistent about it, but this
seems to be the reason why half-float destinations are required to be
DWORD-aligned on BDW+ projects. This way the regioning lowering pass
will make sure that the destination components of W to HF and HF to W
conversions are aligned like the corresponding conversion operation
with 32-bit execution data type.
Tested-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This builds on the recent interpolate fix by Rhys ee8488ea3b.
This fixes the arb_gpu_shader5 interpolateAt* tests that contain
arrays.
Fixes: ee8488ea3b ("ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The color swap isn't available for tiled formats and it's not needed
either. We pick one channel order and use for all non-linear formats.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Staging blit downloads would wait on the src resource instead of the
staging resource and didn't make sure to submit the blit batch first.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Currently we only add a cache key for a shader once it is linked.
However games like Team Fortress 2 compile a whole bunch of shaders
which are never actually linked. These compiled shaders can take
up a bunch of memory.
This patch changes things so that we add the key for the shader to
the cache as soon as it is compiled. This means on a warm cache we
can avoid the wasted memory from these shaders. Worst case scenario
is we need to compile the shaders at link time but this can happen
anyway if the shader has been evicted from the cache.
Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
warm cache from start up to the game menu.
Acked-by: Marek Olšák <marek.olsak@amd.com>
This basically reverts c2bc0aa7b1.
By running the opts we reduce memory using in Team Fortress 2
from 1.5GB -> 1.3GB from start-up to game menu.
This will likely increase Deus Ex start up times as per commit
c2bc0aa7b1. However currently 32bit games like Team Fortress 2
can run out of memory on low memory systems, so that seems more
important.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Passes' function names, separated by comma, listed in NIR_SKIP
environment variable will be skipped in debug mode. The mechanism is
hooked into the _PASS macro, like NIR_PRINT.
The extra macro NIR_SKIP is available as a developer convenience, to
skip at pointer other than the passes entry points.
v2: Fix typo in NIR_SKIP macro. (Bas)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Conditional rendering affects next functions:
- vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect
- vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR
- vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase
- vkCmdClearAttachments
Value from conditional buffer is cached into designated register,
MI_PREDICATE is emitted every time conditional rendering is enabled
and command requires it.
v2: by Jason Ekstrand
- Use vk_find_struct_const instead of manually looping
- Move draw count loading to prepare function
- Zero the top 32-bits of MI_ALU_REG15
v3: Apply pipeline flush before accessing conditional buffer
(The issue was found by Samuel Iglesias)
v4: - Remove support of Haswell due to possible hardware bug
- Made TMP_REG_PREDICATE and TMP_REG_DRAW_COUNT defines to
define registers in one place.
v5: thanks to Jason Ekstrand and Lionel Landwerlin
- Workaround the fact that MI_PREDICATE_RESULT is not
accessible on Haswell by manually calculating
MI_PREDICATE_RESULT and re-emitting MI_PREDICATE
when necessary.
v6: suggested by Lionel Landwerlin
- Instead of calculating the result of predicate once - re-emit
MI_PREDICATE to make it easier to investigate error states.
v7: suggested by Jason
- Make anv_pipe_invalidate_bits_for_access_flag add CS_STALL
if VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT is set.
v8: suggested by Lionel
- Precompute conditional predicate's result to
support secondary command buffers.
- Make prepare_for_draw_count_predicate more readable.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: by Jason Ekstrand
- Move out of the draw loop population of registers
which aren't changed in it.
- Remove dependency on ALU registers.
- Clarify usage of PIPE_CONTROL
- Without usage of ALU registers patch works for gen7+
v3: set pending_pipe_bits |= ANV_PIPE_RENDER_TARGET_WRITES
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Native file support in command line serialization isn't present in meson
0.49, but will be for 0.49.1 and 0.50
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In some shaders, you can end up with a stride in the source of a
SHADER_OPCODE_MULH. One way this can happen is if the MULH is acting on
the top bits of a 64-bit value due to 64-bit integer lowering. In this
case, the compiler will produce something like this:
mul(8) acc0<1>UD g5<8,4,2>UD 0x0004UW { align1 1Q };
mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable };
The new region fixup pass looks at the MUL and sees a strided source and
unstrided destination and determines that the sequence is illegal. It
then attempts to fix the illegal stride by replacing the destination of
the MUL with a temporary and emitting a MOV into the accumulator:
mul(8) g9<2>UD g5<8,4,2>UD 0x0004UW { align1 1Q };
mov(8) acc0<1>UD g9<8,4,2>UD { align1 1Q };
mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable };
Unfortunately, this new sequence isn't correct because MOV accesses the
accumulator with a different precision to MUL and, instead of filling
the bottom 32 bits with the source and zeroing the top 32 bits, it
leaves the top 32 (or maybe 31) bits alone and full of garbage. When
the MACH comes along and tries to complete the multiplication, the
result is correct in the bottom 32 bits (which we throw away) and
garbage in the top 32 bits which are actually returned by MACH.
This commit does two things: First, it adds an assert to ensure that we
don't try to rewrite accumulator destinations of MUL instructions so we
can avoid this precision issue. Second, it modifies
required_dst_byte_stride to require a tightly packed stride so that we
fix up the sources instead and the actual code which gets emitted is
this:
mov(8) g9<1>UD g5<8,4,2>UD { align1 1Q };
mul(8) acc0<1>UD g9<8,8,1>UD 0x0004UW { align1 1Q };
mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable };
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
For a long time, we based exec sizes on destination register widths.
We've not been doing that since 1ca3a94427 but a few remnants
accidentally remained.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Otherwise writes get propagated across atomics if no barrier is
used. Without barrier writes should still be visible in the same
invocation, so an atomic has to be considered a write.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Add a test that checks that we can use the extra space allocated for
padding while allocating larger anv_states.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If softpin is supported, create new BOs for the required size and add the
respective BO maps. The other main change of this commit is that
anv_block_pool_map() now returns the map for the BO that the given
offset is part of. So there's no block_pool->map access anymore (when
softpin is used.
v3:
- set fd to -1 on softpin case (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We are not going to use userptr for anv block pool BOs anymore. However,
so far we have been relying on the fact that userptr BOs are snooped on
non-llc platforms. Let's make sure that the block pool BOs are still
snooped, and we can also remove the clflush'ing that we do on all state
buffers.
And since we plan to remove the flushes, set the anv_bo_pool BOs to
cached (snooped on non-LLC platforms) too. For LLC platforms, they are
all cached by default, so this becomes a no-op.
v5:
- Add snooping to anv_bo_pool BOs too (Jason).
- Remove anv_gem_set_domain.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's possible that we still have some space left in the block pool, but
we try to allocate a state larger than that state. This means such state
would start somewhere within the range of the old block_pool, and end
after that range, within the range of the new size.
That's fine when we use userptr, since the memory in the block pool is
CPU mapped continuously. However, by the end of this series, we will
have the block_pool split into different BOs, with different CPU
mapping ranges that are not necessarily continuous. So we must avoid
such case of a given state being part of two different BOs in the block
pool.
This commit solves the issue by detecting that we are growing the
block_pool even though we are not at the end of the range. If that
happens, we don't use the space left at the end of the old size, and
consider it as "padding" that can't be used in the allocation. We update
the size requested from the block pool to take the padding into account,
and return the offset after the padding, which happens to be at the
start of the new address range.
Additionally, we return the amount of padding we used, so the caller
knows that this happens and can return that padding back into a list of
free states, that can be reused later. This way we hopefully don't waste
any space, but also avoid having a state split between two different
BOs.
v3:
- Calculate offset + padding at anv_block_pool_alloc_new (Jason).
v4:
- Remove extra "leftover".
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit tries to rework the code that split and returns chunks back
to the state pool, while still keeping the same logic.
The original code would get a chunk larger than we need and split it
into pool->block_size. Then it would return all but the first one, and
would split that first one into alloc_size chunks. Then it would keep
the first one (for the allocation), and return the others back to the
pool.
The new anv_state_pool_return_chunk() function will take a chunk (with
the alloc_size part removed), and a small_size hint. It then splits that
chunk into pool->block_size'd chunks, and if there's some space still
left, split that into small_size chunks. small_size in this case is the
same size as alloc_size.
The idea is to keep the same logic, but make it in a way we can reuse it
to return other chunks to the pool when we are growing the buffer.
v2:
- Include Jason's suggestions to the algorithm that returns chunks.
- Update comments.
v3:
- Disallow returning 0 blocks (Jason).
- fix min_size in the loop (Jason).
- remove temporary variables (Jason)
v4:
- return_chunk() should never return blocks larger than
pool->block_size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We now have multiple BOs in the block pool, but sometimes we still
reference only the first one in some instructions, and use relative
offsets in others. So we must be sure to add all the BOs from the block
pool to the validation list when submitting commands.
v2:
- Don't add block pool BOs to the dependency list right before
execbuf (Jason)
- Call anv_execbuf_add_bo() to each BO in the block pools (Jason)
- Use anv_execbuf_add_bo_set() to add surface state dependencies to
execbuf.
v3:
- Add comment to the non-softpin case (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This part of the anv_execbuf_add_bo() code is totally independent of the
BO being added. Let's split it out, so we can reuse it later.
v3: rename to anv_execbuf_add_bo_set (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So far we use only one BO (the last one created) in the block pool. When
we switch to not use the userptr API, we will need multiple BOs. So add
code now to store multiple BOs in the block pool.
This has several implications, the main one being that we can't use
pool->map as before. For that reason we update the getter to find which
BO a given offset is part of, and return the respective map.
v3:
- Simplify anv_block_pool_map (Jason).
- Use fixed size array for anv_bo's (Jason)
v4:
- Respect the order (item, container) in anv_block_pool_foreach_bo
(Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Change block_pool->bo to be a pointer, and update its usage everywhere.
This makes it simpler to switch it later to a list of BOs.
v3:
- Use a static "bos" field in the struct, instead of malloc'ing it.
This will be later changed to a fixed length array of BOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
After switching to using anv_state_table, there are very few places left
still using pool->map directly. We want to avoid that because it won't
be always the right map once we split it into multiple BOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use anv_state_pool_return_blocks() to return blocks to the pool, instead
of manually pushing them.
v3:
- return blocks from the end of the chunk (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The use of anv_state_table_add() combined with anv_state_table_push(),
specially when adding a bunch of states to the table, is very verbose.
So we add this helper that makes things easier to digest.
We also already add the anv_state_table member in this commit, so things
can compile properly, even though it's not used.
v2: assert that the states are always aligned to their size (Jason)
v3: Add "table" member to anv_state_pool in this commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We will need the anv_block_pool_map to find the map relative to some BO
that is not at the start of the block pool.
v2: just return a pointer instead of a struct (Jason)
v4: Update comment (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add a structure to hold anv_states. This table will initially be used to
recycle anv_states, instead of relying on a linked list implemented in
GPU memory. Later it could be used so that all anv_states just point to
the content of this struct, instead of making copies of anv_states
everywhere.
One has to call anv_state_table_add(), which returns an index for the
state in the table, and then get a pointer to such index, and finally
fill in the rest of the struct.
TODO:
1) There's a lot of common code between this table backing store
memory and the anv_block_pool buffer, due to how we grow it. I think
it's possible to refactory this and reuse code on both places.
2) Add unit tests.
v3:
- Rename state table memfd (Jason)
- Return VK_ERROR_OUT_OF_HOST_MEMORY on more places (Jason)
- anv_state_table_grow returns VkResult (Jason)
- Rename variables to be more informative (Jason)
- Return errors on state table grow.
- Rename anv_state_table_push/pop to anv_free_list_push2/pop2
This will be renamed again to remove the trailing "2" later.
v4:
- Remove exit(-1) from anv_state_table (Jason).
- Use uint32_t "next" field in anv_free_entry (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There were 2 problems with this test.
First it was comparing highest, which was -1, with an uint32_t. So the
current value would never be higher than that, and the assert would
always be false. It just never reached this point because of the next
problem.
It was always looking for the highest value of each thread and storing
it in thread_max. So a test case like this wouldn't work:
[Thread]: [Blocks]
[0]: [0, 32, 64, 96]
[1]: [128, 160, 192, 224]
[2]: [256, 288, 320, 352]
Not only that would skip values and iterate only over thread number 2,
instead of walking through all of them, but thread_max was also
initialized to -1. And then compared to unsigned blocks[i][next[i].
We fix that by getting the smallest value of each thread, and checking
if it is lower than thread_min, which is initialized to INT32_MAX. And
then we end up walking through all the blocks of all threads. We also
change "blocks" to be int32_t instead of uint32_t, since in some places
(alloc_blocks) it was already referenced as int32_t, and that fixes the
comparison to -1.
v2:
- keep highest initialized to -1, and change blocks to be int32_t.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.
Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.
This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+ while making sure that only pre-SKL gens
use push constant space to handle images.
v2:
- <= instead of < in the assert (Eric, Lionel)
- Change the way the assertion is written (Eric)
v3:
- Revert the way the assertion is written to the form it had in v1,
the version in v2 was not equivalent and was incorrect. (Lionel)
v4:
- gen9+ doesn't need push constants for images at all (Jason)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
Fixes following failing vk-gl-cts cases on Linux desktop:
dEQP-VK.api.external.memory.android_hardware_buffer.suballocated.buffer.info
dEQP-VK.api.external.memory.android_hardware_buffer.suballocated.image.info
dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.image.info
dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.buffer.info
Fixes: 517103abf1 "anv/android: add ahardwarebuffer external memory properties"
Reported-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Noticed while debugging V3D -- the ro->gpu_fd was freshly opened in ro
setup, and it needs to stay open until screen close (since it may be used
by renderonly) and should be the same one used by the vc4 screen.
Fixes: 7029ec05e2 ("gallium: Add renderonly-based support for pl111+vc4.")
The CTS was running out of fds, because of the ro->gpu_fd never being
closed. ro->gpu_fd should match the screen (in case the caller of
v3d_drm_screen_create_renderonly() has a scanout_for_resource() that uses
gpu_fd) and the screen is expected to close its fd at the end, fixing the
resource leak.
Fixes: e113b21cb7 ("v3d: Add renderonly support.")
I had bugs in the old path where I was laying out as tiled (so we'd render
tiled) but then only allocating space in the shared object for linear
rendering. The resource_from_handle makes it so the same layout choices
are made in both the import and export scanout cases. Also, fixes a leak
of the fd that was tripping up the CTS.
Now that we're checking PIPE_BIND_SHARED to choose to use RO, the
DRM_FORMAT_MOD_LINEAR check wasn't needed any more.
Fixes visual corruption and MMU faults in X in renderonly mode.
Fixes: bd09bb1629 ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")
The current code only strips off arrays and cannot find the type
for images that are struct members.
Instead of trying to get the image type from the variable, we just
get it directly from the deref instruction.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This matches the behaviour of AMDVLK and hides banding.
It is also seems to be allowed by the Vulkan spec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This improves cache filesystem performance, especially during CI tests
Also updated jitcache magic number due to codegen parameter changes
Removed 2 `if constexpr` to prevent C++17 requirement
Avoids confusion with other defaulted integer parameters
- fixed some unspecified usages
- removed unnecessary includes
- removed unecessary protected access specifier in buckets framework
- added graphics address translation in odd gathers
- added support for unaligned gathers in fetch shader
- changed how 2+ GB offsets are handled to make them compatible with
unaligned offsets
- updated sample from TRTT surfaces correctly
- implemented mapped status return for TRTT surfaces
- implemented per-sample instruction minLod clamp
- updated bilinear filter weight calculation to be closer to D3D specs
- implemented "ReducedTexcoordRange" operation from D3D specs to avoid
loss of precision on high-value normalized coordinates
This was already enabled for gallium based osmesa with gallium drivers
in 9d10581897, so do the same for classic
driver with classic osmesa.
Fixes: cbbd5bb889
("meson: build classic osmesa")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Various recreation scenarios lead to API thread getting stuck in
swr_fence_finish(). This is a multi-context issue, whereby one context
overwrites the fence read-value with a previous sync's lesser value.
The fence sync value is supposed to be always increasing.
In swr_fence_cb(), only update the "read" value if the new value is
greater.
(This may seem like we're not waiting on the other context to finish, but
had we needed for it to finish there would have been a wait prior to
submitting a new sync.)
cc: mesa-stable@lists.freedesktop.org
Intel's blending hardware does not properly return 1.0 for destination
alpha for RGBX formats; it requires the factors to be overridden to
either zero or one. Broadcom vc4 and v3d also could use this override.
While overriding these factors is safe in general, Nouveau and Radeon
would prefer not to. Their blending hardware already returns correct
values for RGB/RGBX formats, and would like to avoid the resulting
per-buffer blending and independent blend factors (rgb != a) since it
can cause additional overhead.
I considered simply handling this in the driver, but it's not as nice.
pipe_blend_state doesn't have any format information, so we'd need the
hardware blend state to depend on both pipe_blend_state and
pipe_framebuffer_state. Furthermore, Intel GPUs don't have a native
RGBX_SNORM format, so I avoid exposing one, which makes Gallium fall
back to RGBA_SNORM. The pipe_surfaces we get in the driver have an RGBA
format, making it impossible to tell that there shouldn't be an alpha
channel. One could argue that st not handling it in that case is a bug.
To work around this, we'd have to expose RGBX pipe formats, mapped to
RGBA hardware formats, and add format swizzling special cases. All
doable, but it ends up being more code than I'd like.
st_atom_blend already has access to the right information and it's
trivial to accomplish there, so we just add a cap bit and do that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
`with_gallium_icd` is never used throughout the different Meson build
files, whereas `with_opencl_icd` tracks whether or not `gallium-opencl`
was set to "icd".
Fixes: 42ea0631f1
("meson: build clover")
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Gallium historically has treated pipeline statistics queries as a single
query, PIPE_QUERY_PIPELINE_STATISTICS, which returns a block of 11
values. This was originally patterned after the D3D1x API. Much later,
Brian introduced an OpenGL extension that exposed these counters - but
it exposes 11 separate queries, each of which returns a single value.
Today, st/mesa simply queries all 11 values, and returns a single value.
While pipeline statistics counters aren't typically performance
critical, this is still not a great fit. A D3D1x->GL translator might
request all 11 counters by creating 11 separate GL queries...which
Gallium would map to reads of all 11 values each time, resulting in a
total 121 counter reads. That's not ideal.
This patch adds a new cap, PIPE_CAP_QUERY_PIPELINE_STATISTICS_SINGLE,
and corresponding query type PIPE_QUERY_PIPELINE_STATISTICS_SINGLE.
When calling create_query(), q->index should be set to one of the
PIPE_STAT_QUERY_* enums to select a counter. Unlike the block query,
this returns the value in pipe_query_result::u64 (as it's a single
value) instead of the pipe_query_data_pipeline_statistics group.
We update st/mesa to expose ARB_pipeline_statistics_query if either
capability is set, preferring the new SINGLE variant when available.
Thanks to Roland, Ilia, and Marek for helping me sort this out.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This just changes the order of the switch statements, so we only
look at target if the query type is PIPE_QUERY_PIPELINE_STATISTICS.
The next commit will introduce a new SINGLE query type which can be
used for the same GL query types, and it won't want this processing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gallium handles pipeline statistics queries as a single query
(PIPE_QUERY_PIPELINE_STATISTICS) which returns a struct with 11 values.
Sometimes it's useful to refer to each of those values individually,
rather than as a group. To avoid hardcoding numbers, we define a new
enum for each value. Here, the name and enum value correspond to the
index in the struct pipe_query_data_pipeline_statistics result.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
A simple Vulkan extension that allows apps to query size and
usage of all exposed memory heaps.
The different usage values are not really accurate because
they are per drm-fd, but they should be close enough.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Looking at -pro we need to enable it for pipelines with just a
GS too.
This seems to reduce the hangs from
https://bugs.freedesktop.org/show_bug.cgi?id=109242 on a RX 550 to
the point where I can't reproduce, after the false start with the
wd_switch_on_eop patch due to flakiness.
(but people are reporting it does not fix the issue completely for
them on polaris 11)
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We get a payload for the ivec3 workgroup and an int local invocation
index, and we use the core lowering to turn into the global invocation id
and the local invocation id ivec3s.
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
This was an arbitrary "we support lots of stuff" value when I started the
driver. However, at 400 we expose OES_gpu_shader5, which claims support
for dynamically indexing samplers, which the driver doesn't do yet.
We've been relying on linking splitting up our varying matrices into
separate vectors, but with SSO that doesn't happen. Supporting matrix
inputs isn't too hard, though.
Otherwise, the simulator raises the GMP interrupt and waits for it to be
handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when
violation happens gives us a chance to look at the backtrace of whatever
thread triggered the violation.
Fix crashes with
dEQP-VK.spirv_assembly.instruction.compute.workgroup_memory.*16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Causes hangs on some machines.
What works for dEQP-VK.tessellation.shader_input_output.barrier:
- running num_patches = 6 (which limits LDS to 32 KiB)
- running num_patches = 8, and artificially cutting LDS size at 32 KiB.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering.
ior: fmax instead of fadd allows removing the fsat.
inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better
with the scalar instruction set.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
These combinations are common enough and deserve a shortcut.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
This function is modeled after the aux_op functions except that it has a
lot more parameters because it deals with two images as well as source
and destination regions.
Function's out variable could be an array dereferenced by an array:
func(v[w[i]]);
or something more complicated.
Copy index in any case.
Fixes: 76c27e47b9 ("glsl: Copy function out to temp if we don't directly ref a variable")
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The original idea was that the backend compiler could eliminate
surfaces, so we would have it mark which ones are actually used,
then shrink the binding table accordingly. Unfortunately, it's a
pretty blunt mechanism - it can only prune things from the end,
not the middle - since we decide the layout before we even start
the backend compiler, and only limit the size. It also basically
gives up if it sees indirect array access.
Besides, we do the vast majority of our surface elimination in NIR
anyway, not the backend - and I don't see that trend changing any
time soon. Vulkan abandoned this plan a long time ago, and I don't
use it in Iris, but it's still been kicking around in i965.
I hacked shader-db to print the binding table size in bytes, and
observed no changes with this patch. So, this code appears to do
nothing useful.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
DeprecationWarning: the imp module is deprecated in favour of importlib
Instead of complicated logic, just import the file directly.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Instead of emitting all of the conditions for the cases of a switch
statement up-front, emit them on-the-fly as we emit the code for each
case. The original justification for this was that we were going to
have to build a default case anyway which would need them all. However,
we can just trust CSE to clean up the mess in that case. Emitting each
condition right before the if statement that uses it reduces register
pressure and, in one customer benchmark, reduces spilling and improves
performance by about 2x.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of applying the workaround universally, detect semi-old GLSLang
via the generator ID and only enable the workaround on old GLSLang.
This isn't nearly as precise as one would like it to be because the
first GLSLang generator id version bump was on October 7, 2017 which is
about 1.5 years after the bug was fixed. However, it at least lets us
disable it for non-GLSLang and for more modern versions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A long time in a galaxy far far away, there was a GLSLang bug with how
it handled samplers passed in as function parameters. (The bug can be
found here: https://github.com/KhronosGroup/glslang/issues/179.)
Unfortunately, that version was shipped in several apps and has been
causing heartburn for our SPIR-V parser ever since.
Recent changes to NIR uncovered a moderately old bug in how we work
around this issue. In particular, we ended up with a deref_cast from
uniform to local which is not a no-op cast so nir_opt_deref wasn't
getting rid of the cast. The only reason why it worked before was
because someone just happened to call nir_fixup_deref_modes which
"fixed" the cast (that shouldn't be happening) and then a later round of
copy-prop would get rid of it. The fact that the deref_cast survived
that long without causing trouble for other parts of NIR is a bit
surprising.
Just whacking the mode of the pointer seems to fix it fairly
unobtrusively. Currently, only apps with this bug will have a local
variable containing an image or sampler.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109304
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If the TCS and TES are linked together, we can simply replace the TES's
gl_PatchVerticesIn system value with a constant, possibly allowing extra
optimization or letting the driver avoid uploading a special value.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With the new handling of bool types, the conversion to float in glsl_to_nir
should not apply to bool types anymore.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
out_type in the default cast case is always GLSL_TYPE_FLOAT, so we get a
mov otherwise.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All alu instructions emitted with native_integers=false expect float
(or bool in some cases) constants, so this change is necessary.
This will cause changes with some intrinsics which had integer sources,
such as nir_intrinsic_load_uniform. Apparently it might cause issues with
some opt passes, but perhaps those don't apply in OpenGL ES 2.0 cases?
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The num_components value passed into get_mul_for_src is used to only
compose the parts of the swizzle that we know will be used so we don't
compose invalid swizzle components. However, we had a bug where we
passed the number of components of the add all the way through. For the
given source, we need the number of components read from that source.
In the case where we have a narrow add, say 2 components, that is
sourced from a chain of wider instructions, we may not compose all the
swizzles. All we really need to do is pass through the right number of
components at each level.
Fixes: 2231cf0ba3 "nir: Fix output swizzle in get_mul_for_src"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_ARB_gl_spirv does not provide a sampler deref for e.g. texelFetch(),
so we can't assume that both are present and identical. Simply lower
each if it is present.
Fixes regressions in GL_ARB_gl_spirv tests since I switched everyone to
using this pass. Thanks to Alejandro Piñeiro for catching these.
Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes the following errors:
usage: which [-as] program ...
/Users/travis/.travis/job_stages: line 110: --version: command not found
... caused by the use of an undefined $LLVM_CONFIG
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will allow drivers to pin shader buffers if necessary.
i965 and anv do not need to do this today, but iris will.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, BLORP expects drivers to provide two functions for dealing
with buffers: blorp_emit_reloc and blorp_surface_reloc. Both record a
relocation and combine the BO address and offset into a full 64-bit
address. Traditionally, blorp_surface_reloc has written that combined
address to an implicitly-known buffer where surface states are stored.
(In contrast, blorp_emit_reloc returns the value.)
The upcoming Iris driver stores surface states in multiple buffers,
which makes it impossible for blorp_surface_reloc to write the combined
address - it only takes an offset, not the actual buffer to write to.
This commit adds a third function, blorp_get_surface_address, which
combines and returns an address, which is then passed to ISL's surface
state fill functions. Softpin-only drivers can return a real address
here and skip writing it in blorp_surface_reloc. Relocation-based
drivers are have options. They can simply return 0 from the new
function, and continue writing the address from blorp_surface_reloc.
Or, they can return a presumed address from blorp_get_surface_address,
and have other relocation processing write the real value later.
For now, i965 and anv simply return 0.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Make sure that the next line starts with spaces so that bullets are
maintained throughout, add `` around a few more special tokens, and fix
SAMPLE_COUNT_TEXTURE -> SAMPLE_COUNT.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In all GLSL ES versions output variables in fragment shader are allowed
to be invariant.
From Section 4.6.1 ("The Invariant Qualifier") GLSL ES 1.00 spec:
"Only the following variables may be declared as invariant:
...
- Built-in special variables output from the fragment shader."
From Section 4.6.1 ("The Invariant Qualifier") GLSL ES 3.00 spec:
"Only variables output from a shader can be candidates for invariance."
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107842
This adds a second level of caching for the pre-lowered NIR that's only
based off of the shader module, entrypoint and specialization constants.
This is enough for spirv_to_nir as well as our first round of lowering
and optimization. Caching at this level should allow for faster shader
recompiles due to state changes.
The NIR caching does not get serialized to disk via either the
VkPipelineCache serialization mechanism or the transparent on-disk
cache. We could but it's usually not that expensive to fall back to
SPIR-V for the odd cache miss especially if it only happens once for
several misses and it simplifies the cache.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The stuff hashed by anv_pipeline_hash_shader is exactly the inputs to
anv_shader_compile_to_nir so it can be used for NIR caching.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
They have glsl_sampler_dim enum values of 8 and 9 which don't work when
you & them with 0x7. Fortunately, we have plenty of bits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Thanks to the new NIR load_descriptor intrinsic added by the UBO/SSBO
lowering series, we weren't getting UBO pushing because the UBO range
detection pass couldn't see the constants it needed. This fixes that
problem with a quick round of constant folding. Because we're folding
we no longer need to go out of our way to generate constants when we
lower the vulkan_resource_index intrinsic and we can make it a bit
simpler.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The last round of fixing 3d layer+level layout skipped the tiled case,
since tiled texture support was not in place yet. This finishes the
job.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is known when the CSO is created, so no need to patch it in later.
Also, it seems like smaller textures where the first level is small
enough to be linear, it seems like we should set linear tile mode.
See: dEQP-GLES3.functional.texture.format.unsized.rgb_unsigned_byte_3d_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Previously we'd use format/etc from the primary (z32) buffer for the
stencil (s8), due to confusion about rsc vs psurf. Rework this to drop
extra arg and push down handling of separate stencil case (and make sure
we take the fmt from the right place).
This doesn't completely fix separate-stencil, but at least it avoids the
GPU scribbling over random other cmdstream buffers and causing a bunch
of bogus fails in dEQP.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If nothing else, this will make problems with cmdstream getting blit
over with pixels easier to track down (ie. faults when it first happens
rather than strange failures later from corrupted cmdstream when a
stateobj is later reused).
(NOTE this somewhat depends on the kernel supporting the flag, and the
iommu implementation. But the worst case is just that the cmdstream
ends up writeable as before.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
The check for location aliasing was always asuming output variables
but this validation is also called for input variables.
Fixes: e2abb75b0e ("glsl/linker: validate explicit locations for SSO programs")
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Patch moves intel_tiled_memcpy[_sse41] libraries to isl, renames some
functions and types and makes the required build system changes for
meson, automake and Android. No functional changes are introduced.
v2: code cleanups, move isl_get_memcpy_type to i965 (Jason)
v3: move isl_mem_copy_fn to priv header, cleanups (Jason, Dylan)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have software implementations of ARB_gpu_shader_int64 and
ARB_gpu_shader_fp64 we can unconditionally enable these extensions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaders containing software implementations of double-precision
operations can be very large such that we cannot stack-allocate
an array of grf_count*16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaders containing software implementations of double-precision
operations can be very large such that we have more the 2^16 virtual
registers during optimization.
Move the 'nr' field to the union containing the immediate storage and
expand it to 32-bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.
Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A follow on commit will move nr to the same union as the immediate
data, so we should assert these invariants before we overwrite the nr
field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A follow on patch will move the 'nr' field to the union containing the
immediate field, so prepare by checking that we're only testing these
assertions if the .file is correct.
The assertions with != ARF were kind of silly to begin with because the
<128 check is specifically only for things in the GRF.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR metadata validation verifies that the debug bit was unset (by a call
to nir_metadata_preserve) if a NIR optimization pass made progress on
the shader. With the expectation that the NIR shader consists of only a
single main function, it has been safe to call nir_metadata_preserve()
iff progress was made.
However, most optimization passes calculate progress per-function and
then return the union of those calculations. In the case that an
optimization pass makes progress only on a subset of the functions in
the shader metadata validation will detect the debug bit is still set on
any unchanged functions resulting in a failed assertion.
This patch offers a quick solution (short of a larger scale refactoring
which I do not wish to undertake as part of this series) that simply
unsets the debug bit on unchanged functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Will be used to communicate that a shader uses 64-bit operations to the
concerned lowering passes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're going to have multiple functions, so nir_shader_get_entrypoint()
needs to do something a little smarter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously it assumed that only a single function (the entrypoint)
existed and attempted to lower constant initializers of shader outputs
for each function, for instance.
v2: use mix and findMSB to optimise.
v3: [Sagar] Fix zFrac0 == 0u case in __normalizeRoundAndPackFloat64
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
The following patches will add implementations of various
double-precision operations to this file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Will be used to convert the .glsl source file containing software fp64
routines to a .h file that can be included while building the compiler.
This commit contains two squashed together: the first from Ian adding
the utility (with the existing title), and the second from Dylan making
the code both python2 and python3 compatible.
This is somewhat modeled after the xxd utility that comes with Vim.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
xxd.py: Make python2 and 3 compatible
This makes use of unicode_literals, so that undecorated strings are
considered text (python2 unicode, python3 str) and not bytes in python2
and text in python3. It makes use of io.open, which provides python2
with python3's open behavior (it's an alias in python3), in particular
support for the 't' and 'b' option. Finally, it decorates all of the
string literals with the 'b' prefix, so that python interprets them as
bytes.
I've removed the stdin and stdout options, as python2 always requires
these to be bytes, but python3 always treats them as text (there is a
way to get at the underlying bytes buffer, but that's even more
complexity), and makes the input files required arguments.
In the meson we use the '@INPUT@' shorthand instead of listing each
input, as meson will expand that to [prog_python, '@INPUT0@', @INPUT1@,
..., @OUTPUT@, ...]
Otherwise we can end up with IR that looks like this:
(
(declare (temporary ) vec4 f@8)
(assign (xyzw) (var_ref f@8) (var_ref f) )
(call f16 ((swiz y (var_ref f@8) )))
(assign (xyzw) (var_ref f) (var_ref f@8) )
))
When we really need:
(declare (temporary ) float inout_tmp)
(assign (x) (var_ref inout_tmp) (swiz y (var_ref f) ))
(call f16 ((var_ref inout_tmp) ))
(assign (y) (var_ref f) (swiz y (swiz xxxx (var_ref inout_tmp) )))
(declare (temporary ) void void_var)
The GLSL IR function inlining code seemed to produce correct code
even without this but we need the correct IR for GLSL IR -> NIR to
be able to understand whats going on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Based on a patch from Tim Arceri, but I had to substantially rewrite it
as a result of the NIR derefs rework.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are broken on a future platform, but it turns out we don't need
to fix them, since they're just type-converting moves with strided
source. Kill them.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor. The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.
v2: Give conditional modifiers the same treatment as predicates for
SEL instructions in lower_dst_modifiers() (Iago). Special-case a
couple of other instructions with inconsistent conditional mod
semantics in lower_dst_modifiers() (Curro).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Currently the visitor attempts to enforce the regioning restrictions
that apply to double-precision instructions on CHV/BXT at NIR-to-i965
translation time. It is possible though for the copy propagation pass
to violate this restriction if a strided move is propagated into one
of the affected instructions. I've only reproduced this issue on a
future platform but it could affect CHV/BXT too under the right
conditions.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I triggered this bug while prototyping code for a future platform on
IVB. Could be a problem today though if a strided move is
copy-propagated into a type-converting move with DF destination.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This seems to be a problem in combination with the lower_regioning
pass introduced by a future commit, which can modify a SIMD-split
instruction causing its execution size to become illegal again. A
subsequent call to lower_simd_width() would hit this bug on a future
platform.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Align16 is no longer a thing, so a new implementation is provided
using Align1 instead. Not all possible swizzles can be represented as
a single Align1 region, but some fast paths are provided for
frequently used swizzles that can be represented efficiently in Align1
mode.
Fixes ~90 subgroup quad swap Vulkan CTS tests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
lower_integer_multiplication() implements 32x32-bit multiplication on
some platforms by bit-casting one of the 32-bit sources into two
16-bit unsigned integer portions. This can give incorrect results if
the original instruction specified a source modifier. Fix it by
emitting an additional MOV instruction implementing the source
modifiers where necessary.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Looks like it is impossible that 'last' variable is a null
because at least the get_vs_prog_data shouldn't return a null pointer.
So this check is unnecessary starts from commit:
99d497c5b6 "anv/pipeline: Replace get_fs_input_map with ..."
This small issue is found by cppcheck.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This ensures that during encoding, applications can get
the correct status of the surface before submitting
more operations on the same.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Indrajit Das <indrajit-kumar.das@amd.com>
This reverts commit f6a6da8131.
With this commit we see massive amounts of asserts triggering
in lp_fence_wait(), assert(f->issued), for instance with libgl_xlib
state tracker and piglit. Not entirely sure if the assert could
just be removed.
We have found some pipe_surface leaks internally.
This is the same code as surface_destroy in radeonsi.
Ideally, surface_destroy would be in pipe_screen.
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
With Mesa 18.1, commit be973ed21f, si_llvm_load_input_vs()
changed the number of source 32-bit wide dword components
used for fetching vertex attributes into the vertex shader
from a constant 4 to a variable num_channels number, depending
on input data format, with some special case handling for
input data formats like 64-Bit doubles.
In the case of a GL_DOUBLE input data format with one
or two components though, e.g, submitted via ...
a) glTexCoordPointer(1, GL_DOUBLE, 0, buffer);
b) glTexCoordPointer(2, GL_DOUBLE, 0, buffer);
... the input format would be SI_FIX_FETCH_RG_64_FLOAT,
but no special case handling was implemented for that
case, so in the default path the number of 32-bit
dwords would be set to the number of float input components
derived from info->input_usage_mask. This ends with corrupted
input to the vertex shader, because fetching a 64-bit double
from the vbo requires fetching two 32-bit dwords instead of 1,
and fetching a two double input requires 4 dword fetches
instead of 2, so in these cases the vertex shader receives
incomplete/truncated input data:
a) float v = gl_MultiTexCoord0.x; -> v.x is corrupted.
b) vec2 v = gl_MultiTexCoord0.xy; -> v.x is assigned
correctly, but v.y is corrupted.
This happens with the standard TGSI IR compiled shaders.
Under NIR with R600_DEBUG=nir, we got correct behavior
because the current radeonsi nir code always assigns
info->input_usage_mask = TGSI_WRITEMASK_XYZW, thereby
always fetches 4 dwords regardless of what the shader
actually needs.
Fix this by properly assigning 2 or 4 dword fetches for
one or two component GL_DOUBLE input.
Fixes: be973ed21f ("radeonsi: load the right number of
components for VS inputs and TBOs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is
enabled with DXVK.
It also fixes artifacts with Fallout 4's god rays with DXVK.
Various piglit interpolateAt*() tests under NIR are also fixed.
v2: formatting fix
update commit message to include Fallout 4 and the Fixes tag
Fixes: f4e499ec79 ('radv: add initial non-conformant radv vulkan driver')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
The Vulkan spec 1.1.97 says:
"variablePointers specifies whether the implementation supports
the SPIR-V VariablePointers capability. When this feature is
not enabled, shader modules must not declare the
VariablePointers capability."
As the SPIR-V feature is enabled, we should turn on the
extension feature as well.
All dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.*
pass with the khronos internal repo. Note that a bunch of them
fails with the public repo, but it's expected as they violate the
specification.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From glibc printf(3):
Z A nonstandard synonym for z that predates the appearance of z.
Do not use in new code.
Z may not exist on non-glibc systems. Prefer the standard symbol.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If there is no last fence, due to no rendering happening yet, just
create a new signaled fence and return it, to match the expectations of
the EGL sync fence API.
Fixes random "Could not create sync fence 0x3003" assertion failures from
Skia on Android, coming from the following code:
https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
Reproducible especially with thread count >= 4.
One could make the driver always keep the reference to the last fence,
but:
- the driver seems to explicitly destroy the fence whenever a rendering
pass completes and changing that would require a significant functional
change to the code. (Specifically, in lp_scene_end_rasterization().)
- it still wouldn't solve the problem of an EGL sync fence being created
and waited on without any rendering happening at all, which is
also likely to happen with Android code pointed to in the commit.
Therefore, the simple approach of always creating a fence is taken,
similarly to other drivers, such as radeonsi.
Tested with piglit llvmpipe suite with no regressions and following
tests fixed:
egl_khr_fence_sync
conformance
eglclientwaitsynckhr_flag_sync_flush
eglclientwaitsynckhr_nonzero_timeout
eglclientwaitsynckhr_zero_timeout
eglcreatesynckhr_default_attributes
eglgetsyncattribkhr_invalid_attrib
eglgetsyncattribkhr_sync_status
v2:
- remove the useless lp_fence_reference() dance (Nicolai),
- explain why creating the dummy fence is the right approach.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
The binding is checked against the limits later in the function, so we
need to make sure we don't overflow before the check here.
Fixes this valgrind warning (and sometimes segfault):
==1460== Invalid write of size 4
==1460== at 0x74C98DD: ast_declarator_list::hir(exec_list*, _mesa_glsl_parse_state*) (ast_to_hir.cpp:4943)
==1460== by 0x74C054F: _mesa_ast_to_hir(exec_list*, _mesa_glsl_parse_state*) (ast_to_hir.cpp:159)
==1460== by 0x7435C12: _mesa_glsl_compile_shader (glsl_parser_extras.cpp:2130)
in
dEQP-GLES31.functional.debug.negative_coverage.get_error.compute.
exceed_atomic_counters_limit
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I think this was copy-and-paste mistake -- nir_opt_large_constants was
passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments
to the opt pass.
I wanted to reuse this function for handling constant offsets of arrays of
images in V3D.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This way they can be shared. Build tested with meson, but not too sure
on the autotools stuff though.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Initialize the variable with NULL. Fixes the following
In file included from ../src/compiler/nir/nir_lower_io.c:34:
../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’:
../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return src;
^~~
../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here
nir_ssa_def *addr;
^~~~
v2: Avoid using a 'default' case so we get help from the compiler when
new deref types are added. (Lionel)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
"pad" was missing in Mesa's msm_drm.h. sizeof(drm_msm_gem_info)
remains the same, but now the compiler initializes the field to
zero.
Buffer allocation results in EINVAL without this for me.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This has never functioned and probably wont ever function, due to the
way gallium media state trackers are architected and the tegra video
decoder is architected.
Cc: Thierry Reding <thierry.reding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 1755f608f5
("tegra: Initial support")
The version exported by LLVM in its CMake configuration files can
include the “svn” suffix when building a development version (for
example “8.0.0svn”). However the exported clang headers are still found
under “lib/clang/8.0.0/”, without the “svn” suffix.
Meson takes care of removing the “svn” suffix from the version when
using the dependency’s `version()` method.
This processing is already performed in “configure.ac” when using
autotools.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In the following scenario :
1. Create image format R8G8B8A8_UNORM
2. Create image view format R8G8B8A8_SRGB
3. Clear the view through a sub pass to a particular color
4. Barrier on the image to from color attachment to source transfer
5. Copy the image into a linear buffer to check the content
The step 4 resolving the clear color is unaware of the SRGB format of
the view, because the blorp resolve operations operate on images the
color associated with the resolve will not operate on SRGB format but
UNORM. Leading to the wrong color being written into surfaces.
This change forces a clear color resolve at the end of the render pass
so following resolves won't have to deal with the clear color with a
format that doesn't match the image's format.
On gfxbench vulkan_5_normal 1280x720, this appear to cost us ~0.5fps,
from 49.316 down to 48.949.
v2: Only fast clear resolve when image & view have different formats
(Lionel)
v3: Update warning (Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Fixes following errors from valgrind output:
==23388== Conditional jump or move depends on uninitialised value(s)
==23388== at 0x48B4924: loader_dri3_drawable_init (loader_dri3_helper.c:381)
==23388== by 0x48A97D2: dri3_create_drawable (dri3_glx.c:386)
==23388== by 0x489E190: driFetchDrawable (dri_common.c:369)
==23388== by 0x48A9187: dri3_bind_context (dri3_glx.c:195)
==23388== by 0x488B75C: MakeContextCurrent (glxcurrent.c:220)
==23388== by 0x488B8DB: glXMakeCurrent (glxcurrent.c:267)
==23388== by 0x10A987: ??? (in /usr/bin/glxgears)
==23388== by 0x4BEB412: (below main) (in /usr/lib64/libc-2.28.so)
==23388==
==23388== Conditional jump or move depends on uninitialised value(s)
==23388== at 0x48B5A40: loader_dri3_swap_buffers_msc (loader_dri3_helper.c:923)
==23388== by 0x48A9B7E: dri3_swap_buffers (dri3_glx.c:587)
==23388== by 0x4887A81: glXSwapBuffers (glxcmds.c:857)
==23388== by 0x10ADED: ??? (in /usr/bin/glxgears)
==23388== by 0x4BEB412: (below main) (in /usr/lib64/libc-2.28.so)
Fixes: 2e12fe425f "loader/dri3: Enable adaptive_sync via _VARIABLE_REFRESH property"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
This stores the raster state and calls the correct primconvert interface
using the currently bound raster state.
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The choice of whether or not we should use block_load/store isn't a
choice between external and not so much as a choice between deref
instructions and manually calculated offsets. In vtn_pointer_from_ssa,
we guard the index+offset case behind vtn_pointer_uses_ssa_offset and
then branch out from there.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of baking in uvec2 for UBO and SSBO pointers and uint for push
constant and shared memory pointers, make it configurable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, we hard-coded the rule about workgroup variables and the
builder lower_workgroup_access_to_offsets flag. Instead base it on the
handy helper we have for exactly this sort of thing.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Variable pointers being well-defined across the block boundary requires
a couple of very specific SPIR-V validation rules. Normally, we'd trust
the validator to catch these but since CTS tests have been found in the
wild which violate them, we'll carry our own checks.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This new pass is for lowering explicitly laid out memory coming in from
SPIR-V or a similar source. It's quite a bit more complicated than the
normal lower_io because we have to be able to handle matrices. The
way the stride information is stored for matrices is awkward and dealing
with row-major matrices is especially painful.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds a new num_components value for intrinsic sources of -1
which means that it consumes everything and the number of components
effectively isn't validated. This is useful for deref sources which
just take the result of the deref and we leave it up to the driver to
decide what that size should be.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We added this assert when first moving derefs over to instructions to
ensure that deref chains could go all the way back to the variables.
Now that we're going to start using derefs for things that we can do
variable pointers on such as UBOs and SSBOs, we need to be able to run
derefs through phi nodes, selects, and basically anything else.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already detect any incomplete deref chains (where the deref is used
for something other than another deref or a load/store) and flag the
variable as used thanks to deref_used_for_not_store. All that's left to
do is to properly skip casts when cleaning up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass is used when, for instance, we lazily change the mode of
variables rather than replacing the variable with a new one. Since we
only do this in cases where we know we have full deref chains, it's ok
to just skip them in fixup_deref_modes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The code which constructs deref paths already gives you the path
starting at the nearest deref_cast or deref_var. All we need to do for
casts is handle the case where the start of the path isn't a deref_var.
For ptr_as_array derefs, we just bail if we have any after the
divergence point between the two derefs. We may be able to do better in
the future but this works for now.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When handling casts, we can't blindly propagate the parent of a cast
into a ptr_as_array deref because doing so might loose the stride
information from the cast. Instead, before we can propagate into
ptr_as_array derefs, we need to check that the cast is a cast of an
array deref and that the stride matches. For other types of derefs, we
can continue to propagate casts as normal because they don't need the
stride. We also add an optimization which can combine a ptr_as_array
deref with it parent if it is also an array deref of some form.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These correspond directly to SPIR-V's OpPtrAccessChain. As such, they
treat whatever their parent gives them as if it's the first element in
some array and dereferences that array. If the parent is, itself, an
array deref, then the two indices can just be added together to get the
final array deref. However, it can also be used in cases where what you
have is a dereference to some random vec2 value somewhere. In this
case, we require a cast before the ptr_as_array and use the ptr_stride
field in the cast to provide a stride for the ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them. Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of just storing the decorations in the vtn_type, propagate them
all the way through to the glsl_type. For array strides, this means we
need to handle them earlier so we break array stride handling into it's
own function and explicitly call it for both pointer and array types.
Due to type deduplication in the SPIR-V, we may have explicit layout
decorations on all sorts of types that don't actually want them. In
order to prevent these leaking into unfortunate places in NIR, we
explicitly strip them off before creating NIR variables and when casting
pointers to non-external memory.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
SPIR-V allows for matrix and array types to be decorated with explicit
byte stride decorations and matrix types to be decorated row- or
column-major. This commit adds support to glsl_type to encode this
information. Because this doesn't work nicely with std430 and std140
alignments, we add asserts to ensure that we don't use any of the std430
or std140 layout functions with explicitly laid out types.
In SPIR-V, the layout information for matrices is applied to the parent
struct member instead of to the matrix type itself. However, this is
gets rather clumsy when you're walking derefs trying to compute offsets
because, the moment you hit a matrix, you have to crawl back the deref
chain and find the struct. Instead, we take the same path here as we've
taken in spirv_to_nir and put the decorations on the matrix type itself.
This also subtly adds support for strided vector types. These don't
come up in SPIR-V directly but you can get one as the result of taking a
column from a row-major matrix or a row from a column-major matrix.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is C++ so we can just poke at the fields of glsl_type if we wish
and calling get_instance is way easier and more reliable than handling
each instance separately. While we're at it, we re-arrange the base
type labels to match the enum order and add 8-bit type support.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It was added in bce6f99875 even though it's completely redundant with
glsl_array_type().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms. This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.
We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back variable and
seeing if it has an interface type.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I have no idea how shader_storage made it into the list of banned
variable modes for stores but it clearly should be allowed. This only
doesn't cause us a problem today because we never actually use derefs on
shader_storage variables.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This doesn't currently change anything because array indices are
required to be 32 bits and all derefs are also 32 bits. However, we
will one day have 64-bit derefs for OpenCL.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already had code in link_as_ssa to handle bit sizes; we just need to
use it. While we're at it we clean up link_as_ssa a bit and add an
explicit bit_size parameter in preparation for a day when we have derefs
that aren't 32 bit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This simplifies our deref handling by emitting the actual NIR deref
instructions on-the-fly instead of of building up a deref chain and then
emitting them at the last moment. In order for this to work with the
parts of the compiler that assume they can chase deref chains, we have
to run nir_rematerialize_derefs_in_use_blocks_impl to put the derefs
back in the right places. Otherwise, in cases such as loop continues
where the SPIR-V blocks are not in the same order as the NIR blocks, we
may end up with a deref chain with a parent that does not dominate it's
child and nir_repair_ssa_impl will insert phis in the deref chain.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The loop through instructions doesn't set the cursor for us so unless we
set it somewhere, we may end up emitting instructions in the wrong
place. The only reason why we haven't been bitten by this in the past
is that it only happens in a few variable pointers cases and the CTS
tests for those don't use much control flow so things were getting
emitted in the correct order by accident.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Note that these limits are exact, not a "precision is at least x",
as texel coords also get snapped to a multiple of this step size
before filtering.
This fixes CTS tests
dEQP-VK.texture.explicit_lod.2d.sizes.31x55_nearest_linear_mipmap_nearest_repeat
dEQP-VK.texture.explicit_lod.2d.sizes.57x35_nearest_linear_mipmap_nearest_repeat
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109151
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These days, we have two sampler lowering passes. The newer one,
gl_nir_lower_samplers_as_deref, is used by radeonsi. It rewrites
variables to drop structures out of sampler deref chains, to make
life simpler. It then sets var->data.binding for non-bindless
sampler and image variables based on the GL uniform storage's
opaque index values.
The older one converts sampler deref chains (nir_tex_src_texture_deref)
to a numerical offset (nir_tex_src_texture_offset). It also stores the
constant-valued portion of that number in tex->texture_index, making
life really simple for drivers that don't support indirects. It too
pokes at GL uniform storage's opaque index values.
Logically, we can do the first pass (simplify derefs, set bindings)
then the second (turn derefs to offsets, set texture_index). This
patch does exactly that, eliminating some redundancy (only one pass
has to poke at GL uniform storage), and gaining proper var->data.binding
values for drivers using the full lowering.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We recurse to remove structures, and at each step, re-modify the
resulting type for our link in the deref chain. For arrays, the
result of recursion is the new underlying type - so we wrap it with
the array dimensionality again. For structs, we want to simply use
the new underlying type, skipping the struct altogether.
The correct way to do this is to do nothing at all. Previously, we
had reset type to next->type, which is the /old/ field type, not the
new field type we obtained by recursing. This undid our recursive work.
Fixes about 338 tests with nested structs, such as:
dEQP-GLES2.functional.uniform_api.value.initial.get_uniform.nested_structs_arrays.sampler2D_samplerCube_fragment
Note that currently only radeonsi uses this pass, and NIR support is
disabled there by default, so the breakage was likely not seen by most
people. The next commit uses this pass for more drivers, so this fix
prevents regressions from that change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
unused and gcc complains about strncpy. (from what I can see because
strncpy does not leave a 0 byte on truncate. That said we don't use
it so this does not fix a real bug).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We do the ImageFormatProperties check already, and rejecting an usage
flag when both ImageFormatProperties and the WSI (which is Android)
support it is not allowed.
Intel does support storage for some of the support WSI formats, such
as R8G8B8A8_UNORM, and looking at the ISL_SURF_USAGE_DISABLE_AUX_BIT,
the imported images do not have any form of compression that would
prevent this fix.
v2: Also consider STORAGE bit for Gralloc usage bits.
(From Kevin Strasser <kevin.strasser@intel.com>)
Fixes: 053d4c328f "anv: Implement VK_ANDROID_native_buffer (v9)"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We started using it in the btoi paths for r32g32b32, and the LLVM IR
checker will complain about it because we end up with intrinsics with
the wrong type extension in the name.
Fixes: 593996bc02 ("radv: implement buffer to image operations for R32G32B32")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Some of the status variables in the compiler are only used in asserts
and thus may be unused in release builds. Annotate them accordingly
to avoid 'unused but set' warnings from the compiler.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Take into account the render target format when checking if the color
mask affects all channels of the RT. This allows to enable full
overwrite in a few cases where a non-alpha format is used.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The functional change here is moving the nir_lower_io_to_scalar_early()
calls inside st_nir_link_shaders() and moving the st_nir_opts() call
after the call to nir_lower_io_arrays_to_elements().
This fixes a bug with the following piglit test due to the current code
not cleaning up dead code after we lower arrays. This was causing an
assert in the new duplicate varyings link time opt introduced in
70be9afccb.
tests/spec/glsl-1.10/execution/vsfs-unused-array-member.shader_test
Moving the nir_lower_io_to_scalar_early() calls also allows us to tidy
up the code a little and merge some loops.
Reviewed-by: Eric Anholt <eric@anholt.net>
Even without any clever optimization on the unpack operations, this gives
us a useful value for the channels read field, which we can use to avoid
ldtmu instructions to the no-op register.
instructions in affected programs: 890712 -> 881974 (-0.98%)
I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and
vc4, but nir could potentially do some useful stuff for us (like avoiding
unpack/repacks) if we give it the information.
v2: Skip lowering for txs/query_levels
v3: Fix a crash on old-style shadow
v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack
the enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In 92eb5bbc68 we attempted to avoid copying clear colors whenever
we weren't doing a resolve. However, this broke MSAA resolves because
we need the clear color in the source. This patch makes blorp much more
conservative such that it only avoids the clear color copy if either
aux_usage == NONE or it's explicitly doing a fast-clear.
Fixes: 92eb5bbc68 "intel/blorp: Only copy clear color when doing..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We can pull a whole vector in a single indirect load. This saves a bunch
of round-trips to the TMU, instructions for setting up multiple loads,
references to the UBO base in the uniforms, and apparently manages to
reduce register pressure as well.
instructions in affected programs: 3086665 -> 2454967 (-20.47%)
uniforms in affected programs: 919581 -> 721039 (-21.59%)
threads in affected programs: 1710 -> 3420 (100.00%)
spills in affected programs: 596 -> 522 (-12.42%)
fills in affected programs: 680 -> 562 (-17.35%)
Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
In the process of adding support for SSBOs and CS shared vars, I ended up
needing a helper function for doing TMU general ops. This helper can be
that starting point, and saves us a bunch of round-trips to the TMU by
loading a vector at a time.
I noticed that a VS I was debugging was missing all of its output stores
-- outputs_written was for POS, VAR0, VAR3, while the shader's variables
were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed
to be doing here, but we can just walk the declared variables and avoid
both this bug and the emission of extra stvpms for less-than-vec4
varyings.
When copy_prop_vars also took care of dead write handling, intrin was
used as part of store_to_entry. Now it isn't, so this assignment
isn't used really used. Add a comment clarifying what happens to
intrin.
Fixes: 4dfa7adc10 "nir: Remove handling of dead writes from copy_prop_vars"
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Even with the previous commit, hangs are still happening. The problem
there is that the VF cache invalidate do happen immediately without
waiting for previous rendering to complete. What happens is that we
invalidate the cache the moment the PIPE_CONTROL is parsed but we
still have old rendering in the pipe which continues to pull data into
the cache with the old high address bits. The later rendering with the
new high address bits then doesn't have the clean cache that it
expects/needs.
v2: Update commit message/explanation with Jason's
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: a363bb2cd0 ("i965: Allocate VMA in userspace for full-PPGTT systems.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109072
Hopefully we can kick start the revolution and other distros will start
providing them as well :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
No functional change as the socket name is the same,
just removing the double definition of the path.
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
A 2d-array texture (for example), should get the # of array elements
from box->depth, rather than depth0 which is minified.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment
with tiled textures.
Reported-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we hit the memcpy() path for copy_region(), that will try to do a
transfer_map(), which goes badly for blits to/from staging triggered
by transfer_map() or transfer_unmap().
We could possibly add fd_blit2() which has allow_transfer_map param,
and call that for staging blits. But I'm not really sure if trying
the blit via copy_region() is very useful. At least for newer gens
that implement fd_context::blit(), it probably isn't.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Switch over to using fd_context::blit(), in the same way that a5xx does.
The previous patch wires fd_resource_copy_region() up to the blitter so
a6xx no longer needs to bypass the core layer to accelerate this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
First step to unify the way fd5 and fd6 blitter works. Currently a6xx
bypasses the blit API in order to also accelerate resource_copy_region()
But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that
we can rework resource_copy_region() to have it's own fallback path,
and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary
callers.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For multi-pass rendering, it is common to keep the same depth buffer
from previous pass, to discard geometry that would be hidden by later
draws. In the later passes with depth-test enabled, but depth-write
disabled, there is no reason to do gmem2mem resolve.
TODO probably do something similar for stencil.. although stencil
buffer isn't used as commonly these days
Signed-off-by: Rob Clark <robdclark@gmail.com>
After trying multiple times to merge if-statements with phis
between them I've come to the conclusion that it cannot be done
without regressions. The problem is for some shaders we end up
with a whole bunch of phis for the merged ifs resulting in
increased register pressure.
So this patch just merges ifs that have no phis between them.
This seems to be consistent with what LLVM does so for radeonsi
we only see a change (although its a large change) in a single
shader.
Shader-db results i965 (SKL):
total instructions in shared programs: 13098176 -> 13098152 (<.01%)
instructions in affected programs: 1326 -> 1302 (-1.81%)
helped: 4
HURT: 0
total cycles in shared programs: 332032989 -> 332037583 (<.01%)
cycles in affected programs: 60665 -> 65259 (7.57%)
helped: 0
HURT: 4
The cycles estimates reported by shader-db for i965 seem inaccurate
as the only difference in the final code is the removal of the
redundent condition evaluations and jumps.
Also the biggest code reduction (~7%) for radeonsi was in a tomb
raider tressfx shader but for some reason this does not get merged
for i965.
Shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 232 -> 232 (0.00 %)
VGPRS: 164 -> 164 (0.00 %)
Spilled SGPRs: 59 -> 59 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14584 -> 13520 (-7.30 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13 -> 13 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before, I had per-stage entryoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
Loops will be trickier, since we need some analysis to figure out if the
breaks/continues inside are uniform. Until we get that in NIR, this gets
us some quick wins.
total instructions in shared programs: 6192844 -> 6174162 (-0.30%)
instructions in affected programs: 487781 -> 469099 (-3.83%)
I wanted to reuse the comparison stuff for nir_ifs, but for that I just
want the flags and no destination value. Splitting the conditions from
the destinations ended up cleaning the existing code up, anyway.
We'll still fail at draw time, but this avoids a regression in shader-db
execution once I enable TLB writes in precompiles.
Fixes: b38e4d313f ("v3d: Create a state uploader for packing our shaders together.")
Makes debugging easier when we care about the deref chain and not the
deref instruction itself. To make it take a const pointer, constify
some of the static functions in nir_print.c.
Reviewed-by: Eric Anholt <eric@anholt.net>
Nouveau requires rtti. Often LLVM is configured without rtti, and code
with and without cannot be linked safely. Lets just error out if nouveau
is requested and llvm is built without rtti.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109202
Fixes: c5a97d658e
("meson: fix builds against LLVM built without rtti")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The 16-bit polynomial execution doesn't meet Khronos precision requirements.
Also, the half-float denorm range starts at 2^(-14) and with asin taking input
values in the range [0, 1], polynomial approximations can lead to flushing
relatively easy.
An alternative is to use the atan2 formula to compute asin, which is the
reference taken by Khronos to determine precision requirements, but that
ends up generating too many additional instructions when compared to the
polynomial approximation. Specifically, for the Intel case, doing this
adds +41 instructions to the program for each asin/acos call, which looks
like an undesirable trade off.
So for now we take the easy way out and fallback to using the 32-bit
polynomial approximation, which is better (faster) than the 16-bit atan2
implementation and gives us better precision that matches Khronos
requirements.
v2:
- Fallback to 32-bit using recursion (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- use nir_fadd_imm and nir_fmul_imm helpers (Jason)
v3:
- since we need to define one for fsub use it for fdiv too (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- fix huge_val for 16-bit, it was mean't to be 2^14 not 10^14.
v3:
- rebase on top of new bool sized opcodes
- use nir_b2f helper
- use nir_fmul_imm helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- use nir_fadd_imm and nir_fmul_imm helpers (Jason)
- rebased on top of new sized boolean opcodes
- use nir_b2f helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- use nir_fmul_imm and nir_fadd_imm helpers (Jason)
v3:
- missed one case where we need to replace nir_imm_float
with nir_imm_floatN_t (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This just cleans things up a little and make things more safe for
derefs.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The following patch will use this with the radeonsi NIR backend
but I've added it to ac so we can use it with RADV in future.
This is a NIR implementation of the tgsi function
tgsi_scan_tess_ctrl().
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The previous code used a do while loop and continues after walking
a nested loop/if-statement. This means we end up evaluating the
last instruction from the nested block against the while condition
and potentially exit early if it matches the exit condition of the
outer block.
Fixes: 386d165d8d ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This just happened not to crash/assert because all loops have at
least 1 if-statement and due to a second bug we end up matching
the same ENDIF to exit both the iteration over the if-statment
and the loop.
The second bug is fixed in the following patch.
Fixes: 386d165d8d ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.
Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd going on with the compressed formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This caused random failures for two conditional rendering tests:
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard
These wrote the predicate with the vertex shader, did a barrier and then
started the conditional rendering. However the cache flushes for the barrier
only happen on first draw, so after the predicate has been read.
Fixes: e45ba51ea4 "radv: add support for VK_EXT_conditional_rendering"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Without this, I get the following error when building the tests with
autotools on i686:
---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The erros are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Eric Engeström <eric@engestrom.ch>
Without this, I get the following error when building the tests using
meson on i686:
---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:
- Don't set a uniform pitch for POT-sized compressed textures
- Adjust define_rect API to be less confused about block sizes
- Only mark a texture as linear if it has a uniform pitch set
This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This doesn't matter since all compressed formats supported by this
hardware use square blocks, but best to use the correct helper.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This logic mirrors what we do on nv50. The relatively new
texture_subdata callback can cause this to happen with 3D textures,
which is triggered at least by xonotic, and probably many piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the last-active context gets deleted, the pushbuf doesn't have a
bufctx to reference. Then there could be a sequence of binds which would
trigger a reset on that bin before validation was done. Instead we just
pass in the bufctx in question directly.
All other instances of PUSH_RESET happen strictly after a validation is
run.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The whole user_priv thing is a mess, but as long as it's there, it
basically has to map 1:1 to the cur_ctx. Unfortunately we were setting
user_priv to some context, then that context could get deleted without
any draws/validations in it, leading user_priv to become NULL, with
cur_ctx still pointing at some old context. Then we wouldn't run the
switch logic, which in turn led to a NULL bufctx being dereferenced.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We can just look at the MSF flags -- if they're unset, then we're
definitely in a helper invocation. Fixes
dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
This is what the GLSL ES 310 spec tells us to do, but apparently the
"gather mode" flag doesn't imply it in the HW. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
The greedy comparison folding in bcsel means that we may have left the
original bool-generating NIR ALU instruction dead, but DCE wasn't
eliminating the VIR code for it because of the flags updates.
total instructions in shared programs: 5186024 -> 5100894 (-1.64%)
instructions in affected programs: 1448695 -> 1363565 (-5.88%)
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general. I formatted it more like Intel's so I can mostly reuse their
report script.
I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.
It's better to let most applications make use of adaptive sync
by default. Problematic applications can be placed on the blacklist
or the user can manually disable the feature.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
The DDX driver can be notified of adaptive sync suitability by
flagging the application's window with the _VARIABLE_REFRESH property.
This property is set on the first swap the application performs
when adaptive_sync is set to true in the drirc.
It's performed here instead of when the loader is initialized for
two reasons:
(1) The window's drawable can be missing during loader init.
This can be observed during the Unigine Superposition benchmark.
(2) Adaptive sync will only be enabled closer to when the application
actually begins rendering.
If adaptive_sync is false then the _VARIABLE_REFRESH property
is deleted on loader init.
The property is only managed on the glx DRI3 backend for now. This
should cover most common applications and games on modern hardware.
Vulkan support can be implemented in a similar manner but would likely
require splitting the function out into a common helper function.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Applications that don't present at a predictable rate (ie. not games)
shouldn't have adapative sync enabled. This list covers some of the
common desktop compositors, web browsers and video players.
[ Michel Dänzer: Added entry for firefox-esr ]
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
This option lets the user decide whether mesa should notify the
window manager / DDX driver that the current application is adaptive
sync capable.
It's off by default.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Some programs start with the path and command line arguments in
argv[0] (program_invocation_name). Chromium is an example of
an application using mesa that does this.
This tries to query the real path for the symbolic link /proc/self/exe
to find the program name instead. It only uses the realpath if it
was a prefix of the invocation to avoid breaking wine programs.
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We were leaking surfaces because the references taken in
etna_set_framebuffer_state weren't being released on context destroy.
Instead of just directly releasing those references in
etna_context_destroy, use the util_copy_framebuffer_state helper.
Take the chance to remove the duplicated buffer references in
compiled_framebuffer_state to avoid confusion.
The leak can be reproduced with a client that continuously creates and
destroys contexts.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Older versions of virglrenderer before 33da7361aec486290df0aec4ad8dfa8ff6adde2c
in vtest mode, misrender gears.
Fixes: 9d81cd8e7c (virgl: Pass resource size and transfer offsets)
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
I know it's not what anyone wants, but how about we start with a
message in the documentation that encourages people to try meson.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
Note that meson requires python 3, scons requires python 2, and
autotools works with either.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
ATOMFADD is a little special -- make drivers have to specify it
explicitly.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is supported by at least NVIDIA hardware, and exposeable via GL
extensions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need a 64-bit value, otherwise we only handle the low 32, and happen to
sign-extend to claim to write all varying slots if VARYING_SLOT_VAR2 was
used.
Fixes: 4d0b2c7aaa ("ttn: Update shader->info as we generate code.")
Reviewed-by: Rob Clark <robdclark@gmail.com>
We've made the choice not to use fast clears on layer > 0 with
multilayer images. This is partly because we would need to store
multiple clear colors for each layer, making the existing memory
layout, already including aux surfaces, fast clear color, image state,
etc... even more complex.
Partial resolves are the operations transfering the clear colors into
the auxiliary buffers. This operation is currently implemented in
Blorp by loading the clear color from the image's BO, into a shader
that then samples from the auxiliary buffer and writes the color only
if it isn't there already.
The problem here is that because we store only one clear color for all
layers and it is used for partial resolves. If you trigger a partial
clear on a layer > 0, then you're likely to deal with a color that is
not what you actually want. In the particular issues below, we have
multiple layers, each cleared with a different color but the partial
resolve just writes the wrong color into the auxiliary buffers for
layers > 0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Cc: mesa-stable@lists.freedesktop.org
100 is too small for some games, which triggers recompilations
every frame. Increase to 1024.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
nine_context_box_upload uploads a ram buffer (from src)
to a pipe_resource (dst).
We already have a refcount on the pipe_resource,
what needs to be protected from release is the ram buffer,
thus a reference to src.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Cc: mesa-stable@lists.freedesktop.org
This enables to match the window size
on resize on all cases, as it only works
currently with presentation buffers.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This patch introduces a structure to release the
present_handles only when they are fully released
by the server, thus making
"DestroyD3DWindowBuffer" actually release the buffer
right away when called.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This logic can be re-written as the two cases for 3d (ie. before/after
the miplevel sizes start reducing) vs everything else. I think it is
easier to read this way.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This was a hold-over from the early TGSI days, and mostly not needed
with NIR. This avoids burning an entire 4 consecutive scalar regs
for vec3 outputs, for example. Which fixes a few places that we were
doing worse that we should on register usage.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes the following crash that happened after d6110d4d
The problem happens if we first compile a "vanilla" shader with nothing
lowered in NIR, which perform the final lowering passes on so->shader->
nir (including nir_lower_locals_to_regs()), and then later we have
compile a shader with some lowering. The second time through we would
have already done nir_lower_locals_to_regs().
Arguably this was already a bug, just one we hadn't noticed yet.
Fixes: d6110d4d54 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
Signed-off-by: Rob Clark <robdclark@gmail.com>
DrawPixels lowering, for example, adds new varyings that need to be
accounted for in inputs_read. The earlier info gathering at link time
cannot account for this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The glDrawPixels passthrough vertex shader copies position and texcoord
vertex attributes to varying outputs. It also optionally copies a third
gl_Color attribute, which sometimes is unnecessary. Until now, we've
compiled separate variants of the shader, one of which does this extra
copy, and the other of which doesn't. We have done this since 2007.
But, the vertex shader runs for a whopping four vertices, and so the
cost of a copying a single input to output is likely inconsequential.
In theory, we could bind one fewer vertex element - but we always bind
all three regardless. So, we don't even get that savings.
This patch unifies the two, so we always copy the optional color,
and save having to compile the variant. It also makes the VS input
interface match up with the vertex element state without any dead
(unused) input attributes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So that gitlab will render the < and > correctly allowing the tag to be
copy-n-pasted without additional formatting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Whenever llvm removes an intrinsic (we're using), we're hitting segfaults
due to llvm doing calls to address 0 in the jitted code instead.
However, Jose figured out we can actually detect this with
LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder
what got broken. (Of course, someone still needs to fix the code to
no longer use this intrinsic.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This intrinsic disppeared with llvm 6.0, using it ends up in segfaults
(due to llvm issuing call to NULL address in the jited shaders).
Add code doing the same thing as the autoupgrade code in llvm so it
can be matched and replaced back with a pavgb.
While here, also improve lp_test_format, so it tests both with and without
cache (as it was, it tested the cache versions only, whereas cache is
actually disabled in llvmpipe, and in any case even with it enabled
vertex and geometry shaders wouldn't use it). (Although at least for
the unorm8 uncached fetch, the code is still quite different to what
llvmpipe is using, since that would use unorm8x16 type, whereas
the test code is using unorm8x4 type, hence disabling some intrinsic
paths.)
Fixes: 6f4083143b ("gallivm: use llvm jit code for decoding s3tc")
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This commit adds a number of build combinations:
- Gallium Drivers {SWR, RadeonSI, Others)
Each one has different LLVM requirements. Building SWR alone is twice
as slow as all other drivers combined.
- Gallium ST Clover LLVM {5,6,7}
Because C++ API changes all the time. Analogous to above building
Clover takes as much time as building all other ST combined.
- Gallium ST Others
Nouveau is used, instead of i915g since meson has explicit target
tracking. Meaning that a configure error is thrown if we use i915g
with say va, vdpau or others.
Note: LLVM prior to 5.0 is intentionally dropped. If needed we can add
that later.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This is the supported way to do this, and should be more robust and
reliable.
v2: [Emil]
- enable backslash escapes
- don't hardcode the path
- pass the argument directly to meson
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The latter is the default these days and Travis will be removing sudo
soonish.
Flipping to xenial, allows us to remove a bunch of hacks we have. Plus
it prevents us from adding new ones, to workaround what seems like a
gcc/binutils bug. For example (from the upcoming meson build):
FAILED: ccache c++ -o src/gallium/targets/pipe-loader/pipe_r600.so ...
... src/util/libmesa_util.a ... /usr/lib/x86_64-linux-gnu/libz.so ...
src/util/libmesa_util.a(disk_cache.c.o): In function `deflate_and_write_to_disk':
_build/../src/util/disk_cache.c:746: undefined reference to `deflateInit_'
_build/../src/util/disk_cache.c:765: undefined reference to `deflate'
...
As we can see, even though libz.so is explicitly passed after the
object that requires it - the linker still fails to see the symbols.
Avoid all those situations - flip the switch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Seemingly with LLVM7 and GCC 5.0, the former won't properly advertise
-std=c++11 and the latter will choke.
dd this temporary workaround, otherwise we'll get errors like:
In file included from /usr/include/c++/5/type_traits:35:0,
from /usr/lib/llvm-7/include/llvm/Support/type_traits.h:18,
from /usr/lib/llvm-7/include/llvm/ADT/Optional.h:22,
from /usr/lib/llvm-7/include/llvm/ADT/STLExtras.h:20,
from /usr/lib/llvm-7/include/llvm/ADT/StringRef.h:13,
from /usr/lib/llvm-7/include/llvm/Target/TargetMachine.h:17,
from ../../../src/amd/common/ac_llvm_helper.cpp:36:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Swap '..' with the symbolic inc_glx and add glproto as dependency. That
will pull the correct include, effectively fixing the tests on macOS.
Fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
When producing the final libGL.so/libGLX_mesa.so we only link the local
static helper lib (libglx). Thus there's no reason for the includes.
Fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The library itself (libGL) is only built when -Dglx=dri, yet it's
accompanying tests are build even with -Dglx=xlib.
Adjust the guards, so we don't build the tests when they are not
applicable
v2:
- Reword commit message (Dylan)
- Drop build_by_default hunk (Dylan)
Fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The gallium drivers do not require a DRI loader. Drop the artificial
and unnecessary restriction.
Fixes: af9d276134 ("meson: build libmesa_gallium")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Currently our is_sha_nomination does:
- folds any whitespace, attempting to extract sha-like information
- checks that at least one of the shas has landed
Split it in two and do sha-like validation first.
This way, commits with mesa-stable and sha nominations will feature the
fixes/revert/etc instead of stable (a) or will be omitted if not
applicable for the respective branch (b).
Misc examples from 18.3
(a)
-[ stable ] 5bc509363b glx: make xf86vidmode mandatory for direct rendering
+[ fixes ] 5bc509363b glx: make xf86vidmode mandatory for direct rendering
(b)
-[ stable ] 9a7b319903 anv/query: flush render target before copying results
CC: Juan A. Suarez <jasuarez@igalia.com>
CC: Dylan Baker <dylan@pnwbakers.com>
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
i915 render nodes refuse the dumb ioctls, so the simulator would crash on
the original non-apitrace shader-db. Replace them with direct i915 calls
if we detect that we're on one of their gem fds.
Because of the many caveats involved, using -Dc_args instead of CFLAGS
is recommended both by meson upstream and by us.
v2: - Fix typo
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't ever want to do the fmask lookup on a atomic or
store, the fmask should have been decompressed if the
surface has been moved to IMAGE_LAYOUT.
Original patch by Dave Airlie.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes GPU hangs on GFX9 with
dEQP-VK.memory.external_memory_host.bind_image_memory_and_render.with_zero_offset.*
Copied from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Exactly what title says, the new addrlib does not allow the above with
certain dimensions that the CTS seems to hit. Work around it by not
allowing the app to render to it via compat with other 128bpp formats
and do not render to it ourselves during copies.
Fixes: 776b911365 "amd/addrlib: update Mesa's copy of addrlib"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The driver needs to decompress all image layers if a fast
depth/color clear has been performed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This workaround has been introduced by 135e4d434f for fixing
DXVK GPU hangs with many games. It is no longer needed since
LLVM r345718.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The former expects to see SSA-only things, but the latter injects registers.
The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A long time ago, when this was first implemented, not having a sampler
bound would cause problems on Fermi. I didn't work out the reasons, but
the solution was simple -- just put the samplers back in.
Since then, regular texturing paths appear to have lost their associated
samplers which required a fuller investigation and fix in nouveau. Now
that this is done, this code should no longer need a sampler state for
fetching texels from a buffer texture.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is (much) faster than using the util fallback.
(Note that there's two methods here, one would use a cache, similar to
the existing code (although the cache was disabled), except the block
decode is done with jit code, the other directly decodes the required
pixels. For now don't use the cache (being direct-mapped is suboptimal,
but it's difficult to come up with something better which doesn't have
too much overhead.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
An earlier patch that introduced the function failed to handle the case
where an image format layout qualifier is not specified, which is allowed
on desktop GL profiles. In these cases, nir_variable's image format is
GL_NONE, and we don't need to print a debug message for those.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Across several projects I've seen new contributors say "I wasn't sure if I
should provide a review tag since I'm not really an expert in this area."
Everyone I know already applies some implicit weighting to reviews from
different people, so encourage participation.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This calls the expensive uif offset function once per utile, but it still
gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over
calling it on each pixel.
This lets us store the non-PBO glTexImage data directly into the tiled
image without making an extra untiled memcpy for the gallium transfer.
Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around
in the kernel mapping and unmapping the transfer's temporary area.
We're waiting for the jobs-completed count to increment (with wrapping),
not to reach its starting state. This mostly ended up working out because
the next v3d_hw_tick() for a submit CL would end up doing the TFU
operation first, but it did fail when a blit was used for glReadPixels()
at the end of a test.
Fixes: ee0549ff9a ("v3d: Add the V3D TFU submit interface to the simulator.")
In the UAPI, the first BO is the destination, and the one the kernel
should do an exclusive reservation on. Currently we only do exclusive
reservations, anyway. However, in the simulator path I was only copying
back the "destination" BO (actually src in this case), and this caused
regressions once I fixed the simulator to actually complete TFU before
returning (since otherwise, the TFU op would happen at the start of the
next CL submit and the draw would get the right contents).
Fixes: 976ea90bdc ("v3d: Add support for using the TFU to do some blits.")
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers. The bug was saving
such pointer to use later. Change the code to first perform the
removals and then find the remaining right entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes build failure if the LLVM headers aren't in a standard include
directory.
Fixes: ec22dd34c8 "radeonsi: move SI_FORCE_FAMILY functionality to
winsys"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.
Change code to clean up the array before use to avoid invalid data
around.
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, we ignored the the glUnmap(..) operation and
flushed before we flush the cbuf. Now, let's just flush
the data when we unmap.
Neither method is optimal, for example:
glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT)
glFlushMappedBufferRange(.., 25, 30)
glFlushMappedBufferRange(.., 65, 70)
We'll end up flushing 25 --> 70. Maybe we can fix this later.
v2: Add fixme comment in the code (Elie)
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
With commit 89b479, we moved to tracking buffer cleanliness
when binding.
TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Remove a level of indirection to make the code more explicit -- should
make it easier to follow what's going on.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a move towards using composition instead of inheritance for
different query types.
This change weakens out-of-memory error reporting somewhat, though this
should be acceptable since we didn't consistently report such errors in
the first place.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The following race condition could occur in the no-timeout case:
API thread Gallium thread Watchdog
---------- -------------- --------
dd_before_draw
u_threaded_context draw
dd_after_draw
add to dctx->records
signal watchdog
dump & destroy record
execute draw
dd_after_draw_async
use-after-free!
Alternatively, the same scenario would assert in a debug build when
destroying the record because record->driver_finished has not signaled.
Fix this and simplify the logic at the same time by
- handing the record pointers off to the watchdog thread *before* each
draw call and
- waiting on the driver_finished fence in the watchdog thread
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fulfills a requirement for clients that want to utilize same
code path for images with external formats (VK_FORMAT_UNDEFINED) and
"regular" RGBA images where format is known. This is similar to how
OES_EGL_image_external works.
To support this, we allow color conversion samplers for non-YUV
formats but skip setting up conversion when format does not have
can_ycbcr flag set.
v2: add comment and bundle can_ycbcr to the existing break
condition (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If a conversion struct was passed, then initialize view using
format from the conversion structure.
v2: use vk_format directly from the anv_format struct
v3: added some assertions (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If external format is used, we store the external format identifier in
conversion to be used later when creating VkImageView.
v2: rebase to b43f955037 changes
v3: added assert, ignore components when creating external
format conversion (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Since we don't know the exact format at creation time, some initialization
is done only when bound with memory in vkBindImageMemory.
v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if
image has external format
v3: refactor prepare_ahw_image, support vkBindImageMemory2,
calculate stride correctly for rgb(x) surfaces, rename as
'resolve_ahw_image'
v4: rebase to b43f955037 changes
v5: add some assertions to verify input correctness (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: have separate memory properties for android, set usage
flags for buffers correctly
v3: code cleanup (Jason)
+ limit maxArrayLayers to 1 for AHardwareBuffer based images
v4: rebase to b43f955037 changes
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB)
v3: properly handle usage bits when creating from image
v4: refactor, code cleanup (Jason)
v5: rebase to b43f955037 changes,
initialize bo flags as ANV_BO_EXTERNAL (Lionel)
v6: add assert that anv_bo_cache_import succeeds, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes it cleaner to introduce more cases where we import memory
from different types of external memory buffers.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Use the anv_format address in formats table as implementation-defined
external format identifier for now. When adding YUV format support this
might need to change.
v2: code cleanup (Jason)
v3: set anv_format address as identifier
v4: setup suggestedYcbcrModel and suggested[X|Y]ChromaOffset
as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
v5: set linear tiling for GPU_DATA_BUFFER usage, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason)
v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define
for now to avoid direct dependency to minigbm headers
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will make it possible for next patch to rip
anv_image_create_info out from make_surface function.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.
v2: Remove misleading comment about sized literals (suggested by
Timothy). Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af086 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
Tested on gen9.
v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note that we have an official GL extension number, pick the appropriate
section of the GLX spec to modify, and add changelog.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This has not even had an attempt at implementation. If you asked for
renderer 0 - which, the spec implies, should always work - then
dri2_convert_glx_attribs would fail, we'd silently fall back to creating
an indirect context, and xserver would also not recognize the attribute
and would throw BadValue at you.
The API would be difficult to use in any case, since there's no way to
enumerate how many renderers the screen has. I'd be tempted to add that
by defining:
glXQueryRendererIntegerMESA(dpy, screen,
/* renderer = */ -1,
0, &value);
to return the number of renderers, but a new entrypoint might be
cleaner. Still, better to not specify it at all than to lie about it.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
In one place we say, if GLES isn't supported then the profile version
will be 0.0. Then later we say, if the GLES profile extension isn't
supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not
mentioned in the spec. A strict reading of the latter would mean that
GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token,
and the query should instead return False.
The implementation does not check for the GLES profile extensions, and
the additional complexity doesn't seem worth it. Removing the
interaction text makes the spec match the implementation.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
emit_intrinsic_store_image() is always using 4 components when
collecting registers for the value. When image has less than
4 components (e.g, r32f, rg32i, etc) this results in extra mov
instructions.
This patch uses the actual number of components from the image format.
For example, in a shader like:
layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
...
void main(void) {
...
imageStore (u_image, some_offset, vec4(1.0));
...
}
instruction count is reduced in at least 3 instructions (note image
format is r32f, 1 component only).
This obviously reduces register pressure as well.
v2: - Added support for image formats from NV_image_format extension
(Ilia Mirkin).
- Return 4 components by default instead of asserting. (Rob Clark).
v3: Added more missing formats (Ilia Mirkin).
v4: Added a debug message for unknown image formats (Rob Clark).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If we can't find the variable from the deref, just assume it isn't
invariant and continue on. This can happen if, for instance, we're
writing to a deref that points into an SSBO.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Each time I have to touch the buffer import/export functions in the dri
state tracker I get lost in the maze of functions converting between
DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format.
Rip it out and replace by a single table, which defines the correspondence
between the different representations.
Also this now stores all the known representations in the __DRIimageRec,
to avoid the loss of information we currently have when importing a buffer
with a fourcc, which doesn't have a corresponding dri format.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently all the EGL APIs are missing a way to specify how an imported
dma-buf is intended to be used. Demanding the format to be both usable
for sampling and rendering artificially restricts the list of formats a
driver is able to import.
Looking at how the Intel driver implements those DRI2 image APIs it
doesn't distinguish between render or sampler compatible formats. So
this patch aligns behavior between Intel and Gallium based drivers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is no need to do the detour over the resource behind the
surface to get the format. Use the surface format directly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Old versions of meson returned ppc64le as the cpu_family for little
endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't
work in that case. This generalizes the check to work for both old and
new versions of meson.
Fixes: 34bbb24ce7
("meson: Add support for ppc assembly/optimizations")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up. Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component. The specific
case that I saw was:
cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF
...
cmp.nz.f0(8) null<1>D g42<4>.xD 0D
In this case we can just delete the second CMP.
No changes on Broadwell or Skylake because they do not use the vec4
backend. Also no changes on GM45 or Iron Lake.
Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10856676 -> 10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.
total cycles in shared programs: 154788865 -> 154732047 (-0.04%)
cycles in affected programs: 2485892 -> 2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.
All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.
total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.
Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0
total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0
GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0
total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In an instruction sequence like
cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D
(+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD
The other fields of vgrf17 may be unused, but the CMP still needs to
generate the other flag bits.
To my surprise, nothing in shader-db or any test suite appears to hit
this. However, I have a change to brw_vec4_cmod_propagation that
creates cases where this can happen. This fix prevents a couple dozen
regressions in that patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
We were not using the view mask for depth clears, causing only the
first view to be cleared.
Fixes: 2e86f6b259 "radv: Add multiview clears."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The switch directly after the check has a default case that returns
NULL too, so the effective return value is not changed. Also this
check is wrong once we start dealing with formats introduced by an
extension (e.g. YUV formats).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
I botched some copy-and-paste and clamped to signed int max instead of
uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.
Fixes: d3e046e76c ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I thought the value was correctly propagated, but actually not.
Fixes: 2ac6d55f38 ("radv: bump reported version to 1.1.90")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.
mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811
deus ex mankind divided 148: 62,845,304 => 62,829,640
deus ex mankind divided 2890: 71,922,686 => 71,922,686
dirt showdown 676: 69,238,607 => 69,238,607
dolphin ubershaders 210: 77,822,072 => 77,822,072
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Found using pahole.
Changes in peak memory usage according to Valgrind massif:
mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571
deus ex mankind divided 148: 62,887,728 => 62,845,304
deus ex mankind divided 2890: 72,399,750 => 71,922,686
dirt showdown 676: 69,464,023 => 69,238,607
dolphin ubershaders 210: 78,359,728 => 77,822,072
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Replace the old array in each value with a hash table in each value.
Changes in peak memory usage according to Valgrind massif:
mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
deus ex mankind divided 148: 62,887,728 => 62,887,728
deus ex mankind divided 2890: 72,402,222 => 72,399,750
dirt showdown 676: 74,466,431 => 69,464,023
dolphin ubershaders 210: 109,630,376 => 78,359,728
Run-time change for a full run on shader-db on my Haswell desktop (with
-march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9
seconds on a 237 second run. The first time I sent this version of this
patch out, the run-time data was quite different. I had misconfigured
the script that ran the test, and none of the tests from higher GLSL
versions were run. These are generally more complex shaders, and they
are more affected by this change.
The previous version of this patch used a single hash table for the
whole phi builder. The mapping was from [value, block] -> def, so a
separate allocation was needed for each [value, block] tuple. There was
quite a bit of per-allocation overhead (due to ralloc), so the patch was
followed by a patch that added the use of the slab allocator. The
results of those two patches was not quite as good:
mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
deus ex mankind divided 148: 62,887,728 => 62,887,728
deus ex mankind divided 2890: 72,402,222 => 72,402,222 *
dirt showdown 676: 74,466,431 => 72,443,591 *
dolphin ubershaders 210: 109,630,376 => 81,034,320 *
The * denote tests that are better now. In the tests that are the same
in both patches, the "after" peak memory usage was at a different
location. I did not check the local peaks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
D3D Booleans use a 32-bit 0/-1 representation. Because this previously
matched NIR exactly, we didn't have to really optimize for it. Now that
we have 1-bit Booleans, we need some specific optimizations to chew
through the D3D12-style Booleans.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15136811 -> 14967944 (-1.12%)
instructions in affected programs: 2457021 -> 2288154 (-6.87%)
helped: 8318
HURT: 10
total cycles in shared programs: 373544524 -> 359701825 (-3.71%)
cycles in affected programs: 151029683 -> 137186984 (-9.17%)
helped: 7749
HURT: 682
total loops in shared programs: 4431 -> 4399 (-0.72%)
loops in affected programs: 32 -> 0
helped: 21
HURT: 0
total spills in shared programs: 10290 -> 10051 (-2.32%)
spills in affected programs: 2532 -> 2293 (-9.44%)
helped: 18
HURT: 18
total fills in shared programs: 22203 -> 21732 (-2.12%)
fills in affected programs: 3319 -> 2848 (-14.19%)
helped: 18
HURT: 18
Note that a large chunk of the improvement fixing regressions caused by
switching to 1-bit Booleans. Previously, our ability to optimize D3D
booleans was improved by using the D3D representation directly in NIR.
Now that NIR does 1-bit bools, we need a few more optimizations.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a few distinct changes:
glsl,spirv: Generate 1-bit Booleans
Revert "Use 32-bit opcodes in the NIR producers and optimizations"
Revert "nir/builder: Generate 32-bit bool opcodes transparently"
nir/builder: Generate 1-bit Booleans in nir_build_imm_bool
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit adds support for 1-bit Booleans and integers. Booleans
obviously take a value of true or false. Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit contains three related changes. First, we define boolN_t
for N = 8, 16, and 64 and move the definition of boolN_vec to the loop
with the other vec definitions. Second, there's no reason why we need
the != 0 on the source because that happens implicitly when it's
converted to bool. Third, for destinations, we use a signed integer
type and just do -(int)bool_val which will give us the 0/-1 behavior we
want and neatly scales to all bit widths.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15072525 -> 15072525 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This helps prevent regressions in later commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.
Fixes: 35baee5dce "nir/constant_folding: fix incorrect bit-size check"
Fixes: 9076c4e289 "nir: update opcode definitions for different bit sizes"
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The previous code was creating a boolean by doing an arithmetic right-
shift by 31 which produces a boolean which is true if the argument is
negative. This is the same as the expression r < 0 which is much
simpler and doesn't depend on NIR's representation of booleans.
Reviewed-by: Eric Anholt <eric@anholt.net>
Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical
(because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs)
Include <sys/time.h> again, as functions prototyped by it are used in
the GLX_USE_WINDOWSGL path.
Make the include guard around the __glxGetMscRate() definition match the
one at it's declaration again, as it's referenced from dri_common.c
which is built for GLX_USE_WINDOWSGL.
Fixes: a95ec138 ("glx: mandate xf86vidmode only for "drm" dri platforms")
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
These have all been floating in my head, and while I've thought about
encoding them in issues on gitlab once they're enabled, they also make
sense to just have in the area of the code you'll need to work in.
Follows 3954331aff ("vc4: Pull uinfo->data[i] dereference out to the top
of the loop.") which showed a large performance win for vc4, but also
cleans up the code a decent bit.
After generating VIR, we leave c->cursor pointing at the end of the
shader. If the shader had dead code at the end (for example from preamble
instructions in a shader with no side effects), we would assertion fail
that we were leaving the cursor pointing at freed memory. Since anything
following DCE should be setting up a new cursor anyway, just clear the
cursor at the start.
In trying to enable compute shaders, I found that a bunch of deqp-gles31's
compute stuff wanted to interact with indirect dispatch. This was easy to
do on its own.
The thrsw will invalidate rtop, just like accumulators and flags. Caught
by simulator assertions in CS imulextended/umulextended tests.
Fixes: 90269ba353 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
Just like vc4, we have to support linear shared BOs for X11 on arbitrary
displays. When we're faced with a request to texture from one of those,
make a shadow image that we copy using the TFU at the start of the draw
call.
generatemipmap is just filling out the rest of the mipmap that's already
been written (by a mapping or a draw call), so it didn't matter. As I
reuse the TFU code for linear-to-UIF conversions, it'll start mattering.
Otherwise we may race to read old contents. This didn't show up in the
CTS and piglit for me, but it did once I started using the TFU to do
linear->UIF blits for X11.
Fixes: 2ebca177dc ("v3d: Use the TFU to do generatemipmap.")
Same as on nv50, the TXF op always uses the TSC bound to slot 0,
returning blank values if nothing is bound.
An earlier change arranges for the TSC entries list to always have valid
data at entry 0, so here we just make use of it.
Fixes arb_texture_buffer_object-subdata-sync among others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This was used for implementing FBFETCH. However that uses TXF, which
doesn't do much with a TSC. The only important bit is that sRGB-decoding
works as expected, which we can achieve since all samplers we ever
generate enable sRGB-decoding. Always point to entry 0 in the TSC table,
and ensure that even before it ever gets initialized, the sRGB-decoding
enable bit is set.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
For older gen's fd_wfi() is used to conditionally insert a WFI if there
hasn't already been one since last draw. But this doesn't work out well
with stateobj since the order the stateobj is evaluated might not be
what you expect. (Ie. stateobj might not be evaluated until a later
draw if there is no geometry from the current draw in a given tile.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Gen9 hardware requires some workarounds to disable preemption depending
on the type of primitive being emitted.
We implement this by adding a function that checks the primitive type
and number of instances right before the 3DPRIMITIVE.
For now, we just ignore blorp. The only primitive it emits is
3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
safely leave preemption enabled when it happens. Or it will be disabled
by a previous 3DPRIMITIVE, which should be fine too.
v3:
- Apply missing workarounds for instanced rendering and line loop (Ken)
- Move workaround code to brw_draw_single_prim()
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Set bit when initializing context.
v3:
- Always toggle preemption bool to false before enabling it for the
first time, so the state gets emitted (Chris Wilson).
- Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now everything with type 'struct slab_child_pool *' is name pool, and
everything with type 'struct slab_mempool *' is named mempool.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
There is no need to have an extra ctx paramter as all the other
parameters carry all the needed information.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
When we first started using genxml, we decided to represent MOCS as an
actual structure, and pack values. However, in many places, it was more
convenient to use a numeric value rather than treating it as a struct,
so we added secondary setters in a bunch of places as well.
We were not entirely consistent, either. Some places only had one.
Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens
only had the struct-based setters. The names were sometimes "Constant
Buffer Object Control State" instead of "Memory", making it harder to
find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer
packet...which is a bit redundant.
On modern hardware, MOCS is simply an index into a table, but we were
still carrying around the structure with an "Index to MOCS Table" field,
in addition to the direct numeric setters. This is clunky - we really
just want a number on new hardware.
This patch eliminates the struct-based setters, and makes the numeric
setters be consistently called "MOCS". We leave the struct definition
around on Gen7-8 for reference purposes, but it is unused.
v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The pass did not correctly handle loops ending in:
if ssa_7 {
block block_8:
/* preds: block_7 */
continue
/* succs: block_1 */
} else {
block block_9:
/* preds: block_7 */
break
/* succs: block_11 */
}
The break will get eliminated by another opt but if this pass gets
called first (as it does on RADV) we ended up inserting
instructions after the break.
Fixes: 5921a19d4b ("nir: add if opt opt_if_loop_last_continue()")
Reviewed-by: Dave Airlie <airlied@redhat.com>
pctx->resource_copy_region() needs to fall back to sw copy for
non-renderable formats. But previously for things that we could
not use the blitter for, would fall back to 3d. Which won't work
if 3d can't render to the dst format either.
Instead rework things to fallback to fd_resource_copy_region(),
which will try 3d core and then fall back to memcpy().
Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
Also fixes gpu hangs with some formats that are supported, but which we
don't know what internal-format to use for the blitter, for ex
dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z
Fixes: 0d240c2214 freedreno/ir3: don't fetch unused tex components
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we emit shader as a pointer to a GEM object, also set the RELOC_DUMP
flag as a hint to kernel that this is a useful buffer to snapshot for
debug dumps.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Pull in updated UAPI and use kernel API version to enable softpin.
Since MSM_SUBMIT_BO_DUMP flag was added at same time, use that to
signal to kernel that cmdstream buffers are useful to dump for
debugging/cmdstream-traces.
Signed-off-by: Rob Clark <robdclark@gmail.com>
I needed the same function for v3d. This was originally in d3e046e76c
("nir: Pull some of intel's image load/store format conversion to
nir_format.h") before we made am istake about simplifying the function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 06fbcd2cd5.
nir_pack_half_2x16_split *isn't* vectorizable, it's 1-component only, thus
why we had this split-scalar code in the first place.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This helps a lot when debugging image load/store lowering on large
testcases. Unfortunately the Mesa enum name stuff is under src/mesa and
we can't get at it from the compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We have a NIR path, and V3D doesn't have TGSI input for compute (only what
TTN can handle for the various gallium-internal shaders).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If we gave this function 0 counter buffers, we'd still try and
access pCounterBuffers[0] as this check was incorrect.
Fixes crash with ext_transform_feedback-pipeline-basic-primgen
on zink on radv.
Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The pass should work for all bit sizes but it's less clear that the
extra instructions are worth it on small integers. Also, the hardware
doesn't do mul_high on anything other than 32-bit integers and, absent
any decent mechanism for testing the pass on 8 and 16-bit types, it's
probably best to just leave it disabled for now.
Shader-db results on Sky Lake:
total instructions in shared programs: 15105795 -> 15111403 (0.04%)
instructions in affected programs: 72774 -> 78382 (7.71%)
helped: 0
HURT: 265
Note that hurt here actually means helped because we're getting rid of
integer quotient operations (which are a send on some platforms!) and
replacing them with fairly cheap ALU ops.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Currently we have the three dri "platforms" - drm, apple and windows.
Since xf86vidmode is a thing only for the drm one, adjust the
preprocessor guards and correctly check for the dependency.
v2: terminate the GLX_USE_WINDOWSGL hunk
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Fixes: 5bc509363b ("glx: make xf86vidmode mandatory for direct rendering")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Virglrenderer does the wrong thing when given an instance divisor;
it tries to use the element-index rather than the binding-index as
the argument to glVertexBindingDivisor(). This worked fine as long
as there was a 1:1 relationship between elements and bindings,
which was the case util 19a91841c3 "st/mesa: Use Array._DrawVAO in
st_atom_array.c.".
So let's detect instance divisors, and restore a 1:1 relationship in
that case. This will make old versions of virglrenderer behave
correctly. For newer versions, we can consider making a better
interface, where the instance divisor isn't specified per element,
but rather per binding. But let's save that for another day.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 19a91841c3 "st/mesa: Use Array._DrawVAO in st_atom_array.c."
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-By: Gert Wollny <gert.wollny@collabora.com>
When hile_size is 0, we can't enable HTILE. This doesn't change
anything, except not calling radv_image_alloc_htile().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Feral games aren't affected because they don't decompress DCC.
F1 2018 has one DCC decompression per frame, but I don't see
any performance improvements. This new predicate will be
probably more useful for DCC/MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I needed the same functions for v3d. Note that the color value in the
Intel lowering has already been cut down to image.chans num_components.
v2: Drop the half float one, since it was a 1-liner after cleanup.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of the bits were constant, but a few were missed. Avoids warnings
from v3d's upcoming static const bits declarations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This documents a process for using GitLab Merge Requests as an second
way to submit code changes for Mesa. Only one of the two methods is
allowed for each patch series.
We will *not* require all patches to be emailed. Some code changes may
be reviewed and merged without any discussion on the mesa-dev email
list.
v2:
* No longer require email. Allow submitter to choose email or a
GitLab merge request.
* Various feedback from Brian, Daniel, Dylan, Eric, Erik, Jason,
Matt, Michel and Rob.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Rob Clark <robdclark@gmail.com>
This has thrown a few people off recently and it's good to have the
process and all the rational for it documented somewhere. A comment at
the top of nir_inline_functions seems as good a place as any.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
After going through the spec changelog, it looks like RADV
is up to date. Note that ANV also reports 1.1.90.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When I made sure that half-float texture-filtering was required for ES3,
I didn't realize that virgl doesn't report support for this correctly.
This regressed the GLES version available on top of several drivers,
including i965 from 3.2 to 2.0.
This is going to need protocol changes to fix properly, so let's just
restore the previous behavior by enabling floating-point filtering
unconditionally for now.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: fcf9fcee3c "mesa/main: do not require float-texture filtering for es3"
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
The implementation of these opcodes in the generator assumes that their
arguments are packed, and it generates register regions based on that
assumption.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are usually used for dealing with sparse resources but there's no
reason why we can't hook them up before we have sparse. We have the
hardware; let's light it up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We have to lower some shadow instructions because they don't exist in
hardware and we have to lower txb+offset+clamp because the message gets
too big and we run into the sampler message length limit of 11 regs.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
I don't know if one is better than the other or not but this approach
has the advantage that we never forget to copy information over and
we're not hard-coding quite as many assumptions. It's also a lot
simpler and much less code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Instead of having to call two different lower_gradient functions based
on whether or not it's a cube, just make lower_gradient handle cubes.
This significantly simplifies some of the logic.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This simple check helps catch bugs early that can end up propagating
into later stages of the compile and triggering strange asserts.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
AoS sampling tries to use integers for coord wrapping when possible,
as it should be faster. However, for AVX, this was suboptimal, because
only floats can use 8x32bit vectors, whereas integers have to be split
into 4x32bit vectors. (I believe part of why it was slower was also
that at least earlier llvm versions had trouble optimizing it properly,
since you can still do simple bit ops with 8x32bit vectors, so a
sequence of int add / and / int add / and with such vectors would
actually end up doing 128bit inserts/extracts between the operations
instead of just doing the cheap 128bit ands.)
Hence, a special float coord wrapping path was added to AoS sampling.
But this path was actually disabled for a long time already, since we
found that just splitting everything before entering the AoS path was
still sligthly faster usually, so none of this float coord wrapping
code was used anymore (AoS sampling code, when avx2 isn't supported,
never sees vectors with length > 4). I thought it might be useful some
day again, but I'm not interested anymore in optimizing for very weird
instruction sets which have support for 256bit vectors for floats but
not for ints, so just drop it.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The commit aa0fed10d3 moved a bunch of Freedreno code to a common
directory. The previous directory had a .dir-locals file for Emacs.
This patch copies it to the new directory as well.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
LinkedTransformFeedback is normally populated, which had nerf'd varying
packing since the check was introduced.
Fixes: dbd52585fa st/nir: Disable varying packing when doing transform feedback.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The Vulkan working group recently discovered that we made a mistake in
assuming that PCI domains are 16-bit even though they can potentially be
32-bit values. To fix this, the next spec update will change the types
in the VK_EXT_pci_bus_info struct to be 32 bits which will be a
backwards-incompatible change. Normally, Khronos tries very hard to
never make backwards incompatible changes to specs. Hopefully, the
extension is new enough (2 months) that there are no shipping apps which
use the extension so this should be safe.
This commit disables the extension for both anv and radv in mesa and
should be back-ported to 18.3 ASAP so we avoid any potential issues with
new apps running on old drivers. I'll send out a commit (which we can
also back-port to 18.3 if we really care) to re-enable the extension in
both drivers once this week's spec update ships. The one known use of
this extension is internal to mesa and will continue working with the
extension disabled and will naturally update when we get a new header.
Cc: "18.3" <mesa-stable@lists.freedesktop.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
There's a few missing and convoluted bits:
- FramebufferTexture2DMultisampleEXT
Missing sanity check, should be desktop="false"
- RenderbufferStorageMultisampleEXT
Missing sanity check, is aliased to RenderbufferStorageMultisample.
Thus it's set only when desktop GL or GLES2 v3.0+, while the extension
is GLES2 2.0+.
If we flip the aliasing we'll break indirect GLX, so loosen the version
to 2.0. Not perfect, yet this is the most sane thing I could think of.
v2: [Emil] Fixup RenderbufferStorageMultisampleEXT, commmit message
Cc: Kristian H. Kristensen <hoegsberg@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108974
Fixes: 1b331ae505 ("mesa: Add core support for EXT_multisampled_render_to_texture{,2}")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Commit b028ce29f0 fixed a typo in
src/freedreno/Makefile.am, but ended up breaking the build for
freedreno. The typo inadvertently made things work, as we were not
supposed to link with libnir or libmesautil to begin with. Those come
in through libmesagallium and the typo prevented the duplicated
linkage.
Fixes: b028ce29f ("freedreno: add the missing _la in libfreedreno_ir3_la")
Cc: Emil Velikov <emil.velikov@collabora.com>
While disassembling the predicate always print flag subregister number
to keep grammar same across the generation for assembler tool.
v2: Combine consecutive format calls (Matt Turner)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When RepCtrl is set, the swizzle field is ignored by the hardware. In
order to ensure a 1-to-1 correspondence between the human-readable
disassembly and the binary instruction encoding always set the swizzle
to XXXX (all zeros) when it is unused due to RepCtrl
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just to make it easier to run a nir tests together.
Fixes: a0ae12ca91
("nir/algebraic: Add unit tests for bitsize validation")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently we distinguish if the drawable is a window or pixmap by
checking xcb_present_select_input throws an error or not.
Yet, we don't always free the error state returned by xcb.
Cc: Kirill Burtsev <kirill.burtsev@qt.io>
Cc: Boyan Ding <boyan.j.ding@gmail.com>
Fixes: 6bd9ba7d07 ("loader: Add dri3 helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil: add commit message, fixes tag]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
load_register_imm and load_register_mem take the destination as the
first argument, so I'd like load_register_reg to do the same the sake
of consistency. Otherwise, reading sequences of mixed LRI/LRM/LRR is
needlessly confusing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
opnd() might delete the passed in instruction, but it's used through
i->srcExists() later in visit
v2: use continue instead return
v3: use brackets for the outer if/else chain
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
this race condition is pretty harmless, but also pretty trivial to fix
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Shaders are usually quite short, and are private to the context. We can
save memory and reduce the work the kernel needs to do at exec time by
packing them together in a stream uploader for long-lived state.
The TFU lets us format raster and SAND images into formats that can be
read by the texture engine, and do mipmap generation.
The UAPI comes from drm-next e69aa5f9b97f ("Merge tag
'drm-misc-next-2018-12-06' of git://anongit.freedesktop.org/drm/drm-misc
into drm-next")
The HW apparently has some issues (or at least a much more complicated VCM
calculation) with non-combined segments, and the closed source driver also
uses combined I/O. Until I get the last CTS failure resolved (which does
look plausibly like some VPM stomping), let's use combined I/O too.
We were exposing ARB_texture_float, but apparently not the OES subset
flag. Fixes regression from GLES3 support to GLES2.
Fixes: fcf9fcee3c ("mesa/main: do not require float-texture filtering
for es3")
In the softpin world, surface state base address may be a fixed 64-bit
address (with no associated BO). It makes sense to store this in the
offset field. But it needs to be the full size.
We also update the clear color address to be consistently uint64_t
everywhere so we can continue passing intel_miptree_get_clear_color
a pointer to the blorp_address's offset field without type mismatches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fix an emberrasing memory leak with the non-softpin submit/rb
implementation.
Fixes: f3cc0d2747 freedreno: import libdrm_freedreno + redesign submit
Signed-off-by: Rob Clark <robdclark@gmail.com>
Detect when a component of an (for example) texture fetch is unused and
propagate the updated wrmask back to the parent instruction.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we have an reloc from stateobjA to stateobjB, we would previously
leave stateobjB's bos out of the submit's bos table. Handle this case
by copying into stateobjA's reloc_bos table.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Linking against LLVM built with BUILD_SHARED_LIBS fails otherwise,
as the component is required for the draw module.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
There is not much to do in freedreno - tile layout and multisample
state for gmem renderings is programmed based on the pfb sample count,
while resolve blits take the destination sample count from the resource.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
In gallium, we model the attachment sample count as a new nr_samples
field in pipe_surface. A driver can indicate support for the extension
using the new pipe cap, PIPE_CAP_MULTISAMPLED_RENDER_TO_TEXTURE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This new pipe cap and the new nr_samples field in pipe_surface lets a
state tracker bind a render target with a different sample count than
the resource. This allows for implementing
EXT_multisampled_render_to_texture and
EXT_multisampled_render_to_texture2.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This also turns on EXT_multisampled_render_to_texture which is a
subset of EXT_multisampled_render_to_texture2, allowing only
COLOR_ATTACHMENT0.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Suffixes are dropped from a bunch of conversion opcodes when it makes
sense to do so. Others are kept if we really do want the bit-size
restriction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
All conversion opcodes require a destination size but this makes
constructing certain algebraic expressions rather cumbersome. This
commit adds support to nir_search and nir_algebraic for writing
conversion opcodes without a size. These meta-opcodes match any
conversion of that type regardless of destination size and the size gets
inferred from the sizes of the things being matched or from other
opcodes in the expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead of using an OrderedDict, just have a (necessarily sorted) array
of transforms and a set of opcodes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Both of these things are already handled in the Value base class so we
don't need to handle them explicitly in Constant.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The non-failure path can be tested by just compiling mesa and then
testing it, but the failure paths won't be hit unless you make a mistake,
so it's best to test them with some unit tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Before this commit, there were two copies of the algorithm: one in C,
that we would use to figure out what bit-size to give the replacement
expression, and one in Python, that emulated the C one and tried to
prove that the C algorithm would never fail to correctly assign
bit-sizes. That seemed pretty fragile, and likely to fall over if we
make any changes. Furthermore, the C code was really just recomputing
more-or-less the same thing as the Python code every time. Instead, we
can just store the results of the Python algorithm in the C
datastructure, and consult it to compute the bitsize of each value,
moving the "brains" entirely into Python. Since the Python algorithm no
longer has to match C, it's also a lot easier to change it to something
more closely approximating an actual type-inference algorithm. The
algorithm used is based on Hindley-Milner, although deliberately
weakened a little. It's a few more lines than the old one, judging by
the diffstat, but I think it's easier to verify that it's correct while
being as general as possible.
We could split this up into two changes, first making the C code use the
results of the Python code and then rewriting the Python algorithm, but
since the old algorithm never tracked which variable each equivalence
class, it would mean we'd have to add some non-trivial code which would
then get thrown away. I think it's better to see the final state all at
once, although I could also try splitting it up.
v2:
- Replace instances of "== None" and "!= None" with "is None" and
"is not None".
- Rename first_src to first_unsized_src
- Only merge the destination with the first unsized source, since the
sources have already been merged.
- Add a comment explaining what nir_search_value::bit_size now means.
v3:
- Fix one last instance to use "is not" instead of !=
- Don't try to be so clever when choosing which error message to print
based on whether we're in the search or replace expression.
- Fix trailing whitespace.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Nothing to do, the compiler already handles that.
All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some
16-bit tests that are quite related to fdo bug #108114.
Only enable the extension on CIK+ because it might not work on SI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The original code was modifying the global drisw_lf variable, which is bad
when there are multiple contexts in single process, each initialized with
different loader. One may support put_image_shm and the other not.
Since there are currently only two possible combinations, lets create two
global tables, one for each. Lets make them const, since we won't change them
and they can be shared.
This fixes crash in VLC. It used two GL contexts (each in different thread), one
was initialized by its Qt GUI, the other by its video output plugin. The first
one set the put_image_shm=drisw_put_image_shm, the second did not, but
since the same structure was used, the drisw_put_image_shm was used too. Then
it crashed because the second loader did not have putImageShm set.
Downstream bug:
https://bugzilla.opensuse.org/show_bug.cgi?id=1113533
v2: Added Fixes and described the VLC bug.
Fixes: 63c427fa71 ("drisw: use putImageShm if available")
Signed-off-by: Michal Srb <msrb@suse.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In case we are unlucky if the low part is 0xffffffff.
Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the driver used a compute shader for resetting a query pool,
it should be completed when caches are flushed.
This might reduce the number of stalls if operations are done
between vkCmdResetQueryPool() and vkCmdBeginQuery()
(or vkCmdWriteTimestamp()).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
After investigating on this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why does
it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.
Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.
This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Technically speaking, this validation was incorrect, because GL_RGB565
is only supported in OpenGL ES 1.x if OES_framebuffer_object is
supported. This couldn't lead to any real incorrect behavior, because
all drivers support OES_framebuffer_object. But let's keep the code
self-documenting, by correcting the check as per the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For format fallbacks like ETC and ASTC, switching between sRGB and linear
decoding is undefined, or at least is not bit-exact. Same as
EXT_texture_sRGB_decode on GLES.
There are no piglit or dEQP regresssions.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
1. tools/i965_disasm.c:58:4: warning:
ignoring return value of ‘fread’,
declared with attribute warn_unused_result
fread(assembly, *end, 1, fp);
v2: Fixed incorrect return value check.
( Eric Engestrom <eric.engestrom@intel.com> )
v3: Zero size file check placed before fread with exit()
( Eric Engestrom <eric.engestrom@intel.com> )
v4: - Title is changed.
- The 'size' variable was moved to top of a function scope.
- The assertion was replaced by the proper error handling.
- The error message on a caller side was fixed.
( Eric Engestrom <eric.engestrom@intel.com> )
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
../src/intel/tools/aubinator_error_decode.c: In function ‘instdone_register_for_ring’:
../src/intel/tools/aubinator_error_decode.c:177:4: warning: enumeration value ‘I915_ENGINE_CLASS_INVALID’ not handled in switch [-Wswitch]
switch (class) {
^~~~~~
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It doesn't seem like the exact number has too much effect on the
performaince in "teximage". However setting it to just about anything
prevents some OOMs from getting hit. These values are not well-tuned,
but don't seem too bad.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
All TXF operations implicitly use sampler 0, and fail if it's not bound
to anything. This does not happen in LINKED_TSC mode, but we don't
currently use this.
We ensure that TSC entry at id 0 has the SRGB conversion bit enabled
(and all samplers we normally generate will too). Then when the TSC at
*slot* 0 (not to be confused with entry 0 in the global TSC table) is
unbound, we bind it to entry 0. This way, TXF operations are not
dependent on there being a regular sampler bound there.
Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are
particularly susceptible to this as they don't bind a sampler.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes some crucible 3d miptree tests I've been working on
when executed using the compute shader path.
Fixes: d08f267814 (radv/gfx9: fix 3d image to image transfers on compute queues.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These days we don't always allocate scanout compatible textures anymore.
That does mean we have to fix the radv android WSI though.
Fixes: b1444c9ccb "radv: Implement VK_ANDROID_native_buffer."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This extension is not properly tested (testing for
GL_ARB_fragment_shader_interlock is not sufficient), and since this was
noted in review on August 28th no tests have been sent.
Revert "i965: Add INTEL_fragment_shader_ordering support."
Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering"
This reverts commit 03ecec9ed2.
This reverts commit 119435c877.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Anholt <eric@anholt.net>
The OpenGL ES 3.0 specification, table 3.13 lists half-float textures as
filterable, but not float textures. So we shouldn't depend on
ARB_float_texture, which requires full filtering support for both.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
sRGB textures is a requirement for OpenGL ES 3.0, so let's make sure
we don't incorrectly enable a too high version.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
OpenGL ES 3.0 require this functionality, so we should also test for it
to avoid incorrectly exposing a too high GLES version.
On desktop, this has been required since all the way back in OpenGL 1.2
anyway.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On OpenGL ES 2.0, there's separate extensions adding support for
half-float and float textures. So we need to validate the enums
separately as well.
This also prevents these enums from incorrectly being allowed on
OpenGL ES 1.x, where there's no extension that enables this in the
first place.
While we're at it, remove the pointless default-case, and the seemingly
stale fallthrough comment.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_sRGB_R8 is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
There's no extension adding support for this on OpenGL ES before
version 3.0, so let's tighten the check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_sRGB is set regardless of the API that's
used, so checking for those direcly will always allow the enums from
this extensions when they are supported by the driver.
There's no extension adding support for this on OpenGL ES before
version 3.0, so let's tighten the check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_snorm is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
There's no extension adding support for this on OpenGL ES before
version 3.0, so let's tighten the check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.OES_texture_float is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
There's no extension enabling floating-point textures for OpenGL
ES 1.x, so we shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_type_2_10_10_10_REV is set regardless of
the API that's used, so checking for those direcly will always enable
extensions when they are supported by the driver.
There's no corresponding extension for OpenGL ES 1.x/2.0, so we
shouldn't allow these enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This extension requies OpenGL, and shouldn't be available on OpenGL ES.
So let's not allow the enums from it either.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_shared_exponent is set regardless of the
API that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
We also need to make sure this is enabled on OpenGL ES 3. Because the
check is repeated, let's introduce a helper.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_packed_float isn't supported on OpenGL ES, we shouldn't allow
these enums there, before OpenGL ES 3.0 which also introduce support
for these enums.
Since this check is repeated a lot, let's make a helper for this.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_packed_float isn't supported on OpenGL ES, we shouldn't allow
these enums there, before OpenGL ES 3.0 which also introduce support
for these enums.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Floating-point depth buffers are only supported on OpenGL 3.0, OpenGL ES
3.0, or if ARB_depth_buffer_float is supported. Because we checked a
driver capability rather than using an extension-check helper, we ended
up incorrectly allowing this on OpenGL ES 1.x and 2.x.
Since this logic is repeated, let's make a helper for it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Integer textures shouldn't be implicitly exposed on OpenGL ES 1.x and
2.x, but because the code checked against a driver-capability rather
than using an extension-check helper, we ended up accidentally allowing
these enums on older versions when the driver supports it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ARB_texture_rgb10_a2ui isn't supported on OpenGL ES, we shouldn't expose
it there even if the driver supports it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.ARB_texture_stencil8 is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
So let's instead check for both ARB_texture_stencil8 and
OES_texture_stencil8, so we support depth textures on OpenGL and
OpenGL ES 2.0+. There's no extension enabling stencil-textures for
OpenGL ES 1.x, so we shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.ARB_depth_texture is set regardless of the API that's
used, so checking for those direcly will always allow the enums from
this extensions when they are supported by the driver.
So let's instead check for both ARB_depth_texture and OES_depth_texture,
so we support depth textures on OpenGL and OpenGL ES 2.0+. There's no
extension enabling depth-textures for OpenGL ES 1.x, so we shouldn't
allow those enums there.
This fixes oes_packed_depth_stencil-depth-stencil-texture_gles1 on i965
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.KHR_texture_compression_astc_ldr is set regardless of
the API that's used, so checking for those direcly will always enable
extensions when they are supported by the driver.
But there's no extension enabling ASTC for OpenGL ES 1.x, so we
shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.ARB_ES3_compatibility is set regardless of the API
that's used, so checking for those direcly will always enable
extensions when they are supported by the driver.
But there's no extension enabling ETC2 for OpenGL ES 1.x, so we
shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's no extension enabling S3TC formats on OpenGL ES 1.x, so we
shouldn't allow these even if the driver can support it. So let's check
for EXT_texture_compression_s3tc instead of ANGLE_texture_compression_dxt,
which is supported on all other OpenGL variations.
We also need to use _mesa_has_EXT_texture_compression_s3tc() instead of
checking the driver cap directly, otherwise we end up enabling this on
OpenGL ES 1.x, as the API isn't checked.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
_mesa_has_FOO_bar() knows about the APIs these extensions should be
supported under, so let's use that to simplify these checks a bit.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes the logic a little bit easier to follow, and reduce a bit of
repetition.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes the logic a little bit easier to follow; this is *either*
about ES2 compatibility *or* about gles. GL_RGB565 was added already in
OpenGL ES 1.0.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Using the _mesa_has_FOO_bar helpers is generally more safe and should
generally be prefered over checking driver-caps like this code did,
because the _mesa_has_FOO_bar helpers also verify the API type and
version.
This shouldn't have any practical effect here, as this function only
gets called for OpenGL ES 3.x right now. But if this was to change in
the future, this makes the function behave a lot more predictable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
S3_s3tc is the extension that enables this functionality on desktop, so
let's check for that one. The _mesa_has_S3_s3tc() helper already
verifies the API according to the extension-table.
As for the second hunk, we currently already only expose
EXT_texture_compression_s3tc on desktop so by using the helper instead,
we get rid of this detail here, and once we enable it for GLES we'll
automaticall get the interaction right.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
_mesa_es3_error_check_format_and_type isn't specific to OpenGL ES 3.x,
it applies to all versions of OpenGL ES. So let's rename it to reflect
this.
While we're at it, let's also rename a helper function it uses similarly.
As the helper is static, we can also remove the namespacing-prefix from
the name.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All other _mesa_has_foo functions return bool rather than GLboolean, so
let's follow that style here as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new approach is that samplers don't get unbound even if they won't be used
in a draw and we should just leave them be as well.
Fixes a regression in multiple windows games using gallium nine and nouveau.
v2: adjust num_samplers to keep track of the highest sampler bound
v3: rework how to set the new value of num_samplers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577
Fixes: 4d6fab245e
"cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes -Wstringop-truncation compiler warnings.
See f836d799f9 "intel/decoder: use snprintf(..., "%s", ...) instead of strncpy"
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Android has cpufeatures library but pinning of threads is not supported
PIPE_OS_LINUX code path causes build error due to sched_getcpu() unavailable
thus we need to avoid setting HAVE_SCHED_GETCPU for Android
Fixes: 48f2160 ("st/mesa: regularly re-pin driver threads to the CCX where the app thread is")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch fixes this build error.
CC tests/xvmc_bench.o
In file included from tests/xvmc_bench.c:35:
tests/testlib.h:38:10: fatal error: 'X11/Xlib.h' file not found
^~~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We can mark the buffer unclean if it's ever bound as a TBO,
SSBO, ABO, or image.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.map_buffer_range.new_specified_buffer.flag_write_full.stream_draw
from 9.58 MB/s to 451.17 MB/s.
v2: Track buffer cleanliness as a function of bindings (Ilia).
v3: virgl_modify_clean --> virgl_dirty_res (Erik)
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
We flush everytime the command buffer (16 kB) is full, which is
quite costly.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw
from 111.16 MB/s to 1930.36 MB/s.
In addition, I made the benchmark produce buffers from 0 --> VIRGL_MAX_CMDBUF_DWORDS * 4,
and tried ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4), etc.
I didn't notice any clear differences, so let's just go with the most obvious
heuristic.
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested running WebGL aquarium on Nvidia host (10,000 fishes)
This moves us from 7 fps to 9 fps. After quadrupling, performance
gains diminish.
v2: Remove change ID (Erik)
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Pipeline state pending bits should be taken into account when copying
results.
In the particular bug below, the results of the
vkCmdCopyQueryPoolResults() command was being overwritten by the
preceding vkCmdCopyBuffer() with a same destination buffer. This is
because we copy the buffers using the 3D pipeline whereas we copy the
query results using the command streamer. Those pieces of HW work in
parallel and the results are somewhat undefined.
v2: Unconditionally flush the pipeline before copying the results
(Jason)
v3: Wrap & expressions (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108894
Cc: mesa-stable@lists.freedesktop.org
The calculated length of a line may be infinite, if the coords we
get are bogus. This leads to an infinite loop in line stippling.
To prevent this test for this explicitly (although technically
on at least x86 sse it would actually work without the explicit
test, as long as we use the int-converted length value).
While here also get rid of some always-true condition.
Note this does not actually solve the root cause, which is that
the coords we receive are bogus after clipping. This seems a difficult
problem to solve. One issue is that due to float arithmetic, clip w
may become 0 after clipping if the incoming geometry is
"sufficiently degenerate", hence x/y/z ndc (and window) coords will
be all inf (or nan). Even with w not quite 0, I believe it's possible
we produce values which are actually outside the view volume.
(Also, x=y=z=w=0 coords in clipspace would be not considered subject
to clipping, and similarly result in all NaN coords.) We just hope for
now other draw stages (and rasterizers) can handle those relatively
safely (llvmpipe itself should be sort of robust against this, certainly
converstion to fixed point will produce garbage, it might fail a couple
assertions but should neither hang nor crash otherwise).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Update to the internal master as of 2018-11-15.
This has a lot of gratuitous whitespace change, but on the plus
side it's built using the same tooling that's used for AMDVLK,
which should help going forward.
Our choices here are simply redundant as long as sin.flags is set
correctly.
(v2:
- remove unused function parameter)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If all layers are bound we can perform a fast color or depth clear
instead of iterating over all layers. This has the advantage
to avoid trashing the framebuffer for nothing if you we end up by
doing a fast clear when calling radv_clear_image_layer(), and
clearing all layers in one shot is obviously faster.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fix build error.
CXXLD pipe_msm.la
../../../../src/gallium/drivers/freedreno/.libs/libfreedreno.a(freedreno_batch.o): In function `batch_init':
src/gallium/drivers/freedreno/freedreno_batch.c:54: undefined reference to `fd_device_version'
src/gallium/drivers/freedreno/freedreno_batch.c:59: undefined reference to `fd_submit_new'
src/gallium/drivers/freedreno/freedreno_batch.c:61: undefined reference to `fd_submit_new_ringbuffer'
src/gallium/drivers/freedreno/freedreno_batch.c:64: undefined reference to `fd_submit_new_ringbuffer'
src/gallium/drivers/freedreno/freedreno_batch.c:66: undefined reference to `fd_submit_new_ringbuffer'
src/gallium/drivers/freedreno/freedreno_batch.c:70: undefined reference to `fd_submit_new_ringbuffer'
Fixes: b4476138d5 ("freedreno: move drm to common location")
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Sadly, the 3 games I tested (DeusEx:MD, DiRT Rally, DOTA 2) are unaffected
by the overallocation, because I guess their buffers don't fall into
the small range below a power-of-two size.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
- the slab buffer size increased from 128 KB to 2 MB (PTE fragment size)
- the max suballocated buffer size increased from 64 KB to 256 KB,
this increases memory usage because it wastes memory
- the number of suballocators increased from 1 to 3 and they are layered
on top of each other to minimize unused space in slabs
The final increase in memory usage is:
DeusEx:MD: 1.8%
DOTA 2: 1.75%
DiRT Rally: 0.2%
The kernel driver will also receive fewer buffers.
Vulkan and Gallium don't use Mesa's gl_program data structure, so they
can't poke at 'prog'. But we can simply use the copy of the shader info
stored with the NIR shader, which is guaranteed to exist.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Otherwise, I get this error:
main/egldevice.h:54:13: error: ‘NULL’ undeclared (first use in this function)
dev = NULL;
^~~~
with this config:
./autogen.sh --enable-gles1 --enable-gles2 --with-platforms='surfaceless' --disable-glx
--with-dri-drivers="i965" --with-gallium-drivers="" --enable-gbm
v3: Use stddef.h (Matt)
v4: Modify commit message (Eric)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The common truncf(x + 0.5) fails for the floating-point value just less
than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after
rounding is 1.0, thus truncf does not produce the desired value.
The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before
truncating. This works for all values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.
This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.
I did my best to update r600 for consistency (r300 needs no changes
because it never calls buffer_unmap), even though the radeon winsys
ignores the new flag.
As an added bonus, this should actually improve the performance of
the normal fast path, because we no longer call into libdrm at all
after the first map, and there's one less atomic in the winsys itself
(there are now no atomics left in the UNSYNCHRONIZED fast path).
Cc: Leo Liu <leo.liu@amd.com>
v2:
- remove comment about visible VRAM (Marek)
- don't rely on amdgpu_bo_cpu_map doing an atomic write
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Originally the driver reported GL_FRAMEBUFFER_UNSUPPORTED in all cases,
adding more specific error messages was not correct and broke many tests.
Mostly revert this and only report GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT
for MESA_FORMAT_R_SRGB8.
Fixes: ebcde34545
i965: be more specific about FBO completeness errors
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108805
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The format is emulated by using ISL_FORMAT_L8_SRGB, therefore we need to
force swizzles for the GBA channels. However, doing this only based on the
data type GL_RED breaks other formats, therefore, test specifically for the
format.
Fixes: c5363869d4
i965: Force zero swizzles for unused components in GL_RED and GL_RG
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
vtest doesn't implement the according API and would segfault:
Program received signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
#1 in virgl_fence_server_sync at
src/gallium/drivers/virgl/virgl_context.c:1049
#2 in st_server_wait_sync at
src/mesa/state_tracker/st_cb_syncobj.c:155
so just don't do the call when the function pointers are not set.
Fixes dEQP:
dEQP-GLES3.functional.fence_sync.wait_sync_smalldraw
dEQP-GLES3.functional.fence_sync.wait_sync_largedraw
Fixes: d1a1c21e76
virgl: native fence fd support
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
Avoids:
Conditional jump or move depends on uninitialised value(s)
at 0x9E2B39F: virgl_vtest_winsys_resource_cache_create (virgl_vtest_winsys.c:379)
by 0x9E2725F: virgl_buffer_create (virgl_buffer.c:169)
by 0x9E246D5: virgl_resource_create (virgl_resource.c:60)
by 0xA0C1B9F: bufferobj_data (st_cb_bufferobjects.c:344)
by 0xA0C1B9F: st_bufferobj_data (st_cb_bufferobjects.c:390)
by 0x9F4ACE3: vbo_use_buffer_objects (vbo_exec_api.c:1136)
by 0xA0C68C3: st_create_context_priv (st_context.c:416)
by 0xA0C707A: st_create_context (st_context.c:598)
by 0x9F81C6B: st_api_create_context (st_manager.c:918)
by 0x9BBE591: dri_create_context (dri_context.c:161)
by 0x9BB6931: driCreateContextAttribs (dri_util.c:473)
by 0x4E97A44: drisw_create_context_attribs (drisw_glx.c:630)
by 0x4E7C591: glXCreateContextAttribsARB (create_context.c:78)
Uninitialised value was created by a stack allocation
at 0x9E2B249: virgl_vtest_winsys_resource_cache_create (virgl_vtest_winsys.c:342)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
We normally call with stderr which is unbuffered, so this won't affect
that, but it does let me call nir_print_shader(nir, fopen("log", "w+"))
from gdb and actually get the whole shader in my file.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I've been using this with the kmsro series to test v3d on VKMS without my
old KMS hack in the v3d kernel driver. KMSRO still needs some cleanup,
but v3d RO support seems reasonable.
Improves performance in Talos by about 15% (and significant improvements
in RotR and possibly other but did not bench with final patch) on
kernel 4.19 and earlier.
On 4.20+ a similar effect comes from
433ca054949a "drm/amdgpu: try allocating VRAM as power of two"
v2: Do not impact the alignment of the physical memory.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Since 1285f71d3e landed, it needs to provide apps with proper sample
position for MSAA.
Currently no way to query this to hw, these are taken from blob driver.
Fixes: dEQP-GLES31.functional.texture.multisample.samples_#.sample_position
Signed-off-by: Rob Clark <robdclark@gmail.com>
We set equiv bit in SP_FS_CTRL_REG0. Somehow the hw doesn't hang with
this mismatched config, but does run slower. It is faster with either
neither bit set, or both bits set, but both is the fastest of the three
configurations. Worth a bit over 10% gain in glmark2.
Spotted-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
blip_fp uses GENERIC as input, so blit_vp should match for linking
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Adds all missing texture related logic. For everything to work it also
needs changes to ir2/fd2_program, which are part of the ir2 update patch.
Note: it needs rnndb update
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
[remove stray patch]
Signed-off-by: Rob Clark <robdclark@gmail.com>
200: 256KiB GMEM A200 (imx53)
201: 128KiB GMEM A200 (imx51)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
On older gens, the CLIP_ADJ bitfields were actually 3.6 fixed point.
Which might make more sense. Although this formula comes up with values
pretty close to what blob does for various viewport sizes (for at least
a5xx and a6xx), and seems to work.
Signed-off-by: Rob Clark <robdclark@gmail.com>
f6131d4ec7 had the side effect of enabling LRZ w/ 32b depth buffers.
But there are some bugs with this, which aren't fully understood yet,
so for now just skip LRZ w/ z32..
Fixes: f6131d4ec7 freedreno/a6xx: Clear z32 and separate stencil with blitter
Signed-off-by: Rob Clark <robdclark@gmail.com>
We generate an IB to clear the gmem at flush time and jump to it
before rendering each tile. This lets us get rid of the command stream
patching for gmem offsets.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
For RGB surfaces (for example) we don't really care that the colormask
is 0x7 instead of 0xf. This should not trigger clear_with_quad()
slowpath.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we can't clear all the buffers with pctx->clear() (say, for example,
because of ColorMask), push the buffers we *can* clear with pctx->clear()
first. Tilers want to see clears coming before draws to enable fast-
paths, and clearing one of the attachments with a quad-draw first
confuses that logic.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver. The parts that are gallium
specific have been refactored out and remain in the gallium driver.
Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.
NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build. Waiting
patiently for the day that we can remove *everything* from the autotools
build.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Split the parts that are gallium specific into ir3_gallium so the rest
can move to a common location outside of gallium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
A bit annoying to have to copy into our own struct. But this is
something the compiler really needs to know, at least on earlier
generations where streamout is implemented in shader.
Signed-off-by: Rob Clark <robdclark@gmail.com>
So I can drop env2u() helper from freedreno_util.h and get rid of one
small ir3 dependency on gallium/freedreno
Signed-off-by: Rob Clark <robdclark@gmail.com>
Just massive search/replace for the most part.
Step towards removing ir3 dependency on disasm.h which is shared by
a2xx. One step closer to being able to move ir3 out of gallium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
So that we can re-use at least parts of it for vulkan driver, and so
that we can move ir3 to a common location (which uses fd_bo to allocate
storage for shaders)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work to move drm to a common location.
Slightly hacky, but the softpin debug flag is only temporary.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The compiler doesn't know that ny != 0, so x might be uninitialized for
the printf at the end.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
L3 allocation table in h/w specification recommends using 4 KB
granularity for programming allocation fields in L3CNTLREG.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
L3 allocation table in h/w specification recommends using 4 KB
granularity for programming allocation fields in L3CNTLREG.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Use L3 configuration specified in h/w specification.
V2: Drop configs which do under allocation of l3 cache.
Bump up the comment above table.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Scons and autotools don't define it, and as of last commit nothing
uses it.
`VERSION` is also a generic enough name that something somewhere will
eventually clash, and we don't want to repeat the LLVM `DEBUG` fiasco.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Everything else uses PACKAGE_VERSION, so let's be consistent, and
VERSION and PACKAGE_VERSION are currently defined to be the same in
meson and android, while VERSION is undefined in autotools and scons.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Per chapter 3.2 "Instances":
> Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing
> an apiVersion of 0 is equivalent to providing an apiVersion of
> VK_MAKE_VERSION(1,0,0).
Reported-by: Niklas Haas <git@haasn.xyz>
Fixes: 8c048af589 "anv: Copy the appliation info into the instance"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If glGetTexImage or glGetnTexImage is called with a level that doesn't
exist, we get an error message on this form:
Mesa: User error: GL_INVALID_VALUE in glGetTexImage(depth = 0)
This is clearly nonsensical, because these APIs don't even have a
depth-parameter. The reason is that get_texture_image_dims() return
all-zero dimensions for non-existent texture-images, and we go on to
validate these dimensions as if they were user-input, because
glGetTextureSubImage requires checking.
So let's split this logic in two, so glGetTextureSubImage can have
stricter input-validation. All arguments that are no longer validated
are generated internally by mesa, so there's no use in validating them.
Fixes: 42891dbaa1 "gettextsubimage: verify zoffset and depth are correct"
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This check is the only part of dimensions_error_check that isn't about
error-checking the offset and size arguments of
glGet[Compressed]TextureSubImage(), so it doesn't really belong in here.
This doesn't make a difference right now, apart for changing the
presedence of this error. But it will make a difference for the next
patch, where we no longer call this method from the non-sub tex-image
getters.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This error checking is the same for teximage and texsubimage getters, so
let's factor it out to its own function.
This will be useful when getteximage and gettexsubimage gets their own
error checking routines a bit later.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This will be useful when we split error-checking for getteximage and
gettexsubimage later.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
The explanation quotes the spec on the following wording to justify the
error:
"An INVALID_VALUE error is generated if xoffset + width is greater than
the texture’s width, yoffset + height is greater than the texture’s
height, or zoffset + depth is greater than the texture’s depth."
However, this shouldn't generate an error in the case where *all three*
of width, xoffset and the texture's width are zero. In this case, we end
up generating an unspecified error.
So let's remove this check, and instead make sure that we consider this
as an empty texture.
So let's not generate an error, there's non mandated in the spec in
xoffset/yoffset/zoffset = 0 case. We already avoid doing any work in
this case, because of the final, non-error generating check in this
function.
Fixes: b37b35a5d2 "getteximage: assume texture image is empty for non defined levels"
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This function has been core since OpenGL 4.3, so naming the
implementation and reporting erros using an ARB-suffix can be
confusing.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
When a shader program is de-serialized the gl_shader_program passed in
may actually still hold memory allocations for the transform feedback
varyings. If that is the case, free the varying names and reallocate
the new storage for the names array.
This fixes a memory leak:
Direct leak of 48 byte(s) in 6 object(s) allocated from:
in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880)
in transform_feedback_varyings ../../samba/mesa/src/mesa/main/transformfeedback.c:875
in _mesa_TransformFeedbackVaryings ../../samba/mesa/src/mesa/main/transformfeedback.c:985
...
Indirect leak of 42 byte(s) in 6 object(s) allocated from:
in __interceptor_strdup (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0x761c8)
in transform_feedback_varyings ../../samba/mesa/src/mesa/main/transformfeedback.c:887
in _mesa_TransformFeedbackVaryings ../../samba/mesa/src/mesa/main/transformfeedback.c:985
Fixes: ab2643e4b0
glsl: serialize data from glTransformFeedbackVaryings
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We used the layer count which results in an off by one error.
Not sure this really affects anything.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use VAO binding information in feedback rendering. In theory
it should reduce the amount of buffer objects scheduled for rendering.
Feedback rendering is implemented in a crude way anyhow, so I do not
expect much gain here. But for the sake of code reuse we should
use the same code for the same task. And finally if feeback rendering
may get improved the array setup is already well done there.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The change removes the reference that is held on the entries of the
vbuffers[] array. The new code does not do that anymore as following
the code into draw_set_vertex_buffers() the draw context holds an
other reference as long as it is reset down the function again.
So it should be already by that argument save to remove that
additional reference count.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Factor out vertex array setup routines from the array state atom.
The factored functions will be used in feedback rendering in the
next change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In st_atom_array, we only need to unmap the upload buffer that
was actually used.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In st_atom_array, we only need to care for unmapping the upload buffer
if we actually used it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
dnz flag only applies for multiplications (e.g. to make 0 * Infinity
becomes 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
flag no longer makes sense, and upsets the GM107 emitter (since it looks
at the ftz and dnz flags together).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We don't need to flush anything before these two commands as well.
This is because they have to be externally synchronized, so the
app should have called CmdPipelineBarrier() prior to that and the
driver should have flushed the caches.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The rules encoded in this code also applies to OpenGL ES 3.0 and up,
but the per-enum validation has already been taught about these rules.
So let's get rid of this duplicate, narrow version of the validation.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_timer_query is set based on the driver-
capabilities, not based on the context type. We need to check
against _mesa_has_ARB_timer_query(ctx) instead to figure out
if the extension is really supported. We also need to check for
EXT_disjoint_timer_query for GLES-support.
This shouln't have any functional effect, as this entry-point is only
valid on desktop GL, or on GLES with EXT_disjoint_timer_query in the
first place. But if this gets added to the core of a future version
of ES, this should be a step in the right direction.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_query_buffer_object is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_query_buffer_object(ctx) instead to figure out if the
extension is really supported.
This turns attempts to read queries into buffer objects on ES 3 into
errors, as required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_transform_feedback_overflow_query is set based on
the driver-capabilities, not based on the context type. We need to
check against _mesa_has_RB_transform_feedback_overflow_query(ctx)
instead to figure out if the extension is really supported.
This turns usage of GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW and
GL_TRANSFORM_FEEDBACK_OVERFLOW into errors on ES 3, as required by the
spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.EXT_transform_feedback is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_EXT_transform_feedback(ctx) instead to figure out if the
extension is really supported. We also need to check for
OES_geometry_shader.
This turns usage of GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN into an
error on ES 2, as well as usage of GL_PRIMITIVES_GENERATED on ES 3, both
as required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.EXT_timer_query is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_EXT_timer_query(ctx) instead to figure out if the extension
is really supported. We also need to check for
EXT_disjoint_timer_query, which enables the same functionality for ES.
This turns usage of GL_TIME_ELAPSED into an error on ES 3, as is
required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_ES3_compatibility is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_ES3_compatibility(ctx) instead to figure out if the
extension is really supported.
In addition, EXT_occlusion_query_boolean should also allow this
behavior.
This shouldn't cause any functional change, as all drivers that support
ES3_compatibility should in practice enable either ES3_compatibility or
EXT_occlusion_query_boolean under all APIs that export this symbol.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_occlusion_query2 is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_occlusion_query2(ctx) instead to figure out if the
extension is really supported.
In addition, EXT_occlusion_query_boolean should also allow this
behavior.
This shouldn't cause any functional change, as all drivers that support
ARB_occlusion_query2 should in practice enable either
ARB_occlusion_query2 or EXT_occlusion_query_boolean under all APIs that
export this symbol.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_occlusion_query is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_occlusion_query(ctx) instead to figure out if the
extension is really supported. We also need to check for
ARB_occlusion_query2, as ARB_occlusion_query isn't available in core
contexts.
This turns usage of GL_SAMPLES_PASSED into an error on ES 3, as is
required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The _mesa_has_ARB_pipeline_statistics_query(ctx)-helper will already
check the GLES-version according to the extension-table, so if this
extension would ever be back-ported to ES, we only need to update the
table to support this.
This shouln't have any functional effect.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These enums all have the same values as their non-prefixed versions, and
there's several aliases for some of them. So let's switch to the
non-prefixed versions for simplicity.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
According to the extension spec, this was initially released in 2011,
so let's set this to the correct value.
The value of 2001 could be a copy-paste mistake, as ARB_occlusion_query
which this is based on was released then.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
EXT_occlusion_query_boolean require support for GL_ANY_SAMPLES_PASSED,
which ARB_occlusion_query doesn't supply. We need ARB_occlusion_query2
for this instead.
This is still not 100% accurate, as we also require support for the
GL_SAMPLES_PASSED_CONSERVATIVE target, which isn't guaranteed by either
ARB_occlusion_query nor ARB_occlusion_query2. But it should be trivial
to implement for any driver supporting ARB_occlusion_query2, as it can
simply be implemented as GL_ANY_SAMPLES_PASSED.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes issues with following SkQP tests:
unitTest_VulkanHardwareBuffer_Vulkan_EGL_Syncs
unitTest_VulkanHardwareBuffer_Vulkan_Vulkan_Syncs
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of taking a whole pipeline (which could be anything!), just take
a physical device and robust_buffer_access boolean. This makes it
easier to verify that only the things in the hash actually affect
pipeline compilation.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It affects apply_pipeline_layout. Shaders compiled with the wrong value
will work but they may not be robust as requested by the app.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Our compile already splits UBO loads into scalars and the untyped
surface read messages we use for SSBO reads and writes only require
dword alignment.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
'post_flush' is only set to NULL for the normal clear path
(ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage()
are affected commands).
Because these two operations have to be externally synchronized
with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT,
it's useless to set those flags internallY.
VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded
by RADV_CMD_FLAG_INV_GLOBAL_L2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Since apps also have to follow the ImageFormatProperties query,
we can disallow formats that don't allow image stores (for AMD
that would be SRGB formats).
Note that this only affects anything if the app actually decides
to use the flag.
Had someone ask for this on IRC and at least on the AMD side we
can support it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
thread_submit can be useful even without DRI_PRIME,
as it can help avoid missed pageflips.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
The path allowing triple buffering behaviour wasn't implemented
yet for thread_submit
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
This fixes two memory leaks reported by ASAN:
Direct leak of 248 byte(s) in 1 object(s) allocated from:
in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880)
in r600_alloc_buffer_struct ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578
in r600_buffer_create ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600
in r600_resource_create_common ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265
in r600_resource_create ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:725
in pipe_buffer_create ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291
in update_gs_block_state ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1482
Direct leak of 248 byte(s) in 1 object(s) allocated from:
in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880)
in r600_alloc_buffer_struct ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578
in r600_buffer_create ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600
in r600_resource_create_common ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265
in r600_resource_create ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:722
in pipe_buffer_create ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291
in update_gs_block_state ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1489
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Fixes: 1371d65a7f
r600g: initial support for geometry shaders on evergreen (v2)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
CP DMA can only be busy when the driver copies buffers. The
only affected Vulkan commands are vkCmdCopyBuffer() and
vkCmdUpdateBuffer() (because we fallback to a copy depending on
a threshold). Clear operations are currently not concerned
because the driver always syncs after the last DMA operation.
Per the spec, these two operations have to be externally
synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Unnecessary as they allow the app to call vkCmdPipelineBarrier()
inside the render pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reverts commit 1f29f4db1e.
For this to work the compiler must ensure that it never puts
the values that arrive to this helper into unsigned variables
at any point in its processing, since that would not apply sign
extension to the value and it would break the expectations here.
Unfortunately, we use uint64_t extensively to pass and copy
things around, so some times we get to this helper with values
that are not properly sign extended to 64-bit. Here is an example
for an 8-bit value that comes from a switch case:
(gdb) p /x x
$1 = 0xffffffd6
The value seems to have been sign extended to 32-bit at some point
getting proper sign extension, but then copied into a uint64_t
which wont' apply sign extension, breaking the expectations of
the assertion.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With the current VAO layout we do not need to make these
fields a bitfield. We get a tight struct layout with this change
for VAO attributes.
v2: Change unsigned char -> GLubyte.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Factor out struct gl_vertex_format from array attributes.
The data type is supposed to describe the type of a vertex
element. At this current stage the data type is only used
with the VAO, but actually is useful in various other places.
Due to the bitfields being used, special care needs to be
taken for the glGet code paths.
v2: Change unsigned char -> GLubyte.
Use struct assignment for struct gl_vertex_format.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of open coding the size computation, use the
already available gl_array_attribute::_ElementSize value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of open coding the size computation, use the
already available gl_array_attribute::_ElementSize value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Use GL_UNSIGNED_BYTE as initialization data type
for the edge flag vertex attribute array. The same datatype
is used in the glEdgeFlagPointer function when setting the
array pointer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
For enabling or disabling VAO arrays it is now possible to
change a set of arrays with a single call without the need to
iterate the attributes.
Make use of this technique in the vao module.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of using gl_array_attributes::Enabled use the
much more compact representation stored in
gl_vertex_array_object::Enabled using the corresponding bits.
Keep the glGet changes in a seperate patch at least for review.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of using gl_array_attributes::Enabled use the
much more compact representation stored in
gl_vertex_array_object::Enabled using the corresponding bits.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mark the up to now derived bitfield value now as primary
value by removing the underscore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
radeonsi has 3 driver threads (glthread, gallium, winsys), other drivers
may have 2 (glthread, gallium), so it makes sense to pin them to a random
CCX and keep that irrespective of the app thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is used when glthread is disabled.
Mesa pretty much chases the app thread on the CPU.
The performance is the same as pinning the app thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This moves nir_shader_clone() to the driver-specific compile function,
rather than the shared src/intel/compiler code. This allows i965 to do
key-specific passes before calling brw_compile_*. Vulkan should not
need this cloning as it doesn't compile multiple variants.
We do need to continue cloning in the compute shader code because we
lower various things in NIR based on the SIMD width.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We won't have a var to load from, so don't try to the processing
required if we don't need it.
This avoids crashes in:
dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Meson test has a concepts of suites, which allow tests to be grouped
together. This allows for a subtest of tests to be run only (say only
the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes nir, glsl, and glcpp tests.
To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). it can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has addition
options that are pretty useful).
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Currently we detect the module and if missing, the glXGetMsc* API is
effectively a stub, always returning false.
This is what effectively has been happening with our meson build :-(
Thus users have no chance of using it - they cannot even distinguish
if the failure is due to a misconfigured build.
There's no reason for keeping xf86vidmode optional - it has been
available in all distributions for years.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: a47c525f32 "meson: build glx"
Fixes an assertion in SoTTR.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The existing backend code assumed that if VARYING_SLOT_CLIP_DIST0
was written, then VARYING_SLOT_CLIP_DIST1 would be as well. That's
true with the current lowering, but not necessary if there are 4 or
fewer clip distances. Separate out the checks to allow this.
The new NIR-based lowering will trigger this case, which would have
caused backend validation errors (src is null) without this patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
The way nir_lower_clip_vs() works with store_output intrinsics makes a
ton of assumptions about the driver_location field.
In i965 and iris, I'd rather do this lowering early and work with
variables. v3d may want to switch to that as well, and ir3 could too,
but I'm not sure exactly what would need updating. For now, handle
both methods.
Reviewed-by: Eric Anholt <eric@anholt.net>
I posted a load of hacks before to do this, Jason suggested this,
just check the deref mode, not the variable mode and delay getting
the variable until we know the type.
avoids crashes when derefing shared memory pointers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
assert(nir_intrinsic_write_mask(instr) ==
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
(1 << instr->num_components) - 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This was caused by 6339aba775 which added these completely valid
checks. However clang likes to complain about signedness mismatches.
Fixes: 6339aba775 "intel/compiler: Lower SSBO and shared..."
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It's not at all intel-specific; the formula is dictated by OpenGL and
Vulkan. The only intel-specific thing is that we need the lowering. As
a nice side-effect, the new version is variable-group-size ready.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Fixes: d971a4230d "loader: Factor out the common driver
opening logic from each loader."
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows to fast clear the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.
This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).
BEFORE: 235 us
AFTER: 13 us
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This path will be eventually improved later but as it's only
used on SI (or with RADV_DEBUG=noibs), I'm not sure if that
matters much.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The chained submission is the fastest path and it should now
be used more often than before. This removes some EOP events.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
At least GC2000 seems to push some dirt from the PE color cache into
the last bound render target when drawing depth only. Newer cores
seem to behave properly and don't do this, but I have found no way
to fix it on GC2000. Flushes and stalls don't seem to make any
difference.
In order to stop the core from pushing the dirt into a precious real
render target, plug in dummy buffer when rendering without a color
buffer.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
DCC and FMASK also imply a fast-clear eliminate, so it should be
safe to reset the predicate unconditionally. We still only skip
FMASK or CMASK decompressions for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We read 4 values out of sample_locs_8x, so make sure the array is
big enough.
Fixes: ac76aeef20 ("radeonsi: switch back to standard DX sample positions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With 5d517a streamout info is only attached to the shader for which the
transform feedback is actually recorded, but the driver set the context info
with each state submitted, thereby always using the info data that was
attached to the vertex shader.
Pass the streamout stride info to the context only from the shader that
actually has outputs. (Thanks to Marek Olšák for pointing me in the right
direction)
Fixes regresion with: dEQP-GLES31.functional.tessellation.invariance.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108734
Fixes: 5d517a599b
st/mesa: Don't record garbage streamout information in the non-SSO case.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
FRAMEBUFFER_INCOMPLETE_DIMENSIONS is not supported for GLES 3.0 and later and
not defined for Desktop OpenGL. Instead use FRAMEBUFFER_UNSUPPORTED like it
was done before.
Thanks to Iago Toral and Andrey Simiklit for pointing out the problem and the
details.
Fixes: ebcde34545
i965: be more specific about FBO completeness errors
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The structure qdws is not allocated at this point, nor is the
file descriptor set to it's member. Use the fd directly instead.
Fixes: d1a1c21e76
virgl: native fence fd support
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Emulate MESA_FORMAT_R_SRGB8 by using L8_UNORM_SRGB. This is possible
because component swizzling is handled based on the mesa format and,
hence, the a r001 swizzling can be used to correct the components.
Enables and makes pass (tested on Kabylake)
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The driver was returning GL_FRAMEBUFFER_UNSUPPORTED for all cases of an
incomplete fbo, be a bit more specific about this following the description
of glCheckFramebufferStatus.
This helps to keeps dEQP happy when adding EXT_texture_sRGB_R8 support.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
On nv50, certain operations must happen on regs below 64, due to
encoding requirements. First of all, we add infrastructure to enforce
this. Secondly we change the spill order to first spill RIG nodes that
are unconstrained, followed by ones that are.
This makes the gamecube logo shadertoy compile properly. Curiously, if
we adjust the spill order so that we first spill the constrained RIG
nodes instead, the RA also succeeds. However it seems more logical to
first spill the unconstrained ones.
While we're at it, drop the nv50 max register to reserve r127 as the
zero register of last resort (r63 is preferred).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Karol Herbst <kherbst@redhat.com>
Instead of the size restriction existing in two places, and potentially
being applied twice, we move this together. Ops with 16-bit register
addresses can only take a short reg, and ops with immediates can only
take a short reg.
Of course we leave the immediate 0 in place since we know that it will
be replaced by r63/r127 down the line, so don't treat zeroes as an
immediate.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We removed the op from the BB, but it was still listed in its sources'
uses. This could trip up some logic down the line which analyzes all the
uses of an l-value, e.g. spilling.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Previously we would print errors on the console like:
libEGL debug: EGL user error 0x3001 (EGL_NOT_INITIALIZED) in eglInitialize
When we had everything we needed for:
libEGL debug: EGL user error 0x3001 (EGL_NOT_INITIALIZED) in eglInitialize: DRI2: failed to find EGLDevice
(for a gbm error in my case)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
I copied the code from egl_dri2.c, but the functionality was equivalent
between all the loaders other than their particular environment variables.
v2: Drop the logging function equivalent to loader_default_logger()
(requested by Eric, Emil). Move the SCons workaround across. Drop
the now-unused driGetDriverExtensions() declaration that was lost in a
rebase.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
I need other types from the header now, and "gl.h is big" is not a good
reason to duplicate definitions.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The only thing you do with a dri driver handle is get the extensions
pointer, so just fold it in to simplify the callers.
v2: Add the declaration of driGetDriverExtensions() that got lost in a
rebase.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
You can tell by "Mesa/configs/default" how old this is. Your build system
really has to provide the DEFAULT_DRIVER_DIR, or other loaders will break.
v2: Move the bad (non-prefix-dependent) define to the SConscript to avoid
breaking it.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
After doing a bunch of benchmarks, primitive binning helps
some games like The Talos Principle (+5%) or Serious Sam 2017
(+3%). For other titles, either it doesn't change anything or
it hurts very few (less than 1%).
This only affects GFX9.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We can only start parsing commands from the head pointer. This was
working fine up to now because we only dealt with a "made up" ring
buffer (generated by aub_write) which always had its head at 0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Toni Lönnberg <toni.lonnberg@intel.com>
According to the EGL_EXT_image_dma_buf_import spec, creating an EGL
image with a DRM format not supported should yield the BAD_MATCH
error :
"
* If <target> is EGL_LINUX_DMA_BUF_EXT, and the EGL_LINUX_DRM_FOURCC_EXT
attribute is set to a format not supported by the EGL, EGL_BAD_MATCH
is generated.
"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 20de7f9f22 ("egl/dri2: support for creating images out of dma buffers")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
gbm_bo_create() was presumably meant to originally accept gbm_bo_format
enums, but it's accepted GBM_FORMAT_* tokens since the dawn of time.
This is good, since gbm_bo_format is rarely used and covers a lot less
ground than GBM_FORMAT_*.
Change the documentation to refer to both; this involves removing a 'see
also' for gbm_bo_format, since we can't also use \sa to refer to a
family of anonymous #defines.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 647c2b90e9. There was
one recently-introduced bug in ac for dvec3 loads, but the other test
failures were actually bugs in the tests. See
9429e621c4
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The handles exported need to be on the KMS device's fd, anything else is
failure. Also, this code is assuming that the scanout resource has been
created already, so assert it.
The DRI3 create_with_modifiers paths don't set tmpl.bind to SCANOUT or
SHARED, with the theory that given that you've got modifiers, that's all
you need. However, we were looking at the tmpl.bind for setting up the
KMS handle in the renderonly case, so we'd end up trying to use vc4's
handle on the hx8357d fd.
Fixes: 84ed8b67c5 ("vc4: Set shareable BOs as T tiled if possible")
Handle all cases in calculation of layers count for isl_view
taking into account texture view and image unit.
st_convert_image was taken as a reference.
When u->Layered is true the whole level is taken with respect to
image view. In other case only one layer is taken.
v3: (Józef Kucia and Ilia Mirkin)
- Rewrote patch by taking st_convert_image as a reference
- Removed now unused get_image_num_layers function
- Changed commit message
v4: (Jason Ekstrand)
- Added assert
Fixes: 5a8c8903
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107856
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally. It also means we can easily share the code between the vec4
and FS back-ends if we wish.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This also changes spirv_to_nir and glsl_to_nir to set them. The one
place that doesn't set them is shared memory access lowering in
nir_lower_io. That will have to be updated before any consumers of it
can effectively use these new alignments.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
The new helpers can generate any pack/unpack operation including those
for which we do not have specific opcodes and they express a bitcast in
terms of these pack/unpack operations. In particular, the new helpers
properly handle 8-bit types.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The pattern of adding or multiplying an integer by an immediate is
fairly common especially in deref chain handling. This adds a helper
for it and uses it a few places. The advantage to the helper is that
it automatically handles bit sizes for you.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This assert won't catch all mistakes with this helper but it will at
least ensure that the top bits are all zero or all one which should help
catch bugs.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Section 3.7 (Identifiers) of the GLSL spec says:
However, as noted in the specification, there are some cases where
previously declared variables can be redeclared to change or add
some property, and predeclared "gl_" names are allowed to be
redeclared in a shader only for these specific purposes. More
generally, it is an error to redeclare a variable, including those
starting "gl_".
This patch should fix piglit tests:
clip-distance-redeclare-without-inout.frag
clip-distance-redeclare-without-inout.vert
However, this causes a regression in
clip-distance-out-values.shader_test. A fix for that test has been sent
to the piglit list for review:
https://patchwork.freedesktop.org/patch/255201/
As far as I understood following mailing thread:
https://lists.freedesktop.org/archives/piglit/2013-October/007935.html
looks like we have accepted to remove an ability to change qualifiers
but have not done it yet. Unless I missed something)
v2 (idr): Move 'earlier->data.mode != var->data.mode' test much earlier
in the function. Add special handling for gl_LastFragData.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I screwed up a rebase over a refactor and didn't notice locally because
the uncommitted refactor hid the issue.
Fixes: c973364967 "egl: add missing glvnd entrypoint for EGL_ANDROID_blob_cache"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Both BRW_SFID_SAMPLER and GEN6_SFID_DATAPORT_SAMPLER_CACHE are getting
disassembled as "sampler", which is misleading for assembler tool.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Fixes dEQP-EGL.functional.get_proc_address.extension.egl_android_blob_cache
on builds with glvnd enabled.
Fixes: 6f5b57093b "egl: add support for EGL_ANDROID_blob_cache"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Currently we detect when a breaking commit:
- has landed in stable, and
- is referenced by a untagged fix in master
Yet we did not consider the case of breaking commit:
- prior to the branchpoint, and
- is referenced by a untagged fix in master
Addressing the latter is extremely slow, due to the size of the lookup.
That said, we can trivially use the existing is_sha_nomination() helper
to catch reverts.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently we match on:
- any arbitrary length of,
- any a-z A-Z and 0-9 characters
At the same time, a commit sha consists of lowercase hexadecimal
numbers. Any sha shorter than 8 characters is ambiguous - in some cases
even 11+ are required.
So change the pattern to a-f0-9 and adjust the length to 8-40.
As we're here we could use a single grep, instead of the grep/sed combo.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Having a separate script to handle the fixes tag, brings a number of
issues, so let's fold it in get-pick-list.sh.
v2:
- pass the sha as argument to the function
- Keep original sed pattern
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As the comment in get-typod-pick-list.sh says, there's little point in
having a duplicate file.
Add the new pattern + tag to get-pick-list.sh and nuke this file.
v2:
- pass the sha as argument to the function
- grep -q instead of using a variable (Eric)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With later commits we'll fold all the different scripts into one.
Add the explicit prefix, so that we know the origin of the nomination
v2:
- pass the sha as argument to the function
- swap $tag = none for an else statment (Juan)
- grep -q instead of using a variable (Eric)
- print the tag and commit oneline separately (Eric)
v3:
- drop unused "tag=none" assignment (Juan)
- typo nomination
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
A while back we agreed that having a live/staging branch is beneficial.
Sadly we forgot to document that, so here is my first attempt.
Document the caveat that the branch history is not stable.
CC: Andres Gomez <agomez@igalia.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
etna_resource_alloc and etna_resource_from_handle currently use different checks.
This leads to
etna_resource_from_handle:492: target=2, format=PIPE_FORMAT_B8G8R8X8_UNORM, 1080x1920x1, array_size=1, last_level=0, nr_samples=0, usage=0, bind=8000a, flags=0
etna_resource_from_handle:541: BO stride 4320 is too small for RS engine width padding (4352, format PIPE_FORMAT_B8G8R8X8_UNORM)
since etna_resource_from_handle wants to be aligned to a 16 byte
boundary while the etna_resource_alloc does not.
Adjust the two checks by using a common function.
Broken by baff59ebf0
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
1. nir/nir_lower_vars_to_ssa.c:691:21: warning:
unused variable ‘var’
nir_variable *var = path->path[0]->var;
v2: Changes for some part of 'may be used uninitialized'
warnings were removed, seems like it is a compiler issue.
( Eric Engestrom <eric.engestrom@intel.com> )
Possible like this one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46684
This issue is flagged as duplicate but an
original one is not closed yet.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Bump minor to signal support for new formats and higher precision
solid pictures.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Support Component Alpha for those composite operations that do not require
per-channel alpha blending.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
In the case when we had both source and mask samplers, transformations were
typically not applied correctly.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
The only solid fill picture type we supported only had 8 bit color
channels. Add a new solid picture type that supports float channels.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove unused and obsolete code for gradients and component-alpha
Support solid source- and mask pictures using a variable number
of samplers in the composite pipeline rather than the fixed number
we used before.
Tested using rendercheck for XA.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Some hardware supports source mods only for float operations. Make it
possible to skip lowering to source mods in these cases.
v2: use option flags instead of a boolean (Jason Ekstrand)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
this helps reduce the overall code changes when a bit_size parameter is
added to nir_load_system_value
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
this allows to replace some nir_load_system_value calls with the specific
system value constructor
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This shouldn't make any difference but I feel uneasy to use the
expanded aspects that do not represent the image in its entirety. If
we ever change the implementation of the anv_image_aspect_to_plane()
helper, this is safer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To play around with debugging, we might want to disable one or the
other component. Having 0s as default values makes this work.
Otherwise we might have NULL components, leading to crashes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Those empty variables in the !wayland case are useless and running that
meson.build with them breaks the build:
[287/850] Generating wayland-drm-client-protocol.h with a custom command.
FAILED: src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h
client-header ../src/egl/wayland/wayland-drm/wayland-drm.xml src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h
/bin/sh: client-header: command not found
ninja: build stopped: subcommand failed.
Fixes: d1992255bb "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
`inc_wayland_drm` is only used if wayland is built, and it's already
added in that case a few lines below.
Fixes: a29869e872 "gbm: Don't traverse backwards for includes"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
These files are close to 4 years out of date; a lot's changed since.
Let's just check in a recently-regenerated version.
Changes generated by running `ninja xmlpool-{pot,update-po,gmo}`.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added missing engine definition to MI_TOPOLOGY_FILTER.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added missing engine definition to MI_TOPOLOGY_FILTER.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added more missing engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added missing engine tag for MI_TOPOLOGY_FILTER and MI_LOAD_URB_MEM.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions
v4: Added missing engine to MEDIA_GATEWAY_STATE
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added addition engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The engine to which the batch was sent to is now set to the decoder context when
decoding the batch. This is needed so that we can distinguish between
instructions as the render and video pipe share some of the instruction opcodes.
v2: The engine is now in the decoder context and the batch decoder uses a local
function for finding the instruction for an engine.
v3: Spec uses engine_mask now instead of engine, replaced engine class enums
with the definitions from UAPI.
v4: Fix up aubinator_viewer (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Removed the gen_engine enum and changed the involved functions to use the
drm_i915_gem_engine_class enum from UAPI instead.
v3: Wrong engine was being used for blocks in video ring
v4: Fixed aubinator_viewer.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Preliminary work for adding handling of different pipes to gen_decoder. Each
instruction needs to have a definition describing which engine it is meant for.
If left undefined, by default, the instruction is defined for all engines.
v2: Changed to use the engine class definitions from UAPI
v3: Changed I915_ENGINE_CLASS_TO_MASK to use BITSET_BIT, change engine to
engine_mask, added check for incorrect engine and added the possibility to
define an instruction to multiple engines using the "|" as a delimiter in the
engine attribute.
v4: Fixed the memory leak.
v5: Removed an unnecessary ralloc_free().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On the host VREND_DEBUG=guestallow must be set to let the guest override
the debug flags.
v2: Send flag string instead of flags, this avoids the need to keep
the flags in sync.
v3: Only request host logging if the host actually understands the command
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Transform feedback objects may hold a pointer to a shader program, and
at least in Gallium, this must be a valid pointer until
ctx->Driver.EndTransformFeedback in glEndTransformFeedback has been called
- which is conform with the spec that any program that is part of a
current rendering state should only be flagged for deletion by glDeleteProgram.
This was not handled properly for the transform feedback objects so that
a call sequence
glUseProgram(x)
glBeginTransformFreedback(...)
glPauseTransformFeedback(...)
glDeleteProgram(x)
glEndTransformFeedback(...)
would result in a use after free bug. With this patch the transform
feedback object also updates the reference count to the used program
thereby keeping the program valid as long as the transform feedback
objects links to it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108713
Fixes: 654587696b
mesa: add end_transform_feedback() helper
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If the local work group size is variable it won't be available
at compile time so we can't lower it in nir_lower_system_values().
Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Prior to this patch sizeof(linear_header) was 20 bytes in a
non-debug build on 32-bit platforms. We do some pointer arithmetic to
calculate the next available location with
ptr = (linear_size_chunk *)((char *)&latest[1] + latest->offset);
in linear_alloc_child(). The &latest[1] adds 20 bytes, so an allocation
would only be 4-byte aligned.
On 32-bit SPARC a 'sttw' instruction (which stores a consecutive pair of
4-byte registers to memory) requires an 8-byte aligned address. Such an
instruction is used to store to an 8-byte integer type, like intmax_t
which is used in glcpp's expression_value_t struct.
As a result of the 4-byte alignment returned by linear_alloc_child() we
would generate a SIGBUS (unaligned exception) on SPARC.
According to the GNU libc manual malloc() always returns memory that has
at least an alignment of 8-bytes [1]. I think our allocator should do
the same.
So, simple fix with two parts:
(1) Increase SUBALLOC_ALIGNMENT to 8 unconditionally.
(2) Mark linear_header with an aligned attribute, which will cause
its sizeof to be rounded up to that alignment. (We already do
this for ralloc_header)
With this done, all Mesa's unit tests now pass on SPARC.
[1] https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html
Fixes: 47e1758692 ("glcpp: use the linear allocator for most objects")
Bug: https://bugs.gentoo.org/636326
Reviewed-by: Eric Anholt <eric@anholt.net>
For example the following type of thing is seen in TCS from
a number of Vulkan and DXVK games:
vec1 32 ssa_557 = deref_var &oPatch (shader_out float)
vec1 32 ssa_558 = intrinsic load_deref (ssa_557) ()
vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float)
vec1 32 ssa_560 = intrinsic load_deref (ssa_559) ()
vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float)
vec1 32 ssa_562 = intrinsic load_deref (ssa_561) ()
intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x */
intrinsic store_deref (ssa_559, ssa_560) (1) /* wrmask=x */
intrinsic store_deref (ssa_561, ssa_562) (1) /* wrmask=x */
No shader-db changes on i965 (SKL).
vkpipeline-db results RADV (VEGA):
Totals from affected shaders:
SGPRS: 7832 -> 7728 (-1.33 %)
VGPRS: 6476 -> 6740 (4.08 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 469572 -> 456596 (-2.76 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 989 -> 960 (-2.93 %)
Wait states: 0 -> 0 (0.00 %)
The Max Waves and VGPRS changes here are misleading. What is
happening is a bunch of TCS outputs are being optimised away as
they are now recognised as unused. This results in more varyings
being compacted via nir_compact_varyings() which can result in
more register pressure when they are not packed in an optimal way.
This is an existing problem independent of this patch. I've run
some benchmarks and haven't noticed any performance regressions
in affected games.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously the debug would be:
libEGL debug: No DRI config supports native format 0x20203852
libEGL debug: No DRI config supports native format 0x38385247
but
libEGL debug: No DRI config supports native format R8
libEGL debug: No DRI config supports native format GR88
is a lot easier to understand.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This requires that the caller make a little (stack) allocation to store
the string.
v2: Use gbm_format_canonicalize (suggested by Daniel)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
There are two problems:
1) the extra underscore in MISSING_64BIT_ATOMICS
2) we should link with libatomic if the previous test decided we needed
it
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
regs is only set and used on x86; on other platforms (like ARM), this
code causes a trivial warning, solved by moving the regs declaration to
the architecture-dependent usage.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
meson does this for you with its warn levels, so we don't need to set
it ourselves.
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We're about to introduce AYUV support which provides its own alpha
channel. So give alpha as a parameter and set it to 1 on exising
formats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This can occur during payload setup of SIMD-split send message
instructions, which can lead to the emission of header setup
instructions with a non-zero channel group and fixed SIMD width. Such
instructions could end up using undefined channel enable signals
except they don't care since they're always marked force_writemask_all.
Not known to affect correctness of any workload at this point, but it
would be trivial to back-port to stable if something comes up.
Reported-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Just break out of the loop instead, it does the same thing.
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
This fixes various crashes and hangs when using nine's 'thread_submit'
feature.
On 64bit, the thread function's data argument would just be NULL.
On 32bit, the data argument would be garbage depending on the compiler
flags (in my case -march>=core2).
Fixes: f3fa7e3068 ("st/nine: Use WINE thread for threadpool")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
It seems I missed some details when exposing NV_conditional_render
on GLES; this fixes up "make check".
Fixes: 5213be9fab ("mesa: expose NV_conditional_render on GLES")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com>
Helpful for debugging compiler backend problems: this allows us to
easily retrieve the LLVM IR from RenderDoc.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The extension spec has been updated to include GLES 2 support, so let's
enable it there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nir_alu_type_get_type_size takes a type as parameter and we were
passing a bit-size instead, which did what we wanted by accident,
since a bit-size of zero matches nir_type_invalid, which has a
size of 0 too.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
SIMD16 instructions need to have additional interferences to prevent
source / destination hazards when the source and destination registers
are off by one register.
While we already have code to handle this, it was only running for SIMD16
dispatches, however, we can have SIDM16 instructions in a SIMD8 dispatch.
An example of this are pull constant loads since commit b56fa830c6,
but there are more cases.
This fixes a number of CTS test failures found in work-in-progress
tests that were hitting this situation for 16-wide pull constants
in a SIMD8 program.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Because we only have one file_max for the (2d) gs input file, the value
actually represents the max of attrib and vertex index (although I'm
not entirely sure if we really want the max, since the max valid value
of the vertex dimension can be easily deduced from the input primitive).
Thus in cases where the number of inputs is higher than the number of
vertices per prim, we did not properly clamp the vertex index, which
would result in out-of-bound fetches, potentially causing segfaults
(the segfaults seemed actually difficult to trigger, but valgrind
certainly wasn't happy). This might have happened even if the shader
did not actually try to fetch bogus vertices, if the fetching happened
in non-active conditional clauses.
To fix simply use the correct max vertex index value (derived from
the input prim type) instead when clamping for this case.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes Skqp's unitTest_EGLImageTest test.
For Intel platforms, we support external textures only for EGLImages
created with EGL_EXT_image_dma_buf_import. This restriction seems to
be Intel specific and not present for other platforms.
While running SKQP test - unitTest_EGLImageTest, GL_INVALID is sent
to the test because of this restriction.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105301
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Use #pragma warning(off) and #pragma warning(on) to disable or enable
all warnings. This is a big hammer. If we ever need a smaller hammer,
we can enhance this functionality.
There is one lame thing about this. Because we parse everything, create
an AST, then convert the AST to GLSL IR, we have to treat the #pragma
like a statment. This means that you can't do something like
' void
' #pragma warning(off)
' __foo
' #pragma warning(on)
' (float param0);
Fixing that would, as far as I can tell, require a huge amount of work.
I did try just handling the #pragma during parsing (like we do for
state for the whole shader.
v2: Fix the #pragma lines in the commit message that git-commit ate.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Everywhere we handle SSBO intrinsics, we have exactly the same pattern
for computing the index so we may as well make a helper for it. We also
add a get_nir_src_imm to vec4 and use it for SSBO offsets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
HTILE is supported on these chips, not sure how I missed that.
This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used.
Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This avoids syncing the Micro Engine. This is only supported
for VI+ currently. There is probably a way for using
LOAD_CONTEXT_REG on previous chips but that could be done later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
GLXCreate{,New}Context, like most X resource creation requests, does not
emit a reply and therefore is emitted into the X stream asynchronously.
However, unlike most resource creation requests, the GLXContext we
return is a handle to library state instead of an XID. So if context
creation fails for any reason - say, the server doesn't support indirect
contexts - then we will fail in strange places for strange reasons.
We could make every GLX entrypoint robust against half-created contexts,
or we could just verify that context creation worked. Reuse the
__glXIsDirect code to do this, as a cheap way of verifying that the
XID is real.
glXCreateContextAttribsARB solves this by using the _checked version of
the xcb command, so effectively this change makes the classic context
creation paths as robust as CreateContextAttribs.
v2: Better use of Bool, check that error != NULL first (Olivier Fourdan)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)':
warning: control reaches end of non-void function [-Wreturn-type]
Reported-by: Moiman@freenode
Fixes: f821e80213
"gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
TEXS, TLD4 and TLD4S are variants of tex instructions which are more
scalar, which gives RA more freedom and is less likely to insert silly
MOVs to satisfy quad registers.
shader-db changes:
total instructions in shared programs : 7687265 -> 7614782 (-0.94%)
total gprs used in shared programs : 803620 -> 798045 (-0.69%)
total shared used in shared programs : 639636 -> 639636 (0.00%)
total local used in shared programs : 24648 -> 24648 (0.00%)
total bytes used in shared programs : 82103400 -> 81330696 (-0.94%)
local shared gpr inst bytes
helped 0 0 3648 10647 10647
hurt 0 0 464 205 205
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The biggest change here is the rename of VK_NVX_ray_tracing to
VK_NV_ray_tracing and the total removal of VK_KHR_mir_surface.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Enables on R600 and makes pass:
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8*
v2: remove chunk for dri/radeon (Emil)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Allocating through Gralloc implies buffers are going to be used
outside the driver. We have special MOCS settings for external BOs and
we probably want to use them here too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a1220e7311 ("anv/android: Set the BO flags in bo_cache_import (v2)")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reduces the amount of #ifdef ANDROID we'll have to have inside
the driver. Potentially offering better coverage of the android
extensions.
v2: Move anv_android.h include before anv_entrypoints.h (Tapani)
Fix autotools android build (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
At higher resolutions with the addition of MSAA, the number of tiles
can increase to the point where we use more than one VSC pipe per
tile. Which would cause us to calculate an out-of-bounds offset for
VSC_SIZE_ADDRESS. So don't try to be clever, just always put it at
a fixed offset assuming the max 32 VSC pipes in use.
Signed-off-by: Rob Clark <robdclark@gmail.com>
After commit a9fb331ea ("wayland/egl: update surface size on window
resize"), the surface size is updated as soon as the resize is done, and
`update_buffers()` would resize only if the surface size differs from
the attached size.
However, in the case of swrast, there is no resize callback and the
attached size is updated in `dri2_wl_swrast_commit_backbuffer()` prior
to the `swrast_update_buffers()` so the attached size is always up to
date when it reaches `swrast_update_buffers()` and the surface is never
resized.
This can be observed with "totem" using the GDK backend on Wayland (the
default) when running on software rendering:
$ LIBGL_ALWAYS_SOFTWARE=true CLUTTER_BACKEND=gdk totem
Resizing the window would leave the EGL surface size unchanged.
To avoid the issue, partially revert the part of commit a9fb331ea for
`swrast_update_buffers()` and resize on the win size and not the
attached size.
Fixes: a9fb331ea - wayland/egl: update surface size on window resize
Signed-off-by: Olivier Fourdan <ofourdan@redhat.com>
CC: Daniel Stone <daniel@fooishbar.org>
CC: Juan A. Suarez Romero <jasuarez@igalia.com>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
If the user provides an invalid display or device the ToVendor lookup
will fail.
In this case, the local [Mesa vendor] error code will be set. Thus on
sequential eglGetError(), the error will be EGL_SUCCESS.
To be more specific, GLVND remembers the last vendor and calls back
into it's eglGetError, although there's no guarantee to ever have had
one.
v2:
- Add _eglError call, so the debug callback is executed (Kyle)
- Drop XXX comment.
Piglit: tests/egl/spec/egl_ext_device_query
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Cc: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kyle Brenneman <kbrenneman@nvidia.com>
eglQueryDevicesEXT (unlike the other three functions) does not depend
on the display. It is implemented in GLVND, which calls into each
driver collecting the list of devices and presenting it to the user.
For the other entrypoints, GLVND acts as pass through stub calling into
the vendor library. The vendor implementation calls back into GLVND to
get the vendor dispatch. Then the driver proceeds to call itself via
the said dispatch.
This design makes is possible to keep using "old" GLVND with newer
vendor drivers. Since effectively all the extension code is within the
latter itself.
Without said entrypoints, any user will outright crash - as reported in
the bug report.
Note: there's a follow-up fix needed to our GLVND code, to make piglit
happy.
v2: add some beefy documentation in the commit message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108635
Fixes: 7552fcb7b9 ("egl: add base EGL_EXT_device_base implementation")
Reported-by: kyle.devir@mykolab.com
Cc: kyle.devir@mykolab.com
Acked-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Emil Velikov <emil.velikov@collabora.com>
We can only map at page aligned offsets. We got that wrong with buffer
size where (size % 4096) != 0 (anv has a WA buffer of 1024).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The spec says:
"vkCmdCopyQueryPoolResults is considered to be a transfer
operation, and its writes to buffer memory must be synchronized
using VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT
before using the results."
VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. So, it's useless to set those flags internally.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the non-SSO case, where multiple shader stages are linked together,
we were recording garbage pipe_stream_output_info structures for all
but the last enabled geometry-processing stage.
Specifically, we were using the gl_transform_feedback_info from
shader_program->last_vert_prog (the stage whose outputs will be
recorded)...but were pairing it with the output varying mappings
from the current shader stage. For example, a program with a VS and
GS, the VS's pipe_shader_state would have a pipe_stream_output_info
based on the GS transform feedback info, but the VS output mapping.
This generally worked out okay because only the pipe_stream_output_info
for the last stage really matters - the others can be ignored. However,
we'd like to avoid confusing the pipe driver. In particular, my new
driver translates the stream out information to hardware packets at
bind_{vs,tes,gs}_state() time...and was hitting asserts about garbage
varyings that didn't exist.
This patch changes st/mesa to record a blank pipe_stream_output_info
with num_outputs = 0 for all stages prior to last_vert_prog. The last
one is captured as normal.
(In the fully-SSO case, nothing should change - each program contains
a single shader stage, so last_vert_prog *is* the current shader.)
Tested with llvmpipe (piglit's gpu profile), and freedreno (a3xx,
gpu profile with -t transform.feedback). Fixes several hundred CTS
tests on my new driver.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are some cases where the VS is the only stage enabled, it uses the
entire URB, and the URB is large enough that placing later stages after
the VS exceeds the number of bits for "URB Starting Address".
For example, on Icelake GT2, "varying-packing-simple mat2x4 array" from
Piglit is getting a starting offset of 128 for the GS/HS/DS. But the
field is only large enough to hold an offset of 127.
i965 doesn't hit any genxml assertions because it's still using the old
OUT_BATCH mechanism. 128 << GEN7_URB_STARTING_ADDRESS_SHIFT (57) == 0,
with the extra bit falling off the end. So we place the disabled stage
at the beginning of the URB (overlapping with push constants). This is
likely okay since it's a zero size region (0 entries).
It seems like the Vulkan driver might hit this assertion, however, and
the situation seems harmless. To work around this, always place
disabled stages at the start of the URB, so the last enabled stage can
fill the remaining space without overflowing the field.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
libmesa_git_sha1 whole static dependency is added to get git_sha1.h header
and avoid following building error:
external/mesa/src/amd/vulkan/radv_device.c:46:10:
fatal error: 'git_sha1.h' file not found
^
1 error generated.
Fixes: 9d40ec2cf6 ("radv: Add support for VK_KHR_driver_properties.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In some cases (not building with llvm, which automatically pulls in
pthreads) nine needs to be directly linked with pthreads. Fixes building
on x86 (32 bit) without llvm.
Distro bug: https://bugs.gentoo.org/670094
Fixes: 6b4c7047d5
("meson: build gallium nine state_tracker")
Tested-by: Rafal Lalik <rafallalik@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
WA_1606682166:
Incorrect TDL's SSP address shift in SARB for 16:6 & 18:8 modes.
Disable the Sampler state prefetch functionality in the SARB by
programming 0xB000[30] to '1'. This is to be done at boot time and
the feature must remain disabled permanently.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the same spirit as commit a5889d70f2
"i965/icl: Disable binding table prefetching". Fixes some 110+
intermittent piglit failures with tex-miplevel-selection variants.
WA_1606682166:
Incorrect TDL's SSP address shift in SARB for 16:6 & 18:8 modes.
Disable the Sampler state prefetch functionality in the SARB by
programming 0xB000[30] to '1'. This is to be done at boot time and
the feature must remain disabled permanently.
Anuj: Set SamplerCount = 0 for vs, gs, hs, ds and wm units as well.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We cannot use nir_build_alu() to create the new alu as it has no
way to know how many components of the src we will use. This
results in it guessing the max number of components from one of
its inputs.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_geom
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_tessc
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_vert
Fixes: 2975422ceb ("nir: propagates if condition evaluation down some alu chains")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To avoid build error in u_debug_stack_android.cpp
due to now missing u_debug.h header:
external/mesa/src/gallium/auxiliary/util/u_debug_stack_android.cpp:26:10:
fatal error: 'u_debug.h' file not found
#include "u_debug.h"
^
1 error generated.
Fixes: 37db383abb ("util: Move u_debug to utils")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The bind flags defined by mesa/gallium might not always be in sync
with the ones copied to virglrenderer/gallium. Therefore, use the
flags defined in virgl like it is done for all the other calls to
create resources.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This only adds support on the Gallium core level, for the drivers
it is likely that additional changes are needed to support the
new texture format and thereby enabling the extension.
Enables on softpipe and makes pass:
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
v2: - add include for getting GL_SR8_EXT
v4: - since the extension is not required don't bother providing
a fallback (Ilia Mirkin)
- split patch (2/2) to separate Gallium and mesa/st parts
(Roland Scheidegger)
- trim commit message to only contain the history of the patch
relevant to this part
v5: - don't include GLES headers (required enum has been added to glheader.h)
(Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This format is needed to support EXT_texture_sRGB_R8. THe patch adds a new
format enum, the format entries in Gallium and and svga, the mapping between
sRGB and linear formats, and tests.
v2: - add mapping to linear format for PIPE_FORMATR_R8_SRGB
v3: - Add texture format to svga format table since otherwise building
mesa will fail when this driver is enabled. It was not tested
whether the extension actually works.
v4: - svga: remove the SVGA specific format definitions and table entries
and only add correct the location of PIPE_FORMAT_R8_SRGB in the
format_conversion_table (Ilia Mirkin)
- Split patch (1/2) to separate Gallium part and mesa/st part.
(Roland Scheidegger)
- Trim the commit message to only contain the relevant parts from the
split.
v5: - svga: correct location of PIPE_FORMAT_SRGB_R8 (Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: - fix format definition line
- disable for desktop GL
- don't add GL_R8_EXT to glext.h since it is already in
GLES2/gl2ext.h in glext.h and include this header where needed
(all Emil)
v3: - swrast: Fill the function table for sRGB_R8
The size of the function table is checked at compile time and must
correspond to the number of mesa texture formats.
dri/swrast being gles-2.0 doesn't support the extension though
v4: - correct format layout comment (Ilia Mirkin)
- correct logic for accepting GL_RED only textures (in part Ilia Mirkin)
EXT_texture_sRGB_R8 requires OpenGL ES 3.0 which includes
ARB_texture_rg/EXT_texture_rg, so one only must check for the first
when SR8_EXT is really requested.
v5: - add define for GL_ES8_XT to glheader.h and don't include GLES
headers (Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The GLSL 4.6 specification (section 4.1.14. "Implicit Conversions")
says:
"There are no implicit array or structure conversions. For
example, an array of int cannot be implicitly converted to an
array of float."
So let's add a check in place when assigning array initializers to
implicitly sized arrays, to avoid incorrectly allowing code on the
form:
int[] foo = float[](1.0, 2.0, 3.0)
This fixes the following dEQP test-cases:
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_float_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_uint_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_uint_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.uint_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.uint_to_float_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_float_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_uint_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_uint_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.uint_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.uint_to_float_fragment
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
EXT_shader_implicit_conversions adds support for implicit conversions
for GLES 3.1 and above.
This is essentially a subset of ARB_gpu_shader5, and augments
OES_gpu_shader5.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
In GLES, we currently either need an exact match with a local function,
or an exact match with a builtin.
However, if we add support for implicit conversions for GLES shaders,
we also need to fall back to a non-exact match in the case where there
were no builtin match either.
Luckily, we already have a variable ready with this, so let's just
return it if the builtin-search failed.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This makes the code a bit easier to read, as well as reduces repetition,
especially when we add support for EXT_shader_implicit_conversions.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This makes the code a bit easier to read, as well as will reduce
repetition when we add support for EXT_shader_implicit_conversions.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If the user gives 0 counterBuffers then the driver should still
enable transform feedback on all targets. This changes the
driver to always enable xfb, and use counter buffers where
one is defined for the target in question.
Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In order to handle pause/resume properly, the offset should
be added to the buffer binding not to the begin/end paths.
v2: don't add offset to size
Fixes ext_transform_feedback-alignment* under zink
Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since 0c1dd9dee0 ("broadcom/vc4: Allow importing linear BOs with
arbitrary offset/stride."), we have the vc4-side BO properly laid out
(assuming it's linear) in the winsys BO so that we can skip this extra
copy.
The recompile reduction is nice, but this also makes it so that a straight
texture copy could get optimized some day to not unpack/repack the f16
values.
If the caller has passed in a stride for (linear) BO import, we should use
that stride when rendering to the BO (or, if we some day support texturing
from linear-imported BOs, when doing the linear-to-UIF shadow copy). This
lets us remove the extra stride-changing relayout in the simulator.
This came from vc4, where we had a file format for GPU hangs. I don't
have one of those for V3D, and I probably won't ever have the simulator
side produce dumps even if I do.
Some environment (like Travis apparently) set LC_* vars, messing up the
sort ordering, so let's use envvar with the highest priority to make
sure this is actually sorted in ASCII order.
Suggested-by: Michel Dänzer <michel@daenzer.net>
Fixes: b42dc50a5f "egl: fix entrypoint sorting test"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Some memory and file descriptors are not freed/closed.
v2: fixed case where we skipped the 'aub' variable initialization
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ported from RadeonSI. It's always TRUE for CIK+ because RADV
doesn't support 16 samples.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Some of these functions were distributed across different
implementation and header files. Put them at a central place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The array type draw is no longer directly dependent on the vbo module.
Thus move array type draws into mesa/main/draw.c.
Rename symbols starting with vbo_* to _mesa_* and apply some
reindenting to make it consistent.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
These calls are just the same in each if branch. So pull that
before the if.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
With this change we preserve the no_current_update property when we
observe a glPrimitiveRestart call. That means that we now also get the
no_current_update optimization for display lists that are made
out of indexed draws using primitive restart.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of coding additional information into the primitive
mode, make the only remaining flag there a direct argument to
vbo_save_NotifyBegin.
v2: Fix incorrect no_current_update in glRectf.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The _mesa_prim::no_current_update flag should tell the compiled
display list if the current attributes that are placed in the dlists
vbo shall take a defined state past replay of a display list.
Immediate mode draws compiled into display lists should set the
current values. Array draws may leave the current values in
undefined state.
So finally this flag is not a property of every primitive
but it is a property of the compiled display list and there it
is a property of the last primitive compiled into the list.
So move the flag out of _mesa_prim into vbo_save.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The previous patch left a constant if (0) in the code.
Clean that up now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
When setting the _mesa_prim::mode field we always filter out
all non OpenGL primitive mode bits. So this tested bit cannot be
there anymore and the test evaluates to zero.
The zero is removed with the next patch to ease review.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Now looking at the implementation of vbo_save_NotifyBegin.
The VBO_SAVE_PRIM_WEAK flag, delivered in the primitive mode
argument to vbo_save_NotifyBegin, is not evaluated anymore.
The two users of the mode argument are the primitive mode
itself, where the VBO_SAVE_PRIM_WEAK bit is masked out to
retrieve the underlying OpenGL primitive mode. The other
user is to check for the VBO_SAVE_PRIM_NO_CURRENT_UPDATE bit
which is different from VBO_SAVE_PRIM_WEAK.
So, since vbo_save_NotifyBegin does not care about
VBO_SAVE_PRIM_WEAK, we can savely remove it from the call
arguments of vbo_save_NotifyBegin.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The only reader of the weak field in _mesa_prim is pretty
console printing. By that, remove the weak field from _mesa_prim.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
On finishing a display list playback the VBO_SAVE_FALLBACK bit
is still kept in vbo_save_context::replay_flags. But examining
replay_flags and the display list flags that feed this value
the corresponding bit is never set these days anymore.
So, since it is nowhere set or checked, we can safely remove it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
One cannot have haiku and dri2 - surfaceless,x11,etc.
Group things up, which will make the addition of platform_device a bit
easier.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Now that we support the extensions, fully, enabled them.
The specs mandate that we always have at least one device and each dpy
has a device associated with it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is the final requirement from the base EGLDevice spec.
v2:
- split from another patch
- move wayland hunk after we have the fd
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Add implementation based around the drmDevice API. As such it's only
available only when building with libdrm. With the latter already a
requirement when using !SW code paths in the platform code.
Note: the current code will work if a device is hot-plugged. Yet
hot-unplugged is not implemented, since I have no ways of testing it.
v2:
- ddd some _eglDeviceSupports checks
- require DRM_NODE_RENDER
- add _eglGetDRMDeviceRenderNode helper
v3:
- flip inverted asserts (Mathias)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Add a plain software device, which is always available.
We can safely assign it as the first/initial device in _eglGlobals,
although we ensure that's the case with a handful of _eglDeviceSupports
checks throughout the code.
v2:
- s/_eglFindDevice/_eglAddDevice/ (Eric)
- s/_eglLookupAllDevices/_eglRefreshDeviceList/ (Eric)
- move ^^ helpers into a earlier patch (Eric, Mathias)
- set the SW device on _eglGlobal init. (Eric)
- add a number of _eglDeviceSupports checks (Mathias)
- split Device/Display attach to a separate patch
v3:
- flip inverted asserts (Mathias)
- s/on-stack/static/ (Mathias)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The device extension string is expected to contain the name of the
extension defining what kind of device it is, so the caller can know
what kinds of operations it can perform with it. So that string had
better be non-empty, hence this trivial extension.
v2:
- drop "fallback", update history and update contributor list
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Introduce the API for device query and enumeration. Those at the moment
produce nothing useful since zero devices are actually available.
That contradicts with the spec, so the extension isn't advertised just
yet.
With later commits we'll add support for software (always) and hardware
devices. Each one exposing the respective extension string.
v2:
- fold API boilerplate into this patch
- move _eglAddDevice, _eglDeviceSupports, _eglRefreshDeviceList to this
patch (Eric, Mathias)
- make _eglFiniDevice the one called last
v3:
- comment on the dummy _egl_device_extension enum entry (Eric)
- annotate dev as MAYBE_UNUSED (Mathias)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Write down both X and GLX visual types when mapping from one to the
other. Makes grepping through the code a tiny bit easier.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Earlier commit flipped the default to python2 but forgot to update the
travis file. Props to pip caching things "worked" for a little while.
Fixes: f22ad5ef18 ("travis: use python3 for the autoconf builds")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This is incorrect, the offset is into the buffer, and it's legal
to write
loc 0,0 -> buffer0, offset 0
loc 0,1 -> buffer1, offset 0
This fixes a bunch of piglits running on my zink xfb code on
radv.
Fixes: 6c21645046 (radv: emit stream outputs for vertex and tessellation stages)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Instead of using a while loop with indexing. This is much cleaner. This
requires some other small changes.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
This is a very common python anti-pattern. Not using length allows us to
go through faster C paths, but has the same meaning.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Using shell redirection to write to a file is more complicated than
necessary, and has the potential to run into unicode encoding problems.
It's also less code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108530
v2: - update commit message to say less about LANG=C
- use flags instead of positional arguments for the script (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
gen_xmlpool uses a style unlike the rest of mesa, spaces between
function/method calls and the parens, strange whitespace to force lining
up method calls, and some other whitespace stuff. Since I'm going to be
doing some work in the file, I'm going to start cleaning those up.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Meson won't put the .gmo files in the layout that python's
gettext.translation() expects, it puts them in the build directory in a
flat layout. This modifies android and autotools to do the same (scons
doesn't work with translations at all)
v3: - Squash 4 patches into this patch
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Meson has handy a handy built-in module for handling gettext called
i18n, this module works a bit differently than our autotools build does,
namely it doesn't automatically generate translations instead it creates
3 new top level targets to run. These are:
xmlpool-pot
xmlpool-update-po
xmlpool-gmo
v2: - Add new files to autotools dist tarball
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is a little cleaner than just looking at sys.argv, but it's also
going to allow us to handle the differences in the way meson and
autotools handle translations more cleanly.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We need to update the cursor before we check if the alu use is
dominated by the if condition. Previously we were checking if
the current location of the alu instruction was dominated by
the if condition which would miss some optimisation opportunities.
Fixes: a3b4cb3458 ("nir/opt_if: Rework condition propagation")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch fixes this freedreno autotools build error.
CXXLD ir3_compiler
/usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_main.o): In function `_start':
(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here
/usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_main.o): relocation R_X86_64_32S against undefined symbol `vgPlain_interim_stack' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_trampoline.o): relocation R_X86_64_32 against `.text' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-dispatch-amd64-linux.o): relocation R_X86_64_32S against symbol `vgPlain_stats__n_xindirs_32' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Fixes: f3cc0d2747 ("freedreno: import libdrm_freedreno + redesign submit")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108595
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.
Note:
- python3.4 is used as it's the earliest supported version
- python2 chosen prior to python3
v2: use python2 by default
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
gtest is an external project that is copied in this tree for technical
reasons, but isn't maintained by us, so its warnings are irrelevant.
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This is an external project we have no control over, and will not be
fixing (other than by sometimes pulling the latest sources), so warnings
serve no purpose here.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Since commit "intel/compiler: Stop assuming the entrypoint is called
"main"" there is no need to force the entrypoint name to be "main".
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Patch does a 'dry run' of assign_attribute_or_color_locations before
optimizations to catch cases where we have aliasing of unused attributes
which is forbidden by the GLSL ES 3.x specifications.
We need to run this pass before unused attributes may be removed and with
attribute binding information from program, therefore we re-use existing
pass in linker rather than attempt to write another one.
This fixes WebGL2 test 'gl-bindAttribLocation-aliasing-inactive' and
Piglit test 'gles-3.0-attribute-aliasing'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106833
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is a revert of Marek's 2cb9ab53dd revert.
It was needed to revert the previous commit, and didn't have any issue
itself.
--
The "DRI2" name was reported as confusing when printing EGL infos (one
user reported thinking DRI3 was not working on his X server), and the
only alternative is Haiku, which can only be used on a Haiku machine.
The name therefore doesn't add any information that the user wouldn't
know already, so let's just drop it.
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Related-to: b174a1ae72 ("egl: Simplify the "driver" interface")
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is a revert of Marek's 84f3afc2e1 revert, with a missing
line added back. I failed a rebase and dropped that crucial line, and
didn't do a runtime test after my rebase, and as a result broke EGL for
everyone.
This commit has been tested by Intel's CI and I re-read it once more, so
it should be good this time.
--
Note: dropping the EGL_BAD_ALLOC in egl_haiku because it's
overwritten by the EGL_NOT_INITIALIZED in eglInitialize().
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit 773d6ea6e7.
Since kernel 4.17 (drm/etnaviv: remove the need for a gpu-subsystem DT
node) the etnaviv DRM driver doesn't have an associated DT node
anymore. This is technically correct, as the etnaviv device is a
virtual device driving multiple hardware devices.
Before 4.17 the userspace had access to the following information:
DRIVER=etnaviv
OF_NAME=gpu-subsystem
OF_FULLNAME=/gpu-subsystem
OF_COMPATIBLE_0=fsl,imx-gpu-subsystem
OF_COMPATIBLE_N=1
MODALIAS=of:Ngpu-subsystemT<NULL>Cfsl,imx-gpu-subsystem
DRIVER=imx-drm
OF_NAME=display-subsystem
OF_FULLNAME=/display-subsystem
OF_COMPATIBLE_0=fsl,imx-display-subsystem
OF_COMPATIBLE_N=1
Afer 4.17:
DRIVER=etnaviv
MODALIAS=platform:etnaviv
The OF node has never been part of the etnaviv UABI, simply due to the
fact that it's still possible to instantiate the etnaviv driver from a
platform file, instead of a devicetree node.
A patch set to fix this problem was send out [1] but it looks like
that a proper solution needs more time to bake.
[1] https://lists.freedesktop.org/archives/dri-devel/2018-October/194651.html
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
I don't think we want to wait for something that hasn't been
correctly submitted. This is similar to the fallback path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This isn't true for Vulkan so we have to whack it to "main" in anv which
is silly. Instead of walking the list of functions and asserting that
everything is named "main" and hoping there's only one function named
"main", just use the nir_shader_get_entrypoint() helper which has better
assertions anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ffs() just returns the bit that is set, we need to know what
stage that bit represents so use u_bit_scan() instead.
Fixes: 2ca5d9548f ("st/glsl_to_nir: gather next_stage in shader_info")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This isn't used in mesa, maybe vmware uses this in a closed source state
tracker?
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In order to pull u_debug into src/util we need to break the generically
useful bits from the bits that are tightly coupled to gallium.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
amdgpu doesn't use the INPUT but the AVERAGE subfeature:
$ sensors -u
amdgpu-pci-0100
Adapter: PCI adapter
power1:
power1_average: 17.233
power1_cap: 180.000
Signed-off-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Dual source blending behaviour is undefined when shader doesn't
have second color output.
"If SRC1 is included in a src/dst blend factor and
a DualSource RT Write message is not used, results
are UNDEFINED. (This reflects the same restriction in DX APIs,
where undefined results are produced if “o1” is not written
by a PS – there are no default values defined)."
Dismissing fragment in such situation leads to a hang on gen8+
if depth test in enabled.
Since blending cannot be gracefully fixed in such case and the result
is undefined - blending is simply disabled.
v2 (Jason Ekstrand):
- Apply the workaround to each individual entry
- Emit a warning through debug_report
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dual source blending behaviour is undefined when shader doesn't
have second color output, dismissing fragment in such situation
leads to a hang on gen8+ if depth test in enabled.
Since blending cannot be gracefully fixed in such case and the result
is undefined - blending is simply disabled.
v2 (Kenneth Graunke):
- Listen to BRW_NEW_FS_PROG_DATA in 3DSTATE_PS_BLEND
- Also whack BLEND_STATE[] to keep the two in sync, since we're not
sure exactly which copy of the redundant info the hardware will use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107088
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently, we're supposed to look at the texture object's built-in
sampler object's sRGB decode setting in order to decide whether to
decode/downsample/re-encode, or simply downsample as-is. Previously,
I had always done the decoding/encoding.
Fixes SKQP's Skia_Unit_Tests.SRGBMipMaps test.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If we restore the 'new batch' using 'intel_batchbuffer_reset_to_saved'
function we must restore the default state of the batch using
'brw_new_batch' function because the 'intel_batchbuffer_flush'
function will not do it for the 'new batch' again.
At least the following fields of the batch
'state_base_address_emitted','aperture_space', 'state_used'
should be restored to default values to avoid:
1. the aperture_space overflow
2. the missed STATE_BASE_ADDRESS commad in the batch
3. the memory overconsumption of the 'statebuffer'
due to uncleared 'state_used' field.
etc.
v2: merge with new commits, changes was minimized, added the 'fixes' tag
v3: added in to patch series
Fixes: 3faf56ffbd "intel: Add an interface for saving/restoring
the batchbuffer state."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107626
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was required back when MSVC didn't support C99 and was missing this
header, but since MSVC 2013 (or maybe earlier?) this isn't it does and
this code isn't doing anything anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We were doing this late after nir_lower_io, but we can just reuse the core
code. By doing it at this stage, we won't even set up the VS attributes
as inputs, reducing our VPM size.
We always emit 4 slots per slot because things like color output and
position processing in the epilogue will potentially look up more values
than the variable declaration had. However, when we get a .location_frac
!= 0, we don't want to overwrite components of the following
.driver_location.
For supporting scalar VPM i/o at the NIR level, we need to do a pass over
the vars to figure out how big each attribute is after DCE. Once we've
done that, we can just walk over c->vattr_sizes[] instead of bothering
with vars.
This will be used on V3D to cut down the size of the VS inputs in the VPM
(memory area for sharing data between shader stages).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
On older kernels, the drmGetDevice() call will wake up all the GPUs
on the system, while fetching the PCI revision.
Use the 2 version of the API and pass flags == 0, so we don't fetch the
device PCI revision, since we don't need that information.
Fixes: baa38c144f ("vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, we would create temporary variables and fill them out.
Instead, we create as many function parameters as we need and pass them
through as SSA defs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
x86 32 bit generic target does not enable ARCH_X86_HAVE_SSE4_1
for this reason all Android library modules using SSE4_1 in mesa
are built conditionally to ARCH_X86_HAVE_SSE4_1
The same approach is now applied to libmesa_intel_tiled_memcpy_sse41
in order to avoid the following building errors:
external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:574:15:
error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
__m128i val = _mm_stream_load_si128((__m128i *)src);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:578:15:
error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
__m128i val0 = _mm_stream_load_si128(((__m128i *)src) + 0);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:579:15:
error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
__m128i val1 = _mm_stream_load_si128(((__m128i *)src) + 1);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:580:15:
error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
__m128i val2 = _mm_stream_load_si128(((__m128i *)src) + 2);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:581:15: error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
__m128i val3 = _mm_stream_load_si128(((__m128i *)src) + 3);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 errors generated.
Fixes: 11b1afdc92 ("i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Preliminary work for adding handling of different pipes to gen_decoder. We
need to be able to distinguish between different pipes in order to decode
the packets correctly due to opcode re-use.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Use the 'DWord Length' and 'bias' fields from the instruction definition to
parse the packet length from the command stream when possible. The hardcoded
mechanism is used whenever an instruction doesn't have this field.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This function was there when the file was introduced in commit
38f10d5a03 "intel: tools: add aubinator viewer", but was
never actually used.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Meson already automatically tracks included headers, so there's no need
to add them everywhere; cleans up the code a bit.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This implementation should work and potential bugs can be
fixed during the release candidates window anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For all streams. We basically just need to update the
base address and compute a stride for every stream.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Same as the previous patch, except that is only the number of
components.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
For multiple streams support we have to set the different ring
buffer sizes correctly. This relies on the number of output
components per stream.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This will be used for splitting the GS->VS ring buffer. The
stream ID is always 0 for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is different from the GL_ARB_spirv pass because it generates a much
simpler data structure that isn't tied to OpenGL and mtypes.h.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This doesn't include any new features but it does include an XML and
header typo fix for modifiers.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In the 'inorder' case (ie. FD_MESA_DEBUG=inorder, or old kernel), if the
u_blitter clear path is used (a3xx, a4xx, and some fallback cases on
newer gens), util_blitter_restore_fb_state() will set_framebuffer_state()
to something that is identical to the current fb state, which triggers
an unnecessary flush, and then eventually an assert:
(gdb) bt
#0 0x0000007fbf24a078 in kill () from /lib64/libc.so.6
#1 0x0000007fbe061278 in _debug_assert_fail (expr=0x7fbe93a820 "!batch->flushed", file=0x7fbe93a628 "../src/gallium/drivers/freedreno/freedreno_batch.c", line=491, function=0x7fbe93a990 <__func__.17380> "fd_batch_check_size") at ../src/gallium/auxiliary/util/u_debug.c:322
#2 0x0000007fbe1ccb8c in fd_batch_check_size (batch=0x55556d5a70) at ../src/gallium/drivers/freedreno/freedreno_batch.c:491
#3 0x0000007fbe1d0e08 in fd_clear (pctx=0x55555c61e0, buffers=5, color=0x55556e388c, depth=1, stencil=0) at ../src/gallium/drivers/freedreno/freedreno_draw.c:463
#4 0x0000007fbe57afa4 in st_Clear (ctx=0x55556e17b0, mask=18) at ../src/mesa/state_tracker/st_cb_clear.c:452
The assert was introduced in 4b847b38ae, so from a functionality
standpoint this patch fixes that commit. But it should also avoid an
unnecessary flush in the 'inorder' case, fixing a performance bug.
Fixes: 4b847b38ae freedreno: make fd_batch a one-shot thing
Signed-off-by: Rob Clark <robdclark@gmail.com>
ZSA state can change whether depth or stencil is enabled
This plus previous patch fix stk, and various things w/
FD_MESA_DEBUG=inorder
Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <robdclark@gmail.com>
The problem isn't directly with ec717fc629 but rather that commit
exposes the problem. When we switch batch we cannot assume previous
state is clean so we should mark all state dirty.
Fixes: ec717fc629 freedreno: reduce resource dependency tracking overhead
Signed-off-by: Rob Clark <robdclark@gmail.com>
We were previously using relative timeouts and decrementing the
user-provided timeout as we waited. Instead, this commit refactors
things to use absolute timeouts throughout. This should fix a subtle
bug in the waitAll case where we aren't decrementing the timeout after a
successful GPU wait. Since pthread_cond_timedwait already takes an
absolute timeout, it's also significantly simpler.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It probably doesn't actually break anything but it does cause some
assertions in debug builds.
Fixes: 7a89a0d9ed "anv: Use separate MOCS settings for external BOs"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that it is just called once per draw (instead of once for binning
and once for draw), let's just inline it. If nothing else, it makes
perf-annotate easier to look at.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Historically this wasn't in fdN_emit_state(), because prior to addition
of blitter in a5xx, fdN_emit_state() was also used in the clear path.
These days that is only true for a2xx (a3xx and a4xx use u_blitter). So
the reason for it not to be in fd6_emit_state() no longer exists.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Noticed that with webgl (in chromium, at least) we end up generating a
lot of no-op submits just to get a fence. Tracking the last fence and
returning that if there is no rendering since last flush avoids this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The scissor maxx/maxy are non-inclusive, so don't subtract one from
framebuffer width and height.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
We get a warning here for assigning a const char * pointer to
char *swizzle in struct ir2_src_register. The constructor strdups a 4
byte string here, so just memcpy to that instead.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
In the pursuit of lowering driver overhead, it became clear that some
amount of redesign of how libdrm_freedreno constructs the submit ioctl
would be needed. In particular, as the gallium driver is starting to
make heavier use of CP_SET_DRAW_STATE state groups/objects, the over-
head of tracking cmd buffers and relocs becomes too much. And for
"streaming" state, which isn't ever reused (like uniform uploads) the
overhead of allocating/freeing ringbuffer[1] objects is too high.
This redesign makes two main changes:
1) Introduces a fd_submit object for tracking bos and cmds table
for the submit ioctl, making ringbuffer objects more light-
weight. This was previously done in the ringbuffer. But we
have many ringbuffer instances involved in a submit (gmem +
draw + potentially 1000's of state-group rbs), and only need
a single bos and cmds table. (Reloc table is still per-rb)
The submit is also a convenient place for a slab allocator for
ringbuffer objects. Other options would have required locking
because, while we can guarantee allocations will only happen on
a single thread, free's could happen either on the application
thread or the flush_queue thread. With the slab allocator in
the submit object, any frees that happen on the flush_queue
thread happen after we know that the application thread is done
with the submit.
2) Introduce a new "softpin" msm_ringbuffer_sp implementation that
does not use relocs and only has cmds table entries for IB1 (ie.
the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER
to from the RB). To do this properly will require some updates
on the kernel side, so whether you get the softpin or legacy
submit/ringbuffer implementation at runtime depends on your
kernel version.
To make all these changes in libdrm would basically require adding a
libdrm_freedreno2, so this is a good point to just pull the libdrm code
into mesa. Plus it allows for using mesa's hashtable, slab allocator,
etc. And it lets us have asserts enabled for debug mesa buids but
omitted for release builds. And it makes life easier if further API
changes become necessary.
At this point I haven't tried to pull in the kgsl backend. Although
I left the level of vfunc indirection which would make it possible
to have other backends. (And this was convenient to keep to allow
for the "softpin" ringbuffer to coexist.)
NOTE: if bisecting a build error takes you here, try a clean build.
There are a bunch of ways things can go wrong if you still have
libdrm_freedreno cflags.
[1] "ringbuffer" is probably a bad name, the only level of cmdstream
buffer that is actually a ring is RB managed by kernel. User-
space cmdstream is all IB1/IB2 and state-groups.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reverts commit 0fa9e6d7b3. The real
issue appears to have been that HiZ ops don't like having WM thread
dispatch force-enabled. The previous commit fixes that problem so we
can go back to using the ForceThreadDispatchEnable bit even on SKL+.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Usually when a window is resized, the app calls d3d to resize the back
buffer to the window size. In some cases, it is not done,
and it expects the output resizes to the window size, even if
the back buffer size is unchanged.
This patch introduces the behaviour when a presentation buffer
is used.
ID3DPresent_GetWindowInfo is a function available with
D3DPresent v1.0, and thus we don't need to check if the
function is available.
The function had been introduced to implement this very
feature.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Windows drivers don't set this flag (which affects ff) to more than 8.
Do the same in case some games check for 8.
v2: Remove any dependence on MaxSimultaneousTextures. For non-ff
the number of textures is 16 when the device is able of vs/ps3.
Add this requirement of 16 textures to the driver requirements.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
A lot of these states are used only for the context,
and are unused for stateblocks (which just uses the
changed.* fields instead for a lot of them).
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
If NINE_STATE_FF_MATERIAL is set, the stateblock will upload
its recorded materials matrix.
If NINE_STATE_FF_LIGHTING is set, the lighting set is uploaded.
These flags could be set by a NineDevice9_SetTransform call
or by setting some states related to ff, but that shouldn't trigger
these stateblock behaviours.
We don't need to follow the context states dirtied by render states.
NINE_STATE_FF_VSTRANSF is exactly the state controlling stateblock
updates of transformation matrices, NINE_STATE_FF is too broad.
These two changes avoid setting the two mentionned states when we
shouldn't.
Fixes: https://github.com/iXit/Mesa-3D/issues/320
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
The device state changed.* field are never used.
These fields are used only for stateblocks.
Avoid setting them at all for clarity.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
We avoid allocating space for never unused matrices.
However we must do as if we had captured them.
Thus when a D3DSBT_ALL stateblock apply has fewer matrices
than device state, allocate the default matrices for the stateblock
before applying.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
D3DSBT_ALL stateblocks capture the transform matrices.
Fixes some d3d test programs not displaying properly.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
While to the application we have to track
accurately all 256 world matrices (including
in stateblocks), hw vertex processing enables
to set a limit to the number of world matrices
the hardware can access to in the advertised caps,
which is 8 for nine.
Thus don't bother in the stateblock code to send
the updated values for the unreachable matrices.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
At some point the project was to adapt the
commented version to csmt.
The csmt rework enabled to fix some state aliasing
issues between stateblocks and internal state updates.
The commented version needs a lot of work to work with that.
Just drop it.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
This reverts commit a5fd54f8bf.
The whole point was to add a way to pass -DVMX86_STATS to the build,
but we can do that with a command line argument when we invoke scons.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Follow the restriction of making sure the clear value is between the min
and max values defined in CC_VIEWPORT. Avoids a simulator warning for
some piglit tests, one of them being:
./bin/depthstencil-render-miplevels 146 d=z32f_s8
Jason found this to fix incorrect clearing on SKL.
Fixes: 09948151ab
("intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
MESA_GIT_SHA1 resolves to either an empty "" string if not build from git,
or " (git-DEADBEEF)" if it is. No need to wrap it in additional "()".
Fixes: 9d40ec2cf6 "radv: Add support for VK_KHR_driver_properties."
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use utility function for converting h264 pipe video profile to profile idc,
instead of using array.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig at amd.com>
Use utility function for converting h264 pipe video profile to profile idc,
instead of using array.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig at amd.com>
Adding a function for converting h264 pipe video profile to profile idc
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig at amd.com>
Previously, we would always pull the bit size from the destination which
is wrong for opcodes like nir_ilt where the sources are variable-sized
but the destination is a fixed size. We were getting lucky before
because nir_op_ilt returns a 32-bit value and basically everyone who
uses spec constants uses 32-bit ones.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We have a helper that does exactly what the bany_inequal was doing. It
emits the same code but is a bit higher level and is designed to operate
on a bvec4.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
They do the same thing in the end but i2b is a bit simpler. Also, let's
clean up the mess of code for SSBO handling with one line of builder.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This isn't a great solution for bit-sizes but we don't have a
particularly convenient way to get a bit size from the system value enum
and this keeps the lowering pass from changing it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of doing our own constant folding, we just emit instructions and
let constant folding happen. This is substantially simpler and lets us
use the nir_imm_bool helper instead of dealing with the const_value's
ourselves.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This requires that we rework the interface a bit to use nir_builder but
that's a nice little modernization anyway.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When depth testing is disabled, we shouldn't pay attention to the
specified depthCompareOp, and just treat it as always passing. Before,
if the depth test is disabled, but depthCompareOp is VK_COMPARE_OP_NEVER
(e.g. from the app having zero-initialized the structure), then
sanitize_stencil_face() would have incorrectly changed passOp to
VK_STENCIL_OP_KEEP.
v2: Roll the depthTestEnable check into the ds_aspect check below since
they now both do the same thing.
Fixes: 028e1137e6 "anv/pipeline: Be smarter about depth/stencil state"
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While disassembling send(c) instruction print message descriptor as
immediate source operand along with message descriptor. This allows
assembler to read immediate source operand and set bits accordingly.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
While encoding the immediate floating point values in instruction we use
values upto precision 9, but while disassembling, we print precision to
6 places, which round up the value and gives wrong interpretation for
encoded immediate constant.
To avoid misinterpretation of encoded immediate values in instruction
and disassembled output, print hex representation along with floating
point value which can be used by assembler in future.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
After discussion with Timothy Arceri. disk_cache_get_function_identifier
was using only the first byte of the sha1 build-id. Replace
disk_cache_get_function_identifier with implementation from
radv_get_build_id. Instead of writing a uint32_t it now writes to a
mesa_sha1. All drivers using disk_cache_get_function_identifier are
updated accordingly.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: 83ea8dd99b ("util: add disk_cache_get_function_identifier()")
Following the commit 2385d7b066 and 8e798e28f7, for resource dependancy
tracking.
Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
with FD_MESA_DEBUG=inorder
Signed-off-by: Rob Clark <robdclark@gmail.com>
To avoid wrong result when identifying the type of register.
Ie. If the reg is an array, it might be identified as address or
predicate register.
Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
Signed-off-by: Rob Clark <robdclark@gmail.com>
Don't leave vsconst/fsconst group enabled if we switch to shader with no
uniforms.
Fixes: abcdf5627a freedreno/a6xx: move const emit to state group
Signed-off-by: Rob Clark <robdclark@gmail.com>
Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and
removed createInstructionSimplifierPass, which were both removed in LLVM
7.0.0
These changes combine patches we received from the community and our own
internal patches
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tested-by: Chuck Atkins <chuck.atkins@kitware.com>
Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82%
FPS improvement with Dirt Rally on my GTX 1060.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Which is also required to put it in the tarball, a requirement for
building with meson from the tarball.
CC: Ian Romanick <ian.d.romanick@intel.com>
CC: Marek Olšák <marek.olsak@amd.com>
Fixes: 263c962cfd
("mesa: expose EXT_vertex_attrib_64bit")
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
snprintf() guarantees that it will not write more chars than allowed,
and that the string will be null-terminated, without the need to fill
the whole thing with zeroes to begin with.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There are two problems with the fixed patch. First, it fails to create a
dependency on the sourced .c file, so changes to intel_tiled_memcpy.c
won't trigger a rebuild. It also doesn't get included in the dist
tarball.
Fixes: 11b1afdc92
("i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
extra_files is just a nice way to to tell certain IDEs (and those
reading the file) that this file is also a dependency. Meson will use
the .d file generated by the compiler to figure out what the target
actually depends on.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
GL_EXT_texture_buffer introduced texture buffers, which can be used
in shaders through a new type imageBuffer.
Because how image access is implemented in freedreno, calling
imageSize on an imageBuffer returns the size in bytes instead of texels,
which is incorrect.
This patch adds a division of imageSize result by the bytes-per-pixel
of the image format, when image is buffer-backed.
Fixes all tests under
dEQP-GLES31.functional.image_load_store.buffer.image_size.*
v2: Pre-compute and submit the log2 of the image format's bpp as shader
constant instead of emitting the LOG2 instruction in code. (Rob Clark)
v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin)
Reviewed-by: Rob Clark <robdclark@gmail.com>
anv_GetPhysicalDeviceSurfaceSupportKHR will already return success for
this, but anv_GetPhysicalDevice{Xcb,Xlib}PresentationSupportKHR do not.
Apps which check for presentation support via the latter (all Feral
Vulkan games at least) will therefore fail.
This allows me to render on an Intel GPU and present to a display
connected to an AMD card (tested HD 530 + Vega 64).
v2: Rebase on current master.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
nir_lower_io_to_scalar_early() is really part of the link time
optimisations. Moving it here allows the code to be simplified
and also keeps the code easy to follow in the next patch.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The linking opts shouldn't try removing or compacting XFB varyings
in the consumer. To avoid this we copy the always_active_io flag
from the producer.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
To have uniform behavior while disassembling send(c) instruction use
register type of unsigned doubleword for src1 when message descriptor is
immediate value. Bspec does not specifiy anything for src1 immediate
default type.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
There exists _mesa_components_in_format() which already includes
all cases handled in _mesa_base_format_component_count().
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The conon_bit_class and canon_var_class variables got switched.
Fixes: 932c650e0b "nir/algebraic: Loosen a restriction on variables"
Reported-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Pointer arithmetic...
v2: s/4/sizeof(uint32_t)/ (Eric)
v3: Give bytes to print_batch() in error_decode (Lionel)
Make clear what values we're dealing with in error_decode (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Implement jpeg target buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Implement jpeg bitstream buffer cmd by programming registers directly,
since there is no firmware for VCN Jpeg decode.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Move the previous get_mjpeg_slice_heaeder function and eoi from
"radeon/vcn" to "st/va".
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Move the previous get_mjpeg_slice_heaeder function and eoi from
"radeon/vcn" to "st/va".
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Add a new file to handle VCN Jpeg decode specific functions. Use Jpeg
specific cmd sending function in end_frame call.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Use function pointer for sending cmd in end_frame call. By doing this, we can
assign different cmd sending logics for Jpeg decode later.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Add RING_VCN_JPEG for VCN Jpeg decode, and keep RING_VCN_DEC for other codecs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Move radeon_decoder definition from "radeon_vcn_dec.c" to "radeon_vcn_dec.h",
so that it can be included by other files later.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
Tiling/detiling with the cpu will be the only way to handle Yf/Ys
tiling, when support is added for those formats.
v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson)
v3: Add units to parameter names of tile_extents (Nanley Chery)
Use _mesa_align_malloc for the shadow copy (Nanley)
Continue using gtt maps on gen4 (Nanley)
v4: Use streaming_load_memcpy when detiling
v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it
takes precedence. Add intel_miptree_access_raw, needed after
rebasing on commit b499b85b0f.
v6: refactor to changes done for sse41 separation (Tapani)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The reference for MOVNTDQA says:
For WC memory type, the nontemporal hint may be implemented by
loading a temporary internal buffer with the equivalent of an
aligned cache line without filling this data to the cache.
[...] Subsequent MOVNTDQA reads to unread portions of the WC
cache line will receive data from the temporary internal
buffer if data is available.
This hidden cache line sized temporary buffer can improve the
read performance from wc maps.
v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy'
separate sse41 to own static library (Tapani)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
There is currently no use of returned memcpy functions outside
intel_tiled_memcpy. Patch changes intel_get_memcpy to return memcpy
type instead of actual function. This makes it easier later to separate
streaming load copy in to own static library.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes errors thrown by GCC's Undefined Behaviour sanitizer (ubsan) every
time this macro is used.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
`platforms` is no longer a comma-separated string, and some of our
option descriptions are way too long already. Just drop the incorrect
bit.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
They're not required to be the same as the access flag on the image
unit. For hardware that does shader image lowering based on the
qualifier (Intel), it may be required for state setup.
v2: (by Kenneth Graunke, incorporating feedback from Marek Olšák)
- Reduce both access and shader_access to uint16_t to avoid making
the pipe_image_view structure larger.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, we would fail if a variable had an assigned but unknown bit
size X and we tried to assign it an actual bit size. However, this is
ok because, at the time we do the search, the variable does have an
actual bit size and it will match X because of the NIR rules.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We rename some local variables in validate() to be more readable and
plumb the var through to get/set_var_bit_class instead of the var index.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
STATE_BASE_ADDRESS only modifies various bases if the "modify" bit is
set. Otherwise, we want to keep the existing base address.
Iris uses this for updating Surface State Base Address while leaving the
others as-is.
v2: Also update aubinator_viewer_decoder (caught by Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This not only makes them safe for more bit sizes but it also fixes a bug
in is_zero_to_one where it would return true for constant NaN.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On scalar ISAs, nir_lower_io_to_scalar_early enables significant
optimizations. However, on vector ISAs, it is counterproductive and
impedes optimal codegen. This patch only calls
nir_lower_io_to_scalar_early for scalar ISAs. It appears that at present
there are no upstreamed drivers using Gallium, NIR, and a vector ISA, so
for existing code, this should be a no-op. However, this patch is
necessary for the upcoming Panfrost (Midgard) and Lima (Utgard)
compilers, which are vector.
With this patch, Panfrost is able to consume NIR directly, rather than
TGSI with the TGSI->NIR conversion.
For how this affects Lima, see
https://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg189216.html
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
r600 doesn't have a hard requirement on LLVM, and therefore doesn't have
a hard requirement on libelf. Currently the logic doesn't allow that
however.
Distro-bug: https://bugs.gentoo.org/669058
Fixes: 5060c51b6f
("meson: build r600 driver")
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch exposes support for the following two extensions:
* VK_GOOGLE_decorate_string
* VK_GOOGLE_hlsl_functionality1
There's nothing for the driver to do; it's all handled in spirv_to_nir.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107971
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This extension adds two new decorations which carry meaning only for
HLSL shaders. They are expected to be handled by higher level layers
and can be ignored by implementations. However, it does save the client
a bit of work if the implementation safely ignores them instead of the
client having to strip them out of the SPIR-V in order for it to be
valid.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The comment was wrong, since the loop above casts to a type with the
correct bitsize already.
Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
And implement ac_bulid_expand_to_vec4() on top of it.
Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
`nir_intrinsic_image_deref_size` is not being considered during scan for
driver constants, so image constants are not emitted if a shader
only ever query the size of an image (no load, store, atomic op, etc).
This is unlikely, but possible.
Reviewed-by: Rob Clark <robdclark@gmail.com>
This warning detects non-void functions with a missing return statement,
return statements with a value in void functions, and functions with an
bogus return type that ends up defaulting to int. It's already enabled
by default with -Wall. Generally, these are fairly serious bugs in the
code, which developers would like to notice and fix immediately. This
patch promotes it from a warning to an error, to help developers catch
such mistakes early.
I would not expect this warning to change much based on the compiler
version, so hopefully it won't become a problem for packagers/builders.
See the GCC documentation or 'man gcc' for more details:
https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Warning-Options.html#index-Wreturn-type
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Instead of having weak references to the anv functions and separate
trampoline functions with their own dispatch table, just make the
trampoline functions weak. This gets rid of a dispatch table and
potentially lets the compiler delete the unused weak function. The
end result is a reduction in the .text section of 5.7K and a reduction
in the .data section of 1.4K.
Before:
text data bss dec hex filename
3190329 282232 8960 3481521 351fb1 _install/lib64/libvulkan_intel.so
After:
text data bss dec hex filename
3184548 280792 8960 3474300 35037c _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 86b4bd52dc ("docs: update calendar, add news item and link
release notes for 18.2.3")
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
It's broken, and WGL state tracker is always built with GLES support
noawadays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In that case, we have to wait for the fence to synchronize with the
corresponding drawing we triggered in the X server.
Fixes incorrect display with the i965 driver and some applications, e.g.
solvespace.
Bugzilla: https://bugs.freedesktop.org/108097
Fixes: aefac10fec "loader/dri3: Only wait for back buffer fences in
dri3_get_buffer"
Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back. This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.
The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs. On a
single-GPU setup or one which doesn't support this present mode, we need
to do something. We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries. This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The number of immediate constants was fixed and the size check was
only done by means of an assertion. Given this a shader that emits
more immediate constants would result in a memory corruption when
mesa is build in release mode.
Instead of using this fixed limit allocate the space dynamically, let it
grow as needed, and also remove the unused ImmArray.
Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)
All affected shaders are from "Batman: Arkham City" over DXVK.
The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Offers three clocks, device, clock monotonic and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.
v2:
Ensure deviation is at least as big as the GPU time interval.
v3:
Set device->lost when returning DEVICE_LOST.
Use MAX2 and DIV_ROUND_UP instead of open coding these.
Delete spurious TIMESTAMP in radv version.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v4:
Add anv_gem_reg_read to anv_gem_stubs.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5:
Adjust maxDeviation computation to max(sampled_clock_period) +
sample_interval.
Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Some of the .dir-locals.el had the wrong name for the truthy value so
it wasn’t setting indent-tabs-mode.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Now that a single cmdstream is used for both binning and draw passes, we
can skip allocation of cmdstream buffer for binning.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Now that state which is different for draw vs binning pass is split out
into different state-groups with appropriate enable_mask (so the
appropriate one is chosen for draw vs binning), switch over to using a
single cmdstream for both passes.
This should significantly lower draw overhead for CPU bound benchmarks.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Blob seems to manage to use same input registers for BS (binning pass)
vs VS (draw pass) shaders, so it can use the same VBO state for both.
We can't quite do that yet, so split them.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We don't need to keep this IGNORE_VISIBILITY in binning pass. Prep work
for using single cmdstream for both draw and binning passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders. This way we can emit just a single VS_CONST state-
group when we re-use single cmdstream for both binning and draw passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.
This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler. So we have to use a hash
table to map the collection of texture/samplers to hw state object.
We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables. Prep work for texture state objects.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.
For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw.. fish frequently are visible in
only a single tile, so this skips the uniform uploads for other tiles.
The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be work an additional 10% gain for aquarium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE
packet if we have any state-groups.
Signed-off-by: Rob Clark <robdclark@gmail.com>
These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.
This patch was made with:
sed -ri '/-\*- mode:/,/^$/d' \
$(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
-exec grep -l -- '-\*- mode:' {} \+)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Needs to allocate batches from the cache so that it could
get a valid index and make resource dependancy tracking right.
In addition this fixes assertion on debug build since the commit
1a40faa8 landed.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.
Signed-off-by: Rob Clark <robdclark@gmail.com>
TODO not sure if this is best solution, but current logic is broken for
texcoord inputs. It is definitely the simplest solution.
Fixes: 1a24f51966 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>
We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.
We know the divisors when we upload them, so instead we can precompute
and upload division factors derived from each divisor.
This fast division consists of add, mul_hi, and two shifts,
and we have to load 4 dwords intead of 1.
This probably won't affect any apps.
This will be more useful when we change the quant mode to increase subpixel
precision and decrease the viewport range (which might not be possible
if the viewport is not centered in the viewport range).
Even though the Intel GPU are always at the same PCI location, all the
info we need is already provided by libdrm. Let's be future proof.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.
Fixes frequent crashes seen when doing multithreaded pipeline creation.
Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Adding compile time check for subroutine functions with
the same names. Similar check for intrastage linking was already
landed in commit 5f0567a4f6.
From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification
"A program will fail to compile or link if any shader
or stage contains two or more functions with the same
name if the name is associated with a subroutine type."
Fixes:
* no-overloads.vert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
See commit c0c46ca461f136a0ae1ed69da6c874e850aeeb53 in the Linux kernel,
where José Roberto de Souza added this new PCI ID there.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Extend the pass to propagate the copies information along the control
flow graph. It performs two walks, first it collects the vars
that were written inside each node. Then it walks applying the copy
propagation using a list of copies previously available. At each node
the list is invalidated according to results from the first walk.
This approach is simpler than a full data-flow analysis, but covers
various cases. If derefs are used for operating on more memory
resources (e.g. SSBOs), the difference from a regular pass is expected
to be more visible -- as the SSA copy propagation pass won't apply to
those.
A full data-flow analysis would handle more scenarios: conditional
breaks in the control flow and merge equivalent effects from multiple
branches (e.g. using a phi node to merge the source for writes to the
same deref). However, as previous commentary in the code stated, its
complexity 'rapidly get out of hand'. The current patch is a good
intermediate step towards more complex analysis.
The 'copies' linked list was modified to use util_dynarray to make it
more convenient to clone it (to handle ifs/loops).
Annotated shader-db results for Skylake:
total instructions in shared programs: 15105796 -> 15105451 (<.01%)
instructions in affected programs: 152293 -> 151948 (-0.23%)
helped: 96
HURT: 17
All the HURTs and many HELPs are one instruction. Looking
at pass by pass outputs, the copy prop kicks in removing a
bunch of loads correctly, which ends up altering what other
other optimizations kick. In those cases the copies would be
propagated after lowering to SSA.
In few HELPs we are actually helping doing more than was
possible previously, e.g. consolidating load_uniforms from
different blocks. Most of those are from
shaders/dolphin/ubershaders/.
total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
cycles in affected programs: 151461830 -> 151367845 (-0.06%)
helped: 2933
HURT: 2950
A lot of noise on both sides.
total loops in shared programs: 4603 -> 4603 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 11085 -> 11073 (-0.11%)
spills in affected programs: 23 -> 11 (-52.17%)
helped: 1
HURT: 0
The shaders/dolphin/ubershaders/12.shader_test was able to
pull a couple of loads from inside if statements and reuse
them.
total fills in shared programs: 23143 -> 23089 (-0.23%)
fills in affected programs: 2718 -> 2664 (-1.99%)
helped: 27
HURT: 0
All from shaders/dolphin/ubershaders/.
LOST: 0
GAINED: 0
The other generations follow the same overall shape. The spills and
fills HURTs are all from the same game.
shader-db results for Broadwell.
total instructions in shared programs: 15402037 -> 15401841 (<.01%)
instructions in affected programs: 144386 -> 144190 (-0.14%)
helped: 86
HURT: 9
total cycles in shared programs: 600912755 -> 600902486 (<.01%)
cycles in affected programs: 185662820 -> 185652551 (<.01%)
helped: 2598
HURT: 3053
total loops in shared programs: 4579 -> 4579 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 80929 -> 80924 (<.01%)
spills in affected programs: 720 -> 715 (-0.69%)
helped: 1
HURT: 5
total fills in shared programs: 93057 -> 93013 (-0.05%)
fills in affected programs: 3398 -> 3354 (-1.29%)
helped: 27
HURT: 5
LOST: 0
GAINED: 2
shader-db results for Haswell:
total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
instructions in affected programs: 44992 -> 43374 (-3.60%)
helped: 27
HURT: 69
total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
cycles in affected programs: 7720673 -> 7687588 (-0.43%)
helped: 1609
HURT: 1416
total loops in shared programs: 1830 -> 1830 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 1988 -> 1692 (-14.89%)
spills in affected programs: 296 -> 0
helped: 1
HURT: 0
total fills in shared programs: 2103 -> 1668 (-20.68%)
fills in affected programs: 438 -> 3 (-99.32%)
helped: 4
HURT: 0
LOST: 0
GAINED: 1
v2: Remove the DISABLE prefix from tests we now pass.
v3: Add comments about missing write_mask handling. (Caio)
Add unreachable when switching on cf_node type. (Jason)
Properly merge the component information in written map
instead of replacing. (Jason)
Explain how removal from written arrays works. (Jason)
Use mode directly from deref instead of getting the var. (Jason)
v4: Register the local written mode for calls. (Jason)
Prefer cf_node instead of node. (Jason)
Clarify that remove inside iteration only works in backward
iterations. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Calls are not used yet (functions are inlined), but since new code is
already taking them into account, do it here too. The convention here
and in other places is that no writable memory is assumed to remain
unchanged, as well as global variables.
Also, explicitly state the modes affected (instead of using the
reverse logic) in one of the apply_for_barrier_modes calls.
Suggested by Jason.
v2: Consider local vars used by a call to be conservative, SPIR-V has
such cases. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also tests for removal of redundant loads, that we currently handle as
part of the copy propagation.
Note some tests involve multiple blocks and are currently DISABLED
because they (expectedly) fail.
v2: Add missing DISABLED prefix to "multi block" tests. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of doing this as part of the existing copy_prop_vars pass.
Separation makes easier to expand the scope of both passes to be more
than per-block. For copy propagation, the information about valid
copies comes from previous instructions; while the dead write removal
depends on information from later instructions ("have any instruction
used this deref before overwrite it?").
Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.
v2: Remove entries from dynarray instead of marking items as
deleted. Use foreach_reverse. (Caio)
(all from Jason)
Do not cache nir_deref_path. Not worthy for this patch.
Clear unused writes when hitting a call instruction.
Clean up enumeration of modes for barriers.
Move metadata calls to the inner function.
v3: For copies, use the vector length to calculate the mask.
(all from Jason)
Use nir_component_mask_t when applicable.
Rename functions for clarity.
Consider local vars used by a call to be conservative (SPIR-V has
such cases).
Comment and assert the assumption that stores and copies are
always to a deref that ends with a vector or scalar.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note at the moment the pass called is nir_opt_copy_prop_vars, because
dead write elimination is implemented there.
Also added tests that involve identifying dead writes in multiple
blocks (e.g. the overwrite happens in another block). Those currently
fail as expected, so are marked to be skipped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add basic helpers for doing tests on the vars related optimization
passes. The main goal is to lower the barrier to create tests during
development and debugging of the passes. Full coverage is not a
requirement.
v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason)
Move nir_imm_ivec2() to nir_builder.h. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Useful to walk the array removing elements by swapping them with the
last element.
v2: Change iteration to make sure we never underflow. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For gallium drivers where you want to do some linking at variant compile
time, you don't have the other producer/consumer shader on hand to modify.
By exposing the inner function, the driver can have the used varyings in
the compiled shader cache key and still do linking.
This is also useful for V3D, where the binning shader wants to only output
position and TF varyings. We've been removing those after nir_lower_io,
but this will be less driver-specific code and let more of the shader get
DCEed early in NIR.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This means that TTN shaders more closely resemble GTN shaders: they have
inputs and outputs as variable derefs, with the variables having their
.driver_location already set up for you.
This will be useful for v3d to do input variable DCE in NIR, which we
can't do when the TTN shaders never have a pre-nir_lower_io stage.
Acked-by: Rob Clark <robdclark@gmail.com>
In TGSI we have a vec4 of which only .z is used, but for NIR we should be
using a float the same as other NIR IR. We were already moving TGSI's .z
to the .x channel.
Acked-by: Rob Clark <robdclark@gmail.com>
This fixes some new CTS that reads clip/cull distances
from the fragment shader stage:
dEQP-VK.clipping.user_defined.clip_*
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's the minimum value required by the spec.
This fixes dEQP-VK.api.info.device.properties.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
There's no reason why we need generate trampoline functions for instance
functions or carry N copies of the instance dispatch table around for
every hardware generation. Splitting the tables and being more
conservative shaves about 34K off .text and about 4K off .data when
built with clang.
Before splitting dispatch tables:
text data bss dec hex filename
3224305 286216 8960 3519481 35b3f9 _install/lib64/libvulkan_intel.so
After splitting dispatch tables:
text data bss dec hex filename
3190325 282232 8960 3481517 351fad _install/lib64/libvulkan_intel.so
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
My recent prog_to_nir patch started making new sampler uniforms, which
apparently increased the number of parameters. We used to poke at the
one parameter directly, making it important that there was only one,
but we haven't done that in a while. It should be safe to just delete
the assertion.
Fixes: 1c0f92d8a8 "nir: Create sampler variables in prog_to_nir."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
u_transfer_helper already had code to handle treating packed Z32_S8
as separate Z32_FLOAT and S8_UINT resources, since some drivers can't
handle that interleaved format natively.
Other hardware needs depth and stencil as separate resources for all
formats. For example, V3D3 needs this for 24-bit depth as well.
This patch adds a new flag to lower all depth/stencils formats, and
implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left
as an exercise to the reader, preferably someone who has access to a
machine that uses that format.)
Reviewed-by: Eric Anholt <eric@anholt.net>
This new function takes separate Z24 depth and S8 stencil sources,
and packs them into a single combined Z24S8 buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be used by u_transfer_helper.c shortly, in order to split
packed depth-stencil into separate resources.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is needed for nir_gather_info to actually count the textures,
since it operates solely on variables.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is needed for nir_gather_info to actually count the new textures,
since it operates solely on variables.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
sb/sb_bc_parser.cpp:620:27: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
if (cf->bc.op_ptr->flags && FF_GDS)
^ ~~~~~~
sb/sb_bc_parser.cpp:620:27: note: use '&' for a bitwise operation
if (cf->bc.op_ptr->flags && FF_GDS)
^~
&
sb/sb_bc_parser.cpp:620:27: note: remove constant to silence this warning
if (cf->bc.op_ptr->flags && FF_GDS)
~^~~~~~~~~
Fixes: da977ad907 ("r600/sb: start adding GDS support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
ISL_AUX_USAGE_NONE happens to be the same as "false", but let's do the
right thing and use the enum.
v2: fix intel_miptree_finish_depth too (Caio)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SCons MSVC support relies on vcvarsall.bat to extract the PATH, CPP
includes, library paths, etc.
And SCons also has an build env var named MSVC_USE_SCRIPT which one can
use to point to alternative vcvarsall.bat script.
This change exposes this MSVC_USE_SCRIPT build env variable as a SCons
command line variable. This will enable using MSVC outside Program
Files (e.g, network shares, etc.)
This change also links advapi32 library, necessary for the Windows
Registry API used by WGL state tracker, avoiding missing symbols.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
v2: - change how the access qualifiers are accumulated
v3: - duplicate members in struct_member_decoration_cb()
- handle access qualifiers on variables
- remove access qualifiers handling in _vtn_variable_load_store()
- fix setting access qualifiers on type->array_element
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net
This is basically a port of commit,
3ade766684
("i965: Disable 3DSTATE_WM_HZ_OP fields.")
The BDW+ docs describe how to use the 3DSTATE_WM_HZ_OP instruction in
the section titled, "Optimized Depth Buffer Clear and/or Stencil Buffer
Clear." It mentions that the packet overrides GPU state for the clear
operation and needs to be reset to 0s to clear the overrides. Depending
on the kernel, we may not get a context with the GPU state for this
packet zeroed. Do it ourselves just in case.
Prevents a number of GPU hangs when running crucible on ICL. I tried to
get the exact number of hangs that occurs without this patch, but was
unsuccessful. The test machine became unresponsive before completing the
full run.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The b2f and b2i conversions always produce zero or one which are both
representable in every type and size. Since b2i and b2f support all bit
sizes, we can just get rid of the conversion opcode.
total instructions in shared programs: 15089335 -> 15084368 (-0.03%)
instructions in affected programs: 212564 -> 207597 (-2.34%)
helped: 896
HURT: 0
total cycles in shared programs: 369831123 -> 369826267 (<.01%)
cycles in affected programs: 2008647 -> 2003791 (-0.24%)
helped: 693
HURT: 216
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is valid NIR but you can't actually hit this case today. GLSL IR
doesn't have a bool to double opcode; it does f2d(b2f(x)). In SPIR-V we
don't have any to/from bool conversion opcodes at all. However, the
next commit will make us start generating it so we should be ready.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Several of the Atom GPUs have additional restrictions on alignment when
moving < 64-bit source to a 64-bit destination. All of the nir_op_*2*64
code generation paths respected this, but nir_op_b2[fi] did not.
Previous to commit a68dd47b91 it was not possible to generate such an
instruction from the GLSL path. It may have been possible from SPIR-V,
but it's not clear. The aforementioned patch converts a 64-bit
nir_op_fsign into a sequence of operations including a nir_op_b2f with a
64-bit result. This "just works" everywhere except these Atom parts.
This problem was not detected during normal CI testing because the Atom
parts are not included in developer builds.
v2 (idr): Make the patch compile, and make some cosmetic changes. Add a
commit message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319
Fixes: a68dd47b91 "nir/algebraic: Simplify fsat of fsign"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixed pack_uint_Z_FLOAT32 by casting row data to float instead uint.
Remove code duplicate function pack_uint_Z_FLOAT32_X24S8.
Edited case in "_mesa_get_pack_uint_z_func".
Now it looks like "_mesa_get_pack_float_z_func".
Remove _mesa_problem call, which was added for debuging this issue.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91433
Signed-off-by: Illia Iorin <illia.iorin@globallogic.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Whiskey Lake uses the same gen graphics as Coffe Lake, including some
ids that were previously marked as reserved on Coffe Lake, but that
now are moved to WHL page.
This follows the ids and approach used on kernel's commit
b9be78531d27 ("drm/i915/whl: Introducing Whiskey Lake platform")
and commit c1c8f6fa731b ("drm/i915: Redefine some Whiskey Lake SKUs")
v2: Lionel noticed that GT{1,2,3} on kernel wasn't following
spec when looking to number of EUs, so kernel has been updated.
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
vlVaGetImage should respect the width, height, and coordinates x and y that
passed in. Therefore, pipe_box should be created with the passed in values
instead of surface width/height.
v2: add input size check, return error when size out of bounds
v3: fix the size check for vaimage
v4: add size adjustment for x and y coordinates
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Christian König <christian.koenig@amd.com>
R32G32B32 are weird formats and we are only going to support
some basic operations for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Not going to matter, but be consistent.
Found by coverity
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: caf41c78c (anv/allocator: Support softpin in the BO cache)
This fixes a bug uncovered by my NIR integer division by constant
optimization series.
Fixes: 19f9cb72c8 "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While I generally trust rediculousfish to have done his homework, we've
made some adjustments to suit the needs of mesa and it'd be good to
test those. Also, there's no better place than unit tests to clearly
document the different edge cases of the different methods.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's nothing inherently fixed-width in the code. All that's required
to generalize it is to make everything internally 64-bit and pass
UINT_BITS in as a parameter to util_compute_fast_[us]div_info. With
that, it can now handle 8, 16, 32, and 64-bit integer division by a
constant.
We also add support for division by 1 and by other powers of 2. This is
useful if you want to divide by a uniform value in a shader where you
have the opportunity to adjust the uniform on the CPU before passing it
in.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Compilers can use this to generate optimal code for integer division
by a constant.
Additionally, an unsigned division by a uniform that is constant but not
known at compile time can still be optimized by passing 2-4 division
factors to the shader as uniforms and executing one of the fast_udiv*
variants. The signed division algorithm doesn't have this capability.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently mesa only supports EGL on Unix like systems, cygwin, and
haiku. Meson should actually enforce this. This fixes the default build
on MacOS.
v2: - invert the condition, mark darwin and windows as not supported
instead of trying to mark what is supported.
v3: - add missing )
v3: - Update comment to reflect condition change in v2
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The Nvidia/AMD binary drivers allow this, as does GCC.
This fixes shader compilation issues in the latest update of
No Mans Sky.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These allows us to not support fsign.sat in the Intel compiler backend,
and that will simplify some later changes.
No shader-db changes on any Intel platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
svga_destroy_shader_variant() itself flushes and retries the command
if there's a failure. So no need for the callers to do it. Other
callers of the function were already ignoring the return value.
This also fixes a corner-case double-free reported by Coverity
(and reported by Dave Airlie).
Tested with various OpenGL apps.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Otherwise building just vulkan (among other things) will build these
tests, pull in a bunch of stuff they shouldn't, and potentially fail to
compile.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
For some reason the 2d engine can't handle this. Red formats get special
treatment there, so perhaps related.
Fixes dEQP-GLES3 tests of the form:
dEQP-GLES3.functional.fbo.blit.conversion.r{8,16f,32f}_to_srgb8_alpha8
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
The current state tracker can generate these sometimes. Fixing this is
more involved, and due to some integer math we can generate
divisions-by-zero.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
As with a5xx, hidden behind FD_MESA_DEBUG=lrz due to being paranoid
about z-fighting issues with some games (in particular, this was
observed with 0ad on a5xx.. but I think the proper solution to enable
this by default is to figure out how to do driver specific driconf
options).
Signed-off-by: Rob Clark <robdclark@gmail.com>
My attempt was to set this field instead of duplicating one.
Fixes: 6cfa321c39 ("radv: add potential missing fields for DB_EQAA")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the same fashion as is done for glEGLImageTextureTarget2D.
v2: share the fallback which sets baseformat and internalformat correctly
which makes both of the tests pass (Tapani)
Fixes android.hardware.nativehardware.cts.AHardwareBufferNativeTests:
#SingleLayer_ColorTest_GpuColorOutputCpuRead_R8G8B8X8_UNORM
#SingleLayer_ColorTest_GpuColorOutputIsRenderable_R8G8B8X8_UNORM
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
For testing it is of interest that all tests of dEQP pass, e.g. to test
virglrenderer on a host only providing software rendering like in a CI.
Hence make it possible to disable certain optimizations that make tests fail.
While we are there also add some documentation to the flags to make it clear
that this is opt-out.
Setting the environment variable "GALLIVM_PERF=no_filter_hacks" can be used to make
the following tests pass in release mode:
dEQP-GLES2.functional.texture.mipmap.2d.affine.*_linear_*
dEQP-GLES2.functional.texture.mipmap.cube.generate.*
dEQP-GLES2.functional.texture.vertex.2d.filtering.*_mipmap_linear_*
dEQP-GLES2.functional.texture.vertex.2d.wrap.*
Related:
https://bugs.freedesktop.org/show_bug.cgi?id=94957
v2: rename optimization disabling flag to 'safemath' and also move the
nopt flag to the perf flags.
v3: rename flag "safemath" to "no_filter_hacks" since safemath is usually
associated with floating point operations (Roland)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Pass the size of a resource when creating it so a backing can be kept in
the other side.
Also pass the required offset to transfer commands.
This moves vtest closer to how virtio-gpu works, making it more useful
for testing.
v2: - Use new messages for creation and transfers, as changing the
behavior of the existing messages would be messy given that we don't
want to break compatibility with older servers.
v3: - Use correct strides: The resource corresponding to the output display
might have a differnt line stride then the IOVs, so when reading back
to this resource take the resource stride and the the IOV stride
into account.
v4: Fix transfer size calculation (Andrey Simiklit)
v5: Add comment about transfer size value in the PUT commend (Gurchetan).
Add a comment about the size correction for transfers for reading and
writing the resource. Fixing this by correctly evaluating the size
upfront will need some work also on the virglrenderer side.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (v2)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
The transfer size used in virglrenderer refers to uint32_t, so one
must add 3 and then divide by 4 instead of adding 3/4 which is a no-op
with integers.
Fixes: b3b82fe8ea virgl/vtest: add vtest driver
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
reg_saved will have 64 bits, and (1 << reg) where reg > 31 has undefined
behavior. (1ull << reg) would be correct for 64 bits.
This commit shifts the other way in order to merge the conditions.
Otherwise, they are removed during NIR linking or in some
lowering passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The ssa_for_alu_src helper will correctly handle swizzles and other
source modifiers for you. The expansions for unpack_half_2x16,
pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards
to swizzles. The brokenness of unpack_half_2x16 was causing rendering
errors in Rise of the Tomb Raider on Intel ever since c11833ab24
which added an extra copy propagation to the optimization pipeline and
caused us to start seeing swizzles where we hadn't seen any before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
Fixes: 9ce901058f "nir: Add lowering of nir_op_unpack_half_2x16."
Fixes: 9b8786eba9 "nir: Add lowering support for packing opcodes."
Tested-by: Alex Smith <asmith@feralinteractive.com>
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
>From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification
"A program will fail to compile or link if any shader
or stage contains two or more functions with the same
name if the name is associated with a subroutine type."
v2:
- error out earlier (Tapani)
- style fixes (Iago)
Fixes:
* no-overloads.vert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Check if server supports version negotation by sending a PING_PROTOCOL_VERSION
message right before a dummy RESOURCE_BUSY_WAIT. If we don't get a reply
for the first, we know the server doesn't support it.
If it does support it, we can query the max protocol version supported
by the server and fall back if needed.
v2: - Send a new message to negotiate the protocol version, checking if
the server supports this message by immediately sending a busy wait
message. (Dave Airlie)
v3: - Send a zero-arg command PING_PROTOCOL_VERSION so we actually keep
compatibility with older servers. (Code by Dave Airlie)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
construct correct gen xml filename when we try to load hardware xml
description from a given path
v2: remove temporary variable (Francesco Ansanelli)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as
normal. If we are near enough to the end, this can cause us to start a
new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We
always reserve enough space at the end for an MI_BATCH_BUFFER_START so
we can just increment cmd_buffer->batch.end prior to emitting the
command.
Fixes: a0b133286a "anv/batch_chain: Simplify secondary batch return..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926
Tested-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On Broadwell and above, we have to use different MOCS settings to allow
the kernel to take over and disable caching when needed for external
buffers. On Broadwell, this is especially important because the kernel
can't disable eLLC so we have to do it in userspace. We very badly
don't want to do that on everything so we need separate MOCS for
external and internal BOs.
In order to do this, we add an anv-specific BO flag for "external" and
use that to distinguish between buffers which may be shared with other
processes and/or display and those which are entirely internal. That,
together with an anv_mocs_for_bo helper lets us choose the right MOCS
settings for each BO use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reverts commit 3d81e11b49.
As reported by Federico, some games require the 'sort by year' since
they truncate the extensions which do not fit the fixed size string
array.
Seemingly I did not consider that, as the documentation (both Mesa and
Nvidia) mentions about program crashes ... which are worked around by
setting the env. variable.
This commit reinstates the workaround and enhances the documentation.
Cc: Marek Olšák <maraeo@gmail.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reported-by: Federico Dossena <info@fdossena.com>
Fixes: 3d81e11b49 ("mesa: remove unnecessary 'sort by year' for the GL
extensions")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Federico Dossena <info@fdossena.com>
Split into different sections, document each one as well as strange
cases like GL_ATI_texture_compression_3dc.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Remove all the desktop GL and GLX entries from the list.
Former are pulled by the gl.h and glext.h includes at the top while the
latter are no longer needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Earlier commit updated the code to use the DRI tokens, yet forgot to
update the comment.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Earlier commit updated the code to use the DRI tokens, yet forgot to
update the comment.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Will allow us to remove even bigger hack elsewhere. But more
importantly, we should not be using _any_ GLX tokens in DRI.
Document the gory details about the current side-effects.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Somewhat recently Thomas Hellstrom added the respective DRI tokens
and updated the drivers. Update the documentation to match reality.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The pipe_loader_release API closes the fd given, even if the pipe-loader
should _not_ take ownership of it.
With earlier commit we fixed pipe_loader_drm_probe_fd, and now with
cover the final piece.
Note that unlike the DRM case, here the caller _did_ forget to dup
before using it ... most likely leading to all sorts of fun.
Don't forget the close in the error path. Seems like the things are a
bit leaky/asymmetrical with the semi-recent config work. But we can shave
that yak another day ;-)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently pipe_loader_drm_probe_fd takes ownership of the fd given.
To match that, pipe_loader_release closes it.
Yet we have many instances which do not want the change of ownership,
and thus duplicate the fd before passing it to the pipe-loader.
Move the dup() within pipe-loader, explicitly document that and document
all the cases through the codebase.
A trivial git grep -2 pipe_loader_release makes things as obvious as it
gets ;-)
Cc: Leo Liu <leo.liu@amd.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Axel Davy <davyaxel0@gmail.com>
Cc: Patrick Rudolph <siro@das-labor.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com> (for nine)
With commit c6c0f94714, back in 2006 Brian removed the
_glapi_check_multithread() call from core mesa - _mesa_make_current.
It was done to remove fairly awkward #ifdef guard which caused subtle
differences in core mesa.
Since that guard is long gone, we can drop the duplication and
reintroduce the call in core.
Note that the function is was missing when using EGL + classic dri HW
drivers. Yet on TLS builds it's a no-op, so we're safe.
Any non TLS users - more or less anything !Linux (or even musl on Linux
up-to semi-recently) may have experienced problems.
v2: don't remove the call from swrast - move it to core (Eric)
Cc: Eric Anholt <eric@anholt.net>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Earlier commit added support for 'front_buffers', erroneously adding a
return in vl_dri3_screen_destroy. Effectively leaking a lot of state.
Fixes: 8d7ac0a4e4 ("vl/dri3: implement DRI3 BufferFromPixmap")
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
The header was used only by dri2.c, containing a two-member struct and cast wrapper.
Just inline it where it's used/needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This is the minimum value according to the spec.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This can be used as a drop in replacement for
disk_cache_get_function_timestamp().
Here we use build-id to generate a driver-id rather than build
timestamp if available. This should resolve issues such as
distros using reproducable builds and flatpak not having
real build timestamps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Only some drivers use a timestamp here. Others use things such
as build-id, or even a combination of build-ids from Mesa and
LLVM.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the GL_MESA_framebuffer_flip_y implementation
_mesa_is_winsys_fbo checks were replaced with
FlipY checks. rb->Name is also used to determine
if a buffer is winsys.
v2: Fixes annotation [for emil]
Fixes: ab05dd183c ("i965: implement GL_MESA_framebuffer_flip_y [v3]")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We already call nir_rematerialize_derefs_in_use_blocks_impl prior to
calling nir_lower_ssa_defs_to_regs_block so the assertion that all deref
uses in the block should hold. This fixes the following CTS test when
SPIR-V optimization recipe 1:
dEQP-VK.glsl.struct.local.loop_nested_struct_array_vertex
Fixes: 606eb56ab9 "intel/nir: Only lower load/store derefs"
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If the block in which the jump is inserted is the predecessor of a phi
then we need to remove phi sources otherwise the phi may end up with
things improperly connected. This fixes the following CTS test when
dEQP is run with SPIR-V optimization recipe 1:
dEQP-VK.glsl.functions.control_flow.return_in_nested_loop_vertex
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Timestamp can be zero for example when Flatpak is used. In this
case just disable the cache rather then segfaulting when
incompatible cache items are loaded.
V2: actually return false when mtime is 0.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Unnecessary. While we are at it, remove the check for pre-VI
because it's already checked earlier.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If apps use the MUTABLE bit and the same formats as the image one
in the list, we can still enable TC-compat HTILE. I don't think
this happens often but given the fact that TC-compat HTILE allows
a nice boost in some situations, it's worth checking.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Like we disable DCC/CMASK for small color surfaces as well.
Serious Sam 2017 creates a 1x1 depth surface and I think
it should be faster to do slow clears on the graphics queue
instead of fast clears on compute, and eventually a depth
expand if the surface isn't TC-compatible HTILE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't think this is required by Vulkan too.
Ported from RadeonSI (AMDVLK doesn't set it either).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
As discussed in the review of the patch which added the comment:
Nothing happens when a thread is created, because pthread_atfork doesn't
affect creating threads. However, spawning a child process will likely
crash.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We already track if the DMA engine is busy/idle with a flag,
and we emit a packet that waits for all CP DMA operations
to be complete. This is done at end of command buffer because
the kernel doesn't wait for them, and also when emitting
barriers, so it should be safe.
This improves small copies for both aligned and unaligned sizes.
Aligned sizes:
BEFORE:
1 KB: 59.840000 ms
2 KB: 71.200000 ms
AFTER:
1 KB: 31.200000 ms
2 KB: 31.040000 ms
Unaligned sizes:
BEFORE:
2 KB: 68.3200 ms
3 KB: 79.3600 ms
5 KB: 76.6400 ms
9 KB: 90.8800 ms
17 KB: 116.0000 ms
AFTER:
2 KB: 31.0400 ms
3 KB: 32.0000 ms
5 KB: 30.8800 ms
9 KB: 30.5600 ms
17 KB: 29.6000 ms
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According to my benchmark results, it appears that we should
reduce the threshold to 1024.
BEFORE:
1 KB: 68.656000 ms
2 KB: 118.368000 ms
AFTER:
1 KB: 31.760000 ms
2 KB: 29.840000 ms
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's unnecessary because we can just check if the timestamp
is to different to the default value when a pool is created
or resetted. Instead of waiting for the availability bit to
be 1, we have to emit a not equal WAIT_REG_MEM for checking
if the timestamp is ready.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Pulling this logic out means we can share the logic and avoid a couple
of temporary variables that helped make things clearer before. Note
that in either vismode case, we always program vismode 0.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Now that we've copied the emit logic into each branch of the
if (info->index_size) statement, we can simplify the logic a bit
according to which case we're in.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This splits the two code paths into separate functions and moves the
"if (info->indirect)" test into draw_impl().
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This way the markers clearly bracket the draw call and isn't
duplicated for both direct and indirect draw code.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
According to the following definition,
int AtomicCompSwap(inout int mem, uint compare, uint data);
the preceding one in atomic_comp_swap of NIR is compare and data is
followed, while src0 for cmpxchg needs vec2(data, compare)
So for ssbo/image deref comp_swap, that should be reversed.
Fixes: dEQP-GLES31.functional.image_load_store.*.atomic.comp_swap*
Possibly these bits mean something else now. Blob always seems to use
FOUR_QUADS, and changing to TWO_QUADS seems to cause different threads
to overlap registers.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fix a few bits of confusion, as with previous gen's constlen is aligned
to 4, and value in bitfield is left-shifted by 2 (ie. divided by 4).
But this is done by the CONSTLEN() accessor/builder fxn, so don't do it
twice. Also HLSQ_FS_CNTL.CONSTLEN is not special.
Signed-off-by: Rob Clark <robdclark@gmail.com>
batch_flush_reset_dependencies() expects to be called unlocked, and can
call fd_batch_reference() which can try to aquire the screen lock again.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In c3d9f29b we allowed ctx->batch to be null, and started tracking the
current framebuffer state in fd_context. But the existing logic in
fd_blitter_pipe_begin() would, if !ctx->batch, set null fb state to be
restored after blit. Which broke the world of deqp (and probably other
things)
Fixes: c3d9f29b78 freedreno: allocate ctx's batch on demand
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is defined to always clear the entire surface(s) specified,
regardless of scissor state.. mesa/st will turn scissored clears
into a draw. So rip about a bunch of unnecessary machinery.
Also remove a comment that was obsolete since using u_blitter to
turn clear into draw (for the cases where there isn't a hw blitter
fast-path).
Signed-off-by: Rob Clark <robdclark@gmail.com>
The logic to force a flush every draw was short-circuited with newer
kernels. Also it should apply to clears as well.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The effective scissor changes based on rasterizer->scissor flag, so we
need to re-emit scissor state when rasterizer state changes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In st_renderbuffer_alloc_storage, we avoid allocating storage for
zero-sized buffers, leading to this pointer being NULL. We already
take care to avoid dereferencing these pointers for color-buffers,
but not for depth/stencil-buffers.
So let's thread a bit more carefully here.
This avoids a crash while running Piglit's glx/glx-visuals-stencil
test, both on virgl and r600g.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use sample rate shading instead, should give better locality.
Makes Nier with 8x msaa on a Raven go 5 fps -> 7 fps in the menu.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
I was about to make the claim to someone that every field in isl_surf
is either an enum or has explicit units. Then I looked at isl_surf and
discovered this claim was wrong. We should fix that. This commit does
a few refactors:
* Add _B suffixes to some struct fields
* Add _B to some variables and parameters
* Rename row_pitch_tiles -> row_pitch_tl
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Previously if only ff vs or only ff ps was used,
the constants for both were marked as updated,
while only the constants of the used ff shader
were updated.
Now that NINE_STATE_FF_VS and
NINE_STATE_FF_PS do not intersect anymore,
we can correctly mark the correct set of constant
as updated.
Fixes: https://github.com/iXit/Mesa-3D/issues/319
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
NINE_STATE_FF_OTHER was mostly ff vs states.
Rename it to NINE_STATE_FF_VS_OTHER and
move common states with ps to
NINE_STATE_FF_PS_CONSTS (renamed from
NINE_STATE_FF_PSSTAGES).
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Some states only affect the ff shader,
not its constants.
Currently we don't check anything and
always recompute the ff shader key.
However we do check for NINE_STATE_FF_OTHER
and if set we reupload some constants.
Thus for those states which had NINE_STATE_FF_OTHER
set but didn't need it,
replace by a dummy ff shader state (which is
easier to understand for an external reader than
just setting 0 and more future proof).
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
The pointsize states were missing the ff
NINE_STATE_FF_OTHER flag, and thus might
miss state updates when using ff.
Fixes some wine tests.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Rename NINE_STATE_FOG_SHADER,
NINE_STATE_POINTSIZE_SHADER and NINE_STATE_PS1X_SHADER
into
NINE_STATE_VS_PARAMS_MISC and NINE_STATE_PS_PARAMS_MISC.
The behaviour is unchanged, except one minor change:
D3DRS_FOGTABLEMODE doesn't need to affect VS.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Previously we had already found that for
MANAGED buffers the buffer started dirty
(which meant all writes out of bound
before the first draw call using the
buffer have to be taken into account).
Possibly it is the same for the other types of buffers.
For now always lock the entire buffer (starting from the offset)
for these (except for DYNAMIC buffers, which might hurt
performance too much).
Fixes: https://github.com/iXit/Mesa-3D/issues/301
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
The previous code was ignoring the input
until a cursor is set inside d3d
(with SetCursorProperties), as expected
by wine tests.
However it did still make a call to ID3DPresent_SetCursor,
which would result into a SetCursor(NULL) call, thus
hidding any cursor set outside d3d, which we shouldn't do.
Add comment about not avoiding redundant ID3DPresent_SetCursor
calls once a cursor has been set in d3d, as it has been tested to
cause regressions.
Fixes: https://github.com/iXit/Mesa-3D/issues/197
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
For some applications SetCursorPosition
is called when a cursor event is received.
Our SetCursorPosition was always calling
wine SetCursorPos which would trigger
a cursor event.
The infinite loop is avoided by not calling
SetCursorPos when the position hasn't changed.
Found thanks to wine tests.
Fixes irresponsive GUI for some applications.
Fixes: https://github.com/iXit/Mesa-3D/issues/173
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
d3d9_get_pipe_depth_format_bindings assumes the input format
is a depth stencil format.
Previously the user could hit this function with an invalid format.
Protect the last non protected call with a depth_stencil_format check.
Another solution is to have d3d9_get_pipe_depth_format_bindings
support non depth stencil format, but we don't want the user
to create depth buffers with d3d formats that can't be one,
it's better to check if the format can be depth buffer with d3d.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
colorarg0, etc are 3 bits wide.
Make the code more readable by adding an & 0x7
to further indicate we only remember the first 3 bits only.
The 4th bit is always 0,
and colorarg_b4, colorarg_b5, etc are used to store
the 5th and 6th bits.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
When using csmt, ff shader creation happens on the csmt
thread. Creating the shaders, then calling RefToBind causes
the device ref to be increased then decreased.
However the device dtor assumes than no work pending on the
csmt thread could increase the device ref, leading to hang.
The issue is avoided by creating the shaders with a bind
count directly.
Fixes: https://github.com/iXit/Mesa-3D/issues/295
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Emulate perspective interpolation of depth for programmable ps fog
ff ps fog uses position z, or 1/w depending on the ff
projection matrix set. This is according to public documents
found describing the algorithm and tests we made.
In the case of programmable ps, we used position's z,
which was sufficient to pass wine tests (which test shaders
don't set w).
Issue https://github.com/iXit/Mesa-3D/issues/315 showed
that this calculation was wrong.
Using perspective interpolation on z, that is using z * 1/w
seems to satisfy both this application and wine tests.
Fixes: https://github.com/iXit/Mesa-3D/issues/315
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tests done on several devices of all 3 vendors and
of different generations showed that there are several
ways of handling infs and NaN for d3d9.
Tests showed Intel on windows does always clamp
RCP, RSQ and LOG (thus preventing inf/nan generation),
for all shader versions (some vendor behaviours vary
with shader versions).
Doing this in nine avoids 0*inf issues for drivers
that can't generate 0*inf=0 (which is controled by
TGSI's MUL_ZERO_WINS).
For now clamp for all drivers. An ulterior optimization
would be to avoid clamping for drivers with MUL_ZERO_WINS
for the specific shader versions where NV or AMD don't
clamp.
LOG and RSQ being already clamped, this patch only
clamps RCP.
Fixes: https://github.com/iXit/Mesa-3D/issues/316
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Since blending will be disabled later for integer formats we have to
consider that in the case of a mixed set of integer/non-integer format
buffers blending must be handled on a per buffer basis.
Fixes on r600:
dEQP-GLES31.functional.draw_buffers_indexed.random.
max_required_draw_buffers.13
Fixes: 8fb966688b
st/mesa: Disable blending for integer formats.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
That warning fires every time a string function takes an argument that
could possibly be longer than its max output, which triggers all over
the place, especially when working with file paths ("what if every file
path is MAX_PATH long?" is what GCC is saying, which is really annoying
when we *know* that "/dev/dri/cardN" is not gonna be 4096 char long and
it's safe to store it in a 32-char array).
Anyway, we either add a ton of dead code all over the place to make GCC
happy, or we get rid of its spam. I chose the latter.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
A ZPASS_DONE packet doesn't make sense for the compute queue. It will
result in a gpu hang.
This change resolves a gpu hang for SteamVR+Vega.
Cc: mesa-stable@lists.freedesktop.org
Fixes: 1f616a840e "radv: emit a dummy ..."
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example in radeonsi more loops are unrolled in
Civilization Beyond Earth.
The actual pipeline-db stats are not overwhelming but even in the
negatively affected shaders the NIR is clearly better. It just
happens that the code shuffling and in some cases calls to max
rather than a flt result in the final output from LLVM not
giving as good numbers.
However this is an incremental opt that further passes build off
so the change should be made IMO.
Totals from affected shaders:
SGPRS: 20192 -> 20184 (-0.04 %)
VGPRS: 19516 -> 19524 (0.04 %)
Spilled SGPRs: 437 -> 444 (1.60 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1527444 -> 1522276 (-0.34 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 1018 -> 1016 (-0.20 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
By adding `_llvm == 'true'` to the required argument we can check the
'auto' and 'true' case in one path.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
NVC0_CB_AUX_BINDLESS_INFO isn't written to on Maxwell+ and it's too small
anyway.
With these changes, TXQ is used to determine the number of samples and
the coordinate adjustment information looked up in a small array in the
driver constant buffer.
v2: rework to use TXQ and a small array instead of a larger array with an
entry for each texture
v3: get rid of the small array and calculate the adjustments in the shader
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: c2ae9b4052 ('nvc0: implement multisampled images on Maxwell+')
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I also updated the developer instructions; presumably someone who's been
given commit rights already knows how to clone a repository :)
A more useful thing is to show how to update the pushurl, and how to use
access tokens to push over HTTPS (especially for us at Intel, where
non-http traffic is a pain).
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
It's just over 10 months since 17.3.0 was released with s3tc support enabled.
Probably a good idea to update the FAQ page.
v2: Incorporate feedback from Adam Jackson <ajax@redhat.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 04396a134f ("mesa: Import libtxc_dxtn sources")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
SDL has some shaders that compute sin(angle) and cos(angle) for a rotation
matrix in the VS, and angle is usually 0.0. Our previous implementation
had quite a bit of error around 0.0, causing single-pixel rotations at
typical window sizes. SDL2 has changed as of August 28th (commit
12156:e5a666405750) to not need sin/cos in the VS, but we should still fix
this for existing implementations or similar patterns that other programs
may have.
glsl-cos goes from 32 instructions to 36, but 9 uniforms to 7.
glsl-sin goes from 32 instructions to 34, but 8 uniforms to 7.
This seems like a fine impact to have for the bugfix.
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Fixes: https://github.com/anholt/mesa/issues/110
v2: - fix typo
These are how FreeBSD and Debian handle multiple versions of LLVM
installed at the same time, respectively.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We already correctly handle va being auto, but we force it to being
true, which is bad.
Fixes 94cf397092
("meson: Fix auto option for va")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Corrects building glx as gallium-xlib without any dri targets.
v2: - fix ugly formatting
Fixes: 66c94b9313
("meson: build gallium winsys for dri, null, and wrapper")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This was added as part of 1.1 but it's very hard to track exactly what
extension added it. In any case, we should implement it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <Airlied@redhat.com>
The throughput is similar to 32-bit integers on GFX8 and
AMDVLK does not expose 16-bit integers on pre Vega as well.
On GFX9+, only LLVM 7+ has support.
This fixes a bunch of CTS crashes on GFX9/LLVM 6.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This patch fixes uninitialized fields in DefineDepthStencilView and
DefineStreamOutput commands that are not relevant in SM4 device.
Reviewed-by: Brian Paul <brianp@vmware.com>
The function pointer declarations in dd.h for the BufferData() and
BufferSubData() use the ARB-suffixed datatypes. This patch changes
the buffer_data_fallback() and buffer_sub_data_fallback() functions
to use those datatypes too.
This fixes a build warning when building 32-bit libraries. Evidently,
GLsizeiptrARB and GLsizeiptr are defined differently in that situation.
All all implementations of these driver hooks use the ARB-suffixed
types.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
With this patch, svga driver will start advertising OpenGL 3.3
compatibility profile.
Tested with some mesa demos, piglit and glretrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We need to convert unnormalized texcoords to normalized texcoords
when we are sampling from texture. We don't need this conversion
if there is no sampler view.
Tested with piglit, glretrace
Fixes vmware bug 2101970
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
In gallium, the layer index of a texture array to be mapped
is specified in the z component, whereas in svga device, the
index is specified in a separate argument.
Currently in svga_texture_transfer_map(), we explicitly modify
the z value in the base transfer map to 0 so the layer offset will not be
applied twice, but this causes problem when state tracker later
refers to the base transfer map and expects the slice index to be
specified in z (commit 463b0ea1f6).
To fix the problem, this patch makes a local copy of the box in
svga_transfer and modifies the z value in this copy instead.
Fixes spec@khr_texture_compression-astc piglit test crashes.
Fixes regression in the dma path with commit 1fdd3dd94a.
Tested with mtt glretrace, piglit on Windows VM and Linux VM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since commit 75881bed9e "i965: Rework the TCS passthrough shader to
use NIR." the created nir_shader is not dummy, and it is compiled by
the backend like the others.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently u_math needs gallium utils for cpu detection. Most of what
u_math uses out of u_cpu_detection is duplicated in src/mesa/x86
(surprise!), so I've just reworked it as much as possible to use the
x86/common_x86_features.h macros instead of the gallium ones. The mesa
implementation is a header only approach, with no external dependencies.
There is one small function that was copied over, as promoting
u_cpu_detection is itself a fairly hefty undertaking, as it depends on
u_debug, and this fixes the bug for now.
bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107870
Tested-by: Vinson Lee <vlee@freedesktop.org>
Unlike the other platforms, here we aim do guess if the device that we
somewhat arbitrarily picked, is supported or not.
In particular: when a vendor is _not_ requested we loop through all
devices, picking the first one which can create a DRI screen.
When a vendor is requested - we use that and do _not_ fall-back to any
other device.
The former seems a bit fiddly, but considering EGL_EXT_explicit_device and
EGL_MESA_query_renderer are MIA, this is the best we can do for the
moment.
With those (proposed) extensions userspace will be able to create a
separate EGL display for each device, query device details and make the
conscious decision which one to use.
v2:
- update droid_open_device_drm_gralloc()
- set the dri2_dpy->fd before using it
- return a EGLBoolean for droid_{probe,open}_device*
- do not warn on droid_load_driver failure (Tomasz)
- plug mem leak on dri2_create_screen failure (Tomasz)
- fixup function name typo (Tomasz, Rob)
v3:
- add forward declaration for droid_load_driver()
Fixes the HAVE_DRM_GRALLOC build (Mauro)
- split dup() assignment and check in separate lines (Tomasz, Eric)
- make droid_load_driver() static (Tomasz)
- drop unused prop_set variable (Tomasz)
v4:
- rebase
- fwd declarationi should be for droid_probe_device()
Cc: Robert Foss <robert.foss@collabora.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
do_assignment validated assigment but when rhs type was not compatible
it proceeded without issues and returned error_emitted = false.
On the other hand process_initializer expected do_assignment to always
return compatible type and never fail.
As a result when variable was initialized with incompatible type
the type of variable changed to the incompatible one.
This manifested in unnecessary error messages and in one case in crash.
Example GLSL:
vec4 tmp = vec2(0.0);
tmp.z -= 1.0;
Past error messages:
initializer of type vec2 cannot be assigned to variable of type vec4
invalid swizzle / mask `z'
type mismatch
operands to arithmetic operators must be numeric
After this patch:
initializer of type vec2 cannot be assigned to variable of type vec4
In the other case when we initialize variable with incompatible struct,
accessing variable's field leaded to a crash. Example:
uniform struct {float field;} data;
...
vec4 tmp = data;
tmp.x -= 1.0;
After the patch there is only error line without a crash:
initializer of type #anon_struct cannot be assigned to variable of
type vec4
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107547
The only thing that matters is the size since we never specify any
offsets in terms of blocks.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All CTS pass on Polaris/Vega with LLVM 6, 7 and master, so
I think it's safe to enable the feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.optimal_optimal_nearest
And all friends that try to blit a surface with different
depth and stencil formats.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If we update the program-state etc, we risk compiling needless shaders,
which can cost quite a bit of performance.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
CC: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 82799a5d1b8 ("nir: Add a small pass to rematerialize derefs
per-block")
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The lcssa and phis_to_regs passes are used by various NIR optimizations
that modify the CFG. Putting a couple of asserts will help ensure that
we don't accidentally put derefs in phis as part of an optimization
pass.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
When we're about to re-arrange a bunch of blocks, it's a good idea to
make sure that we don't have deref uses crossing block boundaries.
Otherwise we may end up with a deref going through a phi and that would
be bad.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
This pass re-materializes deref instructions on a per-block basis to
ensure that every use of a deref occurs in the same block as the
instruction which uses it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
This reverts commit 90819abb56.
This logic was wrong, the original code is correct. The direct
impact is that we allocate up to approximately a squared amount
of memory compared to what we should allocate.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Only supported by GFX9+.
The conservativeraster Sascha demo seems to work as expected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Since user defined names are not allowed in core profile
we remove the allow_user_names bool and just check if
we have a core profile like all other buffer/texture
object handling code does.
This extension is required by "Wolfenstein: The Old Blood"
and is exposed in core in the Nvidia binary driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This game is looking for some odd extension after creating a core
context such as ARB_vertex_program and EXT_framebuffer_object.
Rather then enabling these in core this forces the game to use
compat. This allows the game to run and seems to work without
issues. All other id tech games/engines use a compat profile.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The HW for FLUSH_ALL_STATE isn't validated, since the closed driver only
uses FLUSH. Now that we don't have any new state at the end of our bin
CLs, follow their lead.
Ever since we added OQ support, we've been clearing OQ state at the start
of the job anyway. We're intentionally breaking old-and-new-driver-mix
systems, because we need to stop using the unvalidated FLUSH_ALL_STATE.
The HW's FLUSH_ALL_STATE is not validated, so we probably shouldn't use
it, meaning that we need to reset state at the start. By doing this, we
also make ourselves more resilient to another client leaving the TF state
enabled at the end of their batch (as we now do, ourselves).
However, we still need to emit a single TF disable at the end of the
frame, for SWVC5-718.
Currently gallium's xlib target will fail to link due to multiple
definitions of all the symbols in libmesautil, this only shows up in
autotools, and not in meson due to differences in the way that meson and
autotools handle linking static archives into static archives. Autotools
uses -Wl,--whole-archive implicitly, meson requires this behavior to be
opted-into. The solution is just to remove libmesautils from the
libgl-xlib target, since it will get all of those symbols form
libmesagallium.
I've dropped the link from meson as well, it doesn't seem to hurt
anything and should make linking just a little faster.
Fixes: 8396043f30
("Replace uses of _mesa_bitcount with util_bitcount")
bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107923
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Cc: Sergii Romantsov<sergii.romantsov@globallogic.com>
Rather than trying to encode all of the rules in a header, lets just put
them in the build system where they belong. This fixes the build on
FreeBSD, which does have pthraed_setaffinity_np, but it's in a
pthread_np.h, not behind _GNU_SOURCE. FreeBSD also implements cpu_set
slightly differently, so additional changes would be required to get it
working right there anyway.
v2: - fix #define in autotools
Fixes: 9f1bbbdbbd
("util: try to fix the Android and MacOS build")
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Missing break; causes parameter checking to
never pass GL_FRAMEBUFFER_FLIP_Y_MESA parameters.
Fixes: 318c265160 ("mesa: GL_MESA_framebuffer_flip_y extension [v4]")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instances where direction was determined based on
winsys or user fbo and should be determined based on
FlipY.
Key STATE_FB_WPOS_Y_TRANSFORM for of FlipY instead of
_mesa_is_user_fbo. This corrects gl_FragCoord usage
when applying GL_MESA_framebuffer_flip_y.
Fixes: ab05dd183c ("i965: implement GL_MESA_framebuffer_flip_y [v3]")
Reviewed-by: Brian Paul <brianp@vmware.com>
To get an useful UUID for systems that have a non-useful mtime
for the binaries.
I started using SHA1 to ensure we get reasonable mixing in the
various possibilities and the various build id lengths.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Not sure if this is all wired up. CTS does pass and the Tangrams
demo works fine on Vega. There are corruption issues on Polaris
but not sure if that related to 16-bit support.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of passing around BOs and offsets, use addresses which are anv's
GPU equivalent of pointers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Each query slot is a uint64_t and we were only zeroing half of it.
Fixes: 7ec6e4e689 "anv/query: implement multiview interactions"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of computing an index at the end which we hope maps to the
number of things written, just count the number of things as we go.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
No shader-db changes on any Intel platform... which probably explains
why no bugs have been bisected to this problem since it landed in Mesa
18.1. :( The commit mentioned below is in 18.2, so 18.1 would need a
slightly different fix (due to code refactoring).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 77f269bb56 "i965/fs: Refactor propagation of conditional modifiers from compares to adds"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (reviewed the original patch)
Cc: Matt Turner <mattst88@gmail.com> (reviewed the original patch)
Apparently for compute there are only 16 instead of the 32 for the
graphics path.
Fixes dEQP-VK.binding_model.descriptorset_random.sets16.noarray.ubolimitlow.sbolimitlow.imglimitlow.noiub.comp.0
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes the following building error in vc4 build:
In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_render_cl.c:34:
In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_drv.h:27:
In file included from external/mesa/src/gallium/drivers/vc4/vc4_simulator_validate.h:34:
In file included from external/mesa/src/gallium/drivers/vc4/vc4_context.h:39:
In file included from external/mesa/src/gallium/drivers/vc4/vc4_cl.h:56:
gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10:
fatal error: 'cle/v3d_packet_helpers.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes the following building error:
In file included from external/mesa/src/broadcom/cle/v3d_decoder.c:38:
In file included from external/mesa/src/broadcom/cle/v3d_packet_helpers.h:29:
external/mesa/src/gallium/auxiliary/util/u_math.h:42:10:
fatal error: 'pipe/p_compiler.h' file not found
^~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 5b102160ae ("broadcom/genxml: Introduce a V3D packet/struct decoder.")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Fixes the following building error, happening when building both intel and broadcom:
Gen Header: libmesa_broadcom_genxml_32 <= v3d_packet_v21_pack.h
FAILED: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h
/bin/bash -c "python external/mesa/src/broadcom/cle/gen_pack_header.py \
external/mesa/src/broadcom/cle/v3d_packet_v21.xml \
> gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h"
Traceback (most recent call last):
File "external/mesa/src/broadcom/cle/gen_pack_header.py", line 626, in <module>
p = Parser(sys.argv[2])
IndexError: list index out of range
header-gen macro is already defined by Intel genxml building rules
and the existing header-gen does not have the $(PRIVATE_VER) argument,
infact the bash command line logged in the building error is missing
exactly $(PRIVATE_VER) argument
Renaming the macro as pack-header-gen in src/broadcom/Android.genxml.mk
solves the building error, another possible way is to keep the gen rules
commands expanded and not use the macros.
Fixes: 7f80a9ff13 ("vc4: Introduce XML-based packet header generation like Intel's.")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
The offsets now come from the anv_address, these references were not
updated and using the old variable.
Fixes: e1ab834557 "anv/memcpy: Use addresses instead of bo+offset"
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
This shouldn't matter as we'll never write OOB anyway but we may as well
get it right. It's supposed to be in dwords - 1.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Because this was setting image to true we would end up calling
si_load_image_desc() when we sould be calling
si_load_sampler_desc().
This fixes an assert() in Deus Ex: MD
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When using Freecad, I was getting intermittent segfaults inside of
mesa. I traced it down to this path in st_cb_drawpixels.c where the
result of pipe_transfer_map wasn't being checked. In my case, it was
returning NULL because nouveau_bo_new returned ENOENT. I'm by no
means a mesa developer, but this patch solves the problem for me and
seems reasonable enough.
v2: Marek - also unmap the PBO and release the texture, and call
the make_texture function sooner for less cleanup
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
It shouldn't be needed to emit the initial graphics or compute
state when beginning a new command buffer. Emitting them in
the preamble should be enough and this will reduce IB sizes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Indirect descriptors only need one entry, we don't have to
emit a location for every descriptors.
Fixes GPU hangs with new CTS:
dEQP-VK.binding_model.descriptorset_random.*
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Let say, we first bind a graphics pipeline that needs indirect
descriptors sets. The userdata pointers will be emitted at draw
time. Then if we bind a compute pipeline that doesn't need any
indirect descriptors, the driver will re-emit them for all
grpahics stages.
To avoid this to happen, just check the bind point type.
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was wrong for descriptor #0 when all of them are indirect.
This is because indirect_offset was 0 and we emitted a
"normal" descriptor pointer for nothing.
While we are at it remove
radv_userdata_info::indirect_offset which is useless.
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According to RadeonSI, it's unnecessary to multiply by
the stride. That field seems to always be 64.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of having holes. The other ring parameters like
offset and stride can be updated later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is just for consistency because LLVM can detect and
remove unused loads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Same code is generated because LLVM ends up by using bfe, but
that seems cleaner to me.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In GLSL IR we cheat with switch statements and simply convert them
into loops with a single iteration. This allowed us to make use of
the existing jump instruction handling provided by the loop handing
code, it also allows dead code to be cleaned up once we have
wrapped the code in a loop.
However using loops in this way created previously unrollable loops
which limits further optimisations. Here we provide a way to unroll
loops that end in a break and have multiple other exits.
All shader-db changes are from the dolphin uber shaders. There is a
small amount of HURT shaders but in general the improvements far
exceed the HURT.
shader-db results IVB:
total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15
total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0
total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0
total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12
total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- only allow nir_op_inot or nir_op_b2i when alu input is 1.
- use some helpers as suggested by Jason.
v3:
- evaluate alu op for single input alu ops
- add helper function to decide if to propagate through alu
- make use of nir_before_src in another spot
shader-db IVB results:
total instructions in shared programs: 9993483 -> 9993472 (-0.00%)
instructions in affected programs: 1300 -> 1289 (-0.85%)
helped: 11
HURT: 0
total cycles in shared programs: 219476091 -> 219476059 (-0.00%)
cycles in affected programs: 7675 -> 7643 (-0.42%)
helped: 10
HURT: 1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since we know what side of the branch we ended up on we can just
replace the use with a constant.
All the spill changes in shader-db are from Dolphin uber shaders,
despite some small regressions the change is clearly positive.
V2: insert new constant after any phis in the
use->parent_instr->type == nir_instr_type_phi path.
v3:
- use nir_after_block_before_jump() for inserting const
- check dominance of phi uses correctly
v4:
- create some helpers as suggested by Jason.
v5 (Jason Ekstrand):
- Use LIST_ENTRY to get the phi src
shader-db results IVB:
total instructions in shared programs: 9999201 -> 9993483 (-0.06%)
instructions in affected programs: 163235 -> 157517 (-3.50%)
helped: 132
HURT: 2
total cycles in shared programs: 231670754 -> 219476091 (-5.26%)
cycles in affected programs: 143424120 -> 131229457 (-8.50%)
helped: 115
HURT: 24
total spills in shared programs: 4383 -> 4370 (-0.30%)
spills in affected programs: 1656 -> 1643 (-0.79%)
helped: 9
HURT: 18
total fills in shared programs: 4610 -> 4581 (-0.63%)
fills in affected programs: 374 -> 345 (-7.75%)
helped: 6
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we're mapping temp-resources, we clip the resource to the
transfer-box, which means the stride might not be correct any more.
So let's update the stride from the temp-resource, and recompute the
layer-stride.
This fixes crashes when running dEQP with --deqp-gl-config-name=rgba8888d24s8ms4
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: a8987b88ff "virgl: add driver for virtio-gpu 3D (v2)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Those operations do not map to actual hardware instructions, therefore
those should always be lowered to 32-bit instructions.
Fixes: 009c54aa7a "nv50/ir: Split 64-bit integer MAD/MUL operations"
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Using output buffer with 8 bits video RGB as back buffer
certainly is not working for 30 bits color depth visual.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
v2: Tell B10G10R10X2 and R10G10B10X2 formats for different HW.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
We only need to wait for the fence before drawing to a buffer, not
before reading from it.
This might avoid hangs when re-allocating the fake front buffer, similar
to the previous change. But I haven't seen any evidence that this was
actually happening in practice.
Tested-by: Olivier Fourdan <ofourdan@redhat.com>
From Section 4.6.4 (Invariance and Linkage) of the GLSL ES 1.0 specification
"The invariance of varyings that are declared in both the vertex and
fragment shaders must match. For the built-in special variables,
gl_FragCoord can only be declared invariant if and only if
gl_Position is declared invariant. Similarly gl_PointCoord can only
be declared invariant if and only if gl_PointSize is declared
invariant. It is an error to declare gl_FrontFacing as invariant.
The invariance of gl_FrontFacing is the same as the invariance of
gl_Position."
Fixes:
* glsl-pcoord-invariant.shader_test
* glsl-fcoord-invariant.shader_test
* glsl-fface-invariant.shader_test
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107734
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Currently position is set before widgets are sized by gtk and
calculation can get wrong results where window is positioned
offscreen. Patch fixes this by setting aubfile window position
as 0,0 only when size_allocate has been called to the widget.
Now window is always positioned to 0,0 if imgui.ini is missing.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If we end up never taking the loop that writes ret, we can end up with
an uninitialized value, and if we're *really* unlucky, that value can
be -1, causing us to go down an error-path instead of a success path.
This was obviously not intended, so let's just initialize this to zero.
Noticed by Valgrind:
Conditional jump or move depends on uninitialised value(s)
at 0xBA640A0: virgl_drm_winsys_resource_cache_create (virgl_drm_winsys.c:348)
by 0xBA62FCF: virgl_buffer_create (virgl_buffer.c:170)
by 0xBA605AC: virgl_resource_create (virgl_resource.c:60)
by 0xBCF816F: bufferobj_data (st_cb_bufferobjects.c:344)
by 0xBCF816F: st_bufferobj_data (st_cb_bufferobjects.c:390)
by 0xBB7E836: vbo_use_buffer_objects (vbo_exec_api.c:1136)
by 0xBCFCC6E: st_create_context_priv (st_context.c:414)
by 0xBCFD3CD: st_create_context (st_context.c:590)
by 0xBBB30CA: st_api_create_context (st_manager.c:896)
by 0xB981E76: dri_create_context (dri_context.c:155)
by 0xB97BDCE: driCreateContextAttribs (dri_util.c:473)
by 0x5288331: dri3_create_context_attribs (dri3_glx.c:309)
by 0x5264D64: glXCreateContextAttribsARB (create_context.c:78)
Fixes: a8987b88ff ("virgl: add driver for virtio-gpu 3D (v2)")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Gallium may pick L16A16_FLOAT to represent GL_INTENSITY16F if no intensity
format is provided by the driver. However, when calling
glGetTexLevelParameteriv(..., GL_TEXTURE_INTENSITY_SIZE, ...)
mesa will return a zero size because the actually used format has no
intensity channel and as a fallback only the sizes of the red/green
channels are checked.
Also checking for LA sizes in the allocated texture resolves this problem.
v2: Only check alpha channel size and return it (Marek)
L and A size are always the same in this case.
Fixes (on virgl):
ext_framebuffer_multisample-fast-clear GL_ARB_texture_float *
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107832
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Not handling caps explicitly means that we're likely getting incorrect
values -- these need to be reviewed and set appropriately.
While we're at it, add in some missing caps, and set all the subpixel
stuff to 8 as that seems to be what the blob reports.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This removes duplicate tests from gl_core_functions_possible
that are already covered by common_desktop_functions_possible.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch fixes the following Piglit test:
spec@egl_mesa_configless_context@basic
It also fixes few test in a virgl guest.
v2: Evaluate the value of no_config (Ilia)
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This patch extends the format_conversion table to support
different view formats on texture buffer.
For legacy image formats such as INTENSITY, LUMINANCE, LUMINANCE_ALPHA,
special swizzle masks will be used on the red or RG channels.
This fixes piglit test arb_texture_buffer_object-formats fs|vs arb
Reviewed-by: Brian Paul <brianp@vmware.com>
The svga_reemit_gs_bindings function is no longer needed. Remove it.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes a crash since the variant object isn't allocated until later
in the function. Not sure how this got through.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch makes sure a valid color buffer is bound before
checking its resource. This fixes Unigine Valley running in SM41 device.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes some of tests cases in arb_copy_image-formats and also fixes
SurfaceCopy related errors in vmware.log when multi sampled surfaces are
used.
Tested with piglit, glretrace on windows and linux VM.
v2: As per Brian's comment
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
According to the current SVGA contract, any view format can be
used on the underlying resource that is multisample. So there
is no need to check the MULTISAMPLE devcap for the view format.
Fixes black rendering issue with Tropics running with 4xMSAA.
Reviewed-by: Brian Paul <brianp@vmware.com>
Explicit set the DXFMT_SHADER_SAMPLE bit for depth stencil formats
for pre-SM41 device only. This bit is now set by the SM41 device.
Reviewed-by: Brian Paul <brianp@vmware.com>
See comments for details. This allows the piglit
ext_framebuffer_multisample-point-smooth test to pass.
Also, test the pipe_rasterizer_state::point_quad_rasterization field
to see if sprite point rasterization is needed because it's possible
for no sprite_coord_enable bits to be set when drawing sprites.
Finally, remove old, stale comments.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If real MSAA is not available, we only support 1 sample/pixel. In that
case, we must not declare MSAA resources or emit MSAA opcodes. Do that
by checking the sample count.
Fixes several piglit MSAA tests, such as
arb_texture_multisample-sample-depth (when the hard-coded sample count
of 4 is fixed in that test).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We removed the special cases referred to in this comment in the commit
"svga: add a separate function to get dx format capabilities from
vgpu10 device".
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Technically, SM4.1 doesn't support cube map arrays, but our backend
renderers actually do. This allows the Piglit textureGather cube
map array tests to pass.
Tested with GLrenderer, DX11renderer and SWrenderer.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Force direct map on multisample surface.
Fixes SVGA Driver Errors running multisample piglit tests on Linux VM
v2: use texture for the check.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Fixes SVGA Driver Errors with piglit test arb_copy_image-targets
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
In commit e4048f6cd1, svga_is_dx_format_supported() is supposed to
also check the SVGA3D_DXFMT_MULTISAMPLE bit for multisample
support of a format. Somehow that code is not included in that commit.
This patch fixes it.
Fixes piglit test spec@ext_framebuffer_multisample@formats all_samples.
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit e4048f6cd1 unintentionally allows multisample support for VGPU9 device.
This patch fixes this regression.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Currently we have one function to get format capabailities and
we convert DX10 devcaps back to DX9. This can be confusing.
Going forward we will have a separate function for dealing with dx formats.
This patch also fixes the depth stencil devcap. Instead of hardcoding
the capabilities for the depth stencil formats, we will inquire the
device for the capabilities. Note: we will still need to explicity set
the SVGA3D_DXFMT_SHADER_SAMPLE bit for SVGA3D_R32_FLOAT_X8X24 and
SVGA3D_R24_UNORM_X8 since this bit is not advertised but supported
by the device.
v2: reapply the patch after svga_is_format_supported is moved to svga_format.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
This patch adds a new function svga_is_dx_format_supported() to check
for format support in a VGPU10 device.
v2: reapply the patch after svga_is_format_supported is moved to svga_format.c
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Fixes piglit tests spec@arb_sample_shading@builtin-gl-sample-position 2
spec@arb_texture_multisample@fb-completeness@2
Reviewed-by: Brian Paul <brianp@vmware.com>
As with 1D and 2D array textures, if there's only one array element
(one cubemap in this case) we have to issue different shader code.
This fixes a number of Piglit cubemap array tests.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
No need to check for HW_CAPS2 or SM4_1 support if VGPU10 is not
enabled or is explicitly disabled via the environment variable
SVGA_VGPU10.
Reviewed-by: Deepak Rawat <drawat@vmware.com>
A new argument "quality level" is added in surface define v3 which
represets precision settings for surface. This commit add support
for quality level in DRM_VMW_GB_SURFACE_CREATE_EXT and
DRM_VMW_GB_SURFACE_REF_EXT.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With drm version 2_15, we can inquire for support of HW_CAP2.
If it is supported, we can enable intra_surface_copy support.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
This patch fixes the layer index when rendering to a
backed surface view of a cubemap array.
Fixes piglit test fbo-generatemipmap-cubemap array.
Reviewed-by: Brian Paul <brianp@vmware.com>
Texture swizzling for texture gather needs to be done to the selected texels
rather than to the returned vector. This patch has specical cases
for the different swizzles in emit_tg4().
Fixes a lot of piglit texture gather tests.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently, the starting index for system values is assigned to
the next index after the highest index of the tgsi declared input registers.
But the tgsi index might be different from the actual assigned index, hence
this might cause overlap of indices.
With this patch, the shader linker keeps track of the highest index of the
translated input registers, and the next index will be used for the
starting index for system values.
Fixes SHIM errors running arb_copy_image-formats on SM4_1 device.
Reviewed-by: Brian Paul <brianp@vmware.com>
Kernel driver version 2.15 added new surface ioctl named:
DRM_VMW_GB_SURFACE_CREATE_EXT
DRM_VMW_GB_SURFACE_REF_EXT
The new ioctl has support for 64-bit svga3d_flags if
DRM_VMW_PARAM_SM4_1 is available.
Multisampling surface mob size calculation is added. Also synced the
relevant header update.
svga device modified the surface define command V3 with new parameter
multisampling pattern. Adding support for that in winsys.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The SVGA device is deprecating the DX9 MSAA support.
This patch enables MSAA for SM4_1 device by explicitly
setting the SVGA3D_SURFACE_MULTISAMPLE bit.
For SM4_1 device, only 4 samples is supported.
Reviewed-by: Brian Paul <brianp@vmware.com>
Just translate the TGSI LODQ intruction to VGPU10 LOD instruction.
All (4) Piglit GL_ARB_texture_query_lod tests pass.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
With sm4_1, we can support single channel 2D or CubeMap textures.
This patch exercises this feature.
Tested with piglit
v2: As per Brian's comment
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Vs. sampling at the centroid or the fragment center.
Note that this does not fix failures with the Piglit
arb_sample_shading-interpolate-at-sample-position or
arb_sample_shading-ignore-centroid-qualifier.exe tests at this time.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We translate TGSI system value registers to VGPU10 input registers.
Add a comment and set file = TGSI_FILE_INPUT. That's not stricly
necessary since we map both TGSI_FILE_INPUT and TGSI_FILE_SYSTEM_VALUE
to VGPU10_OPERAND_TYPE_INPUT, but this makes the code a bit more
understandable.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This, with the previous work for sample position/id query, allows
us to enable per-sample shading for VGPU 10.1.
Note that quite a few Piglit arb_sample_shading tests still do not
pass, but many do.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Sample ID is just a system value. Sample position must be implemented
with the VGPU10_OPCODE_SAMPLE_POS instruction.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This patch adds support for GL_ARB_draw_buffers_blend extension
for SM4_1 device.
Fixes piglit test fbo-draw-buffers-blend.
This patch is squashed with a subsequent patch which fixed a
regression.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes piglit test arb_texture_cube_map_array-fbo-cubemap-array
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch adds support for cubemap array texture lookup with
explicit LOD.
Fixes piglit test arb_texture_cube_map_array-cubemap-lod
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch adds support for cubemap array for SM4_1.
Fixes piglit test arb_texture_cube_map_array-cubemap
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Spite using thrd_t types, these functions are wed to pthreads, and break
Windows builds, because thrd_current() is not implemented there, as it's
impossible to have an efficient thrd_current() implementation on
Windows.
Trivial.
When creating textures, we avoid creating backing-store for all
multisampled textures, not just depth buffers.
So we can't try to map them later. That's just going to fail. So
let's take the blit-based code-path that seems to avoid this problem.
This make this piglit test-case no longer crash (although it still
fails):
bin/copyteximage 2D -samples=2 -auto
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We don't use the size we calculate in this function, so let's just
drop the calculation
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We always return TRUE, and we never check the return-value. Let's
just drop the return value instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When we fail to map memory, we should also free trans to avoid
leaking memory.
Noticed while reading code.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Introduce a new capability for the maximum value of
pipe_vertex_element::src_offset. Initially just every driver
backend returns the value previously set from _mesa_init_constants.
So this shall end up in no functional change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This fixes the situation where we'd send a shader with just the
header and no data.
piglit/glsl-max-varyings test was causing this to happen, and
the renderer fix was breaking it.
v2: drop fprintf
Fixes: a8987b88ff "virgl: add driver for virtio-gpu 3D (v2)"
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This extension is required by "Wolfenstein: The Old Blood"
and is exposed in core in the Nvidia binary driver.
All the functions are just alias of the core functions so
there should be nothing more to do.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The Vulkan 1.1.81 spec says:
"It is legal for offset.x + extent.width or offset.y + extent.height
to exceed the dimensions of the framebuffer - the scissor test still
applies as defined above. Rasterization does not produce fragments
outside of the framebuffer, so such fragments never have the scissor
test performed on them."
Elsewhere, the Vulkan 1.1.81 spec says:
"The application must ensure (using scissor if necessary) that all
rendering is contained within the render area, otherwise the pixels
outside of the render area become undefined and shader side effects
may occur for fragments outside the render area. The render area
must be contained within the framebuffer dimensions."
Unfortunately, there's some room for interpretation here as to what the
consequences are of having the render area set to exactly the
framebuffer dimensions and having a scissor that is larger than the
framebuffer. Given that GL and other APIs provide automatic clipping to
the framebuffer, it makes sense that applications would assume that
Vulkan does this as well. It costs us very little to play it safe and
just clamp client-provided scissors to the framebuffer dimensions.
Fortunately, the user is required to provide us with at least one
scissor so we don't need to handle the case where they don't.
Fixes: fb2a5ceb32 "anv: Emit DRAWING_RECTANGLE once at driver..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
State trackers will not use the new param directly, but will instead use
a helper in MakeCurrent that does the right thing.
v2: rework the interface
Reviewed-by: Brian Paul <brianp@vmware.com>
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.
v2: - Fix additional uses of _mesa_bitcount added after this was
originally written
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently we have two sets of functions for bit counts, one in gallium
and one in core mesa. The ones in core mesa are header only in many
cases, since they reduce to "#define _mesa_bitcount popcount", but they
provide a fallback implementation. This is important because 32bit msvc
doesn't have popcountll, just popcount; so when nir (for example)
includes the core mesa header it doesn't (and shouldn't) link with core
mesa. To fix this we'll promote the version out of gallium util, then
replace the core mesa uses with the util version, since nir (and other
non-core mesa users) can and do link with mesautils.
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
gen9 hardware has a bug in the sampler cache that can cause GPU hangs
whenever an texture with aux compression enabled is in the sampler cache
together with an ASTC5x5 texture. Because we can't control what the
client binds at any given time, we have two options: resolve the CCS or
decompresss the ASTC. Doing a CCS or HiZ resolve is far less drastic
and will likely have a smaller performance impact.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There were two bugs working together to make things mostly work: I wasn't
dividing the VPM output size available by the size of a batch (vertex),
but I also had the size of the VPM reduced by a factor of 8.
Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it
seems also my intermittent varying failures.
Fixes: 1561e4984e ("v3d: Emit the VCM_CACHE_SIZE packet.")
Fixes
dEQP-GLES3.functional.fragment_ops.blend.default_framebuffer.rgb_func_alpha_func.dst.src_alpha_saturate_src_alpha_saturate
and friends with --deqp-egl-config-name=rgb565d0s0
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
The brw_vs_prog_data::double_inputs_read field comes directly from
shader_info::double_inputs which may contain inputs which are not
actually read. Instead of using it directly, AND it with inputs_read
which is only things which are read. Otherwise, we may end up
subtracting too many elements when computing elem_count.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103241
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It was very inconsistently handled; the only things that made use of it
were glsl_to_nir, glspirv, and nir_gather_info. In particular,
nir_lower_io completely ignored it so anyone using nir_lower_io on
64-bit vertex attributes was going to be in for a shock. Also, as of
the previous commit, it's set by every driver that supports 64-bit
vertex attributes. There's no longer any reason to have it be an option
so let's just delete it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We were going out of our way to disable dual-location re-mapping in NIR
only to then do the remapping in st_glsl_to_nir.cpp. Presumably, this
was so that double_inputs would be correct for the core state tracker.
However, now that we've it to gl_program::DualSlotInputs which is
unaffected by NIR lowering, we can let NIR lower things for us. The one
tricky bit here is that we have to remap the inputs_read bitfield back
to the single-slot convention for the gallium state tracker to use.
Since radeonsi is the only NIR-capable gallium driver that also supports
GL_ARB_vertex_attrib_64bit, we only have to worry about radeonsi when
making core gallium state tracker changes.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, we had two field in shader_info: double_inputs_read and
double_inputs. Presumably, the one was for all double inputs that are
read and the other is all that exist. However, because nir_gather_info
regenerates these two values, there is a possibility, if a variable gets
deleted, that the value of double_inputs could change over time. This
is a problem because double_inputs is used to remap the input locations
to a two-slot-per-dvec3/4 scheme for i965. If that mapping were to
change between glsl_to_nir and back-end state setup, we would fall over
when trying to map the NIR outputs back onto the GL location space.
This commit changes the way slot re-mapping works. Instead of the
double_inputs field in shader_info, it adds a DualSlotInputs bitfield to
gl_program. By having it in gl_program, we more easily guarantee that
NIR passes won't touch it after it's been set. It also makes more sense
to put it in a GL data structure since it's really a mapping from GL
slots to back-end and/or NIR slots and not really a NIR shader thing.
Tested-by: Alejandro Piñeiro <apinheiro@igalia.com> (ARB_gl_spirv tests)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
By the time Mesa 18.3 comes out (probably December '18), Meson 0.45 will
be 9 months old (March '18), so I think this is reasonable.
(btw, the currently-required Meson 0.44.1 was released less than 12 days
before 0.45, so we're really not bumping by much.)
Currently, the Meson versions in the major distributions are:
Arch: ships 0.47.2
CentOS: 7 ships 0.47.1
Debian: stable ships 0.37.1, so it hasn't been usable in a long time.
everything more recent ships 0.47.2
Fedora: 28 ships 0.45.1
FreeBSD: ships 0.46.1 (ports)
Gentoo: ships 0.46.1
OpenSUSE: 15 ships 0.46
Ubuntu: 18.04 ships 0.45.1
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
We should exit from the function 'util_vasprintf'
with error code -1 for case where 'malloc'
returns NULL
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 864148d69e "util: add util_vasprintf() for Windows (v2)"
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
The current minimum meson version supported is 0.44.1, so we have met
both the 0.43 and 0.44 requirement to not need these hacks anymore :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Fix an other regression of
mesa: Make gl_vertex_array contain pointers to first order VAO members.
The regression showed up with drivers using the tnl module and
was reproducible using xonotic-glx -benchmark demos/the-big-keybench.dem.
Fixes: 64d2a20480
mesa: Make gl_vertex_array contain pointers to first order VAO members.
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
If we have something like:
#ifdef NOT_DEFINED
#define A_MACRO(x) \
if (x)
#endif
The # on the #define is not skipped but the define itself is so
this then gets recognised as #if.
Until 28a3731e3f this didn't happen because we ended up in
<HASH>{NONSPACE} where BEGIN INITIAL was called stopping the
problem from happening.
This change makes sure we never call RETURN_TOKEN_NEVER_SKIP for
if/else/endif when processing a define.
Cc: Ian Romanick <idr@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107772
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For example,
result0 = texture(sampler[indexBase + 5], coords);
result1 = texture(sampler[indexBase + 0], coords);
result2 = texture(sampler[indexBase + 0], coords);
out_result0 = result0;
out_result1 = result1;
out_result2 = result2;
In this kind of case we need to insert an extra mov to the outputs
so that the result could be assigned to each register respectively.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Since most shaders wouldn't need that large array of immediates, making
the array dynamic could save unnecessary spaces.
In addition, sometimes we can potentially have a much larger array
of immediates to be lowered, which might be more than 64.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Port fixes from a5xx (f0715442)
TODO maybe this should move to shared code, since it seems to be the
same.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The border_color_uploaders need to be torn down before the transfer_pool
is destroyed.
Fixes: e11e9d6394 freedreno: fix context teardown race
Signed-off-by: Rob Clark <robdclark@gmail.com>
We could end up w/ inputs larger than vec4, simply because unused inputs
are not split.
Fixes things like dEQP-GLES31.functional.separate_shader.random.77 (and
probably a handful of others)
Signed-off-by: Rob Clark <robdclark@gmail.com>
We require a single version of libdrm for all of our libdrm
dependencies (core and driver), but the way this is structured can make
the error message less than helpful, as one driver might be the one
setting the libdrm requirement, while another might be the one that
generates the version failure.
This adds a simple message to the output announcing which libdrm module
set the version, which might be more helpful.
v2: - Use message suggested by Eric Engstrom
Fixes: c445b1d56f
("meson: Use the same version for all libdrm checks")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
SVGA device now supports 64 bits surface flags. This patch
updates the winsys interface to allow 64 bits surface flags.
The linux winsys layer will for now only honor the lower 32 bits of
the surface flags.
Reviewed-by: Brian Paul <brianp@vmware.com>
On non vgpu10, driver doesn't support util_blitter_blit for SVGA3D_Z_D16,
SVGA3D_Z_D24x8, SVGA3D_Z_D24S8. Patch fixes following piglit tests regression on hwv8 caused
by commit 27bf35caea5e:
spec@arb_depth_texture@fbo-depth-gl-depth-component16-blit
spec@arb_depth_texture@fbo-depth-gl-depth-component24-blit
spec@arb_depth_texture@fbo-depth-gl-depth-component32-blit
Tested with mtt-piglit on hw 8,9,10,11,13 and mtt-glretrace on windows and linux.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When blending is enabled, framebuffer colorspace has to be linear.
Previously, we never hit this case because we were not supporting sRGB
drawable. Previous patch added that support.
Tested with mtt glretrace, viewperf, piglit, conform.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Basically, SVGA3dCmdIntraSurfaceCopy command allow copying when
source and destination are same.
Tested with MTT piglit, glretrace, viewperf, conform
v2: changes as per Charmaine's comment
v3: changes as per Charmaine's comment
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch makes sure there is a valid fd before merging it
to the context's fd in vmw_svga_winsys_fence_server_sync().
This fixes the assert running webot.
No regression running kmscube.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
This is the second patch needed to fix the following piglit tests:
tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test
tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test
Although in this case it doesn't affect so many borrowed tests, as
there aren't too many tests using multisamplers on Intel.
It is worth to note that this patch is also needed when those tests
are run on GLSL mode (using the --glsl option). Although most Intel
drivers would not be able to run/execute tests using multisamplers, as
GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to
link correctly, so linking tests should pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At this moment that lowering is using info coming from the
UniformStorage, so for the ARB_gl_spirv codepath, it needs to be done
after calling gl_nir_link_uniforms. As for the GLSL codepath it can
also be called later, we just move the call on both cases, to avoid
adding several shader->spirv_data checks, and keep the patch as small
as possible.
This is the first patch needed to fix the following piglit tests:
tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test
tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test
but fixes thousands of tests when borrowing the tests from other specs
(that needs to be done manually right now).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To brw_nir_lower_gl_images, as it will be also used on the
ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the
lowering is about images following the OpenGL semantics. In any case
"brw_nir_lower_opengl_images" seemed too long to me, so I just used
gl. That shortening is already used on other parts of the code.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The resource bo array must already extended when the target index is
equal to the current size of the array.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Silences:
Conditional jump or move depends on uninitialised value(s)
at 0xB72F2C0: virgl_drm_winsys_create (virgl_drm_winsys.c:854)
by 0xB72F2C0: virgl_drm_screen_create (virgl_drm_winsys.c:926)
by 0xB21C885: pipe_virgl_create_screen (drm_helper.h:275)
by 0xB7201F0: pipe_loader_create_screen (pipe_loader.c:137)
by 0xB639C91: dri2_init_screen (dri2.c:2112)
by 0xB634F68: driCreateNewScreen2 (dri_util.c:153)
by 0x63023E6: dri3_create_screen (dri3_glx.c:893)
by 0x62D35BD: AllocAndFetchScreenConfigs (glxext.c:820)
by 0x62D35BD: __glXInitialize (glxext.c:946)
by 0x62CECB3: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x62CF69C: glXQueryExtensionsString (glxcmds.c:1304)
by 0x60AA7D9: ??? (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.5.2)
by 0x4F81450: wfl_checked_display_connect (piglit-util-waffle.h:74)
by 0x4F829E0: piglit_wfl_framework_init (piglit_wfl_framework.c:627)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Fixes crash with
piglit/bin/map_buffer_range-invalidate CopyBufferSubData \
increment-offset -auto -fbo
* Resize the resource storage already when the count is equal to the
allocated size, fixes:
Invalid write of size 8
at 0xB72E4CF: virgl_drm_add_res (virgl_drm_winsys.c:629)
by 0xB72E4CF: virgl_drm_emit_res (virgl_drm_winsys.c:663)
by 0xB72A44A: virgl_encode_resource_copy_region (virgl_encode.c:776)
by 0xB40CD12: st_copy_buffer_subdata (st_cb_bufferobjects.c:585)
by 0xB244A3B: _mesa_CopyBufferSubData (bufferobj.c:2940)
by 0x109A1E: upload (invalidate.c:169)
by 0x109C2F: piglit_display (invalidate.c:215)
by 0x4F80FBE: run_test (piglit_fbo_framework.c:52)
by 0x4F66E5F: piglit_gl_test_run (piglit-framework-gl.c:229)
by 0x10949D: main (invalidate.c:47)
Address 0xbe07d30 is 0 bytes after a block of size 4,096 alloc'd
at 0x4C31B25: calloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0xB72DAAF: virgl_drm_cmd_buf_create (virgl_drm_winsys.c:567)
* Also resize the space allocated for the handles, fixes:
Invalid write of size 4
at 0xB72E4F0: virgl_drm_add_res (virgl_drm_winsys.c:631)
by 0xB72E4F0: virgl_drm_emit_res (virgl_drm_winsys.c:663)
by 0xB72A44A: virgl_encode_resource_copy_region (virgl_encode.c:776)
by 0xB40CD12: st_copy_buffer_subdata (st_cb_bufferobjects.c:585)
by 0xB244A3B: _mesa_CopyBufferSubData (bufferobj.c:2940)
by 0x109A1E: upload (invalidate.c:169)
by 0x109C2F: piglit_display (invalidate.c:215)
by 0x4F80FBE: run_test (piglit_fbo_framework.c:52)
by 0x4F66E5F: piglit_gl_test_run (piglit-framework-gl.c:229)
by 0x10949D: main (invalidate.c:47)
Address 0xbe08570 is 0 bytes after a block of size 2,048 alloc'd
at 0x4C2FB0F: malloc (
in /usr/lib/valgrind/vgpreload_memcheck-amd64- linux.so)
by 0xB72DAC8: virgl_drm_cmd_buf_create (virgl_drm_winsys.c:572)
Fixes: 4b15b5e803 ("virgl: resize resource bo allocation if we need to.")
v2: - Use REALLOC macro and avoid memory leak when re-allocation fails
- add Fixes tag (both Emil Velikov)
- reorder commit message
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Emulating atomics on top of ssbos can lead to too small max SSBO count,
so let's use the hw-atomics mechanism to expose atomic buffers instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
virgl_protocol.h is considered to have it's upstream in the
virglrenderer repository, and somehow these minor differences has
crept in.
Let's sync with the upstream to avoid this.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This moves the evergreen-specific max-sizes out as a driver-cap, so
other drivers with less strict requirements also can use hw-atomics.
Remove ssbo_atomic as it's no longer needed.
We should now be able to use hw-atomics for some stages and not for
other, if needed.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This gets rid of a r600 specific hack in the state-tracker, and prepares
for other drivers to be able to use hw-atomics.
While we're at it, clean up some indentation in the various drivers.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
MaxAtomicCounters has already been assigned in the loop above in the
ssbo_atomic = true case, so this will calculate the same value as the
default.
While we're at it, fixup indentation on the MaxAtomicBufferBindings
assign.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
This makes the code a bit easier to follow; we first set up
MaxShaderStorageBlocks, then we either set up a dedicated
MaxAtomicBuffers, or we split MaxShaderStorageBlocks in two.
While we're at it, also make the SSBO-splitting code tolerate the
hypothetical case of having an odd number of SSBOs without incorrectly
dropping the last SSBO.
This has the nice result that the SSBOs and atomic buffers are dealt
with almost completely orthogonally, easing some upcoming patches.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
GL_STENCIL_INDEX uses GL_INTENSITY for the border color, which is nicer
to hardware that doesn't read the stencil border value from the X channel.
This fixes a bunch of dEQP tests on Vega & Raven.
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reported by Coverity: arr_live_ranges is freed in a different branch
than the one in which it was allocated.
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Now that we have the util function for the default values, we can get rid
of the boilerplate.
v2: Rebase on new gallium caps
Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)
Now that we have the util function for the default values, we can get rid
of the boilerplate.
v2: drop GLSL level in favor of defaults.
v3: Rebase on new gallium caps
One of the pains of implementing a gallium driver is filling in a million
pipe caps you don't know about yet when you're just starting out. One of
the pains of working on gallium is copy-and-pasting your new PIPE_CAP into
each driver. We can fix both of these by having each driver call into the
default helper from their default case, so that both sides can ignore each
other until they need to.
v2: fix i915g build, revert swr change to avoid breaking scons build
(https://travis-ci.org/anholt/mesa/jobs/419739857)
v3: Rebase on 3 new gallium caps.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
As of 07a2098a70, brw_nir_optimize calls nir_remove_dead_variables as
the last optimization. Doing it again is just pointless.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We're hitting an assert in gfxbench because one of the local variable
is a sampler (according to Jason this isn't valid) :
testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type*, unsigned int*, unsigned int*): Assertion `!"type does not have a natural size"' failed.
Since this particular variable isn't used, it can be eliminated by
removing unused local variables at the end of the optimization loop.
This makes sense also for valid local variables.
v2: Move additional local variable removal out of optimization loop,
but before large constant removal (Jason/Lionel)
v3: Move the removal at the end of brw_nir_optimize()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The "gen_group_get_length" function can return a negative value
and it can lead to the out of bounds group_iter.
v2: printing of "unknown command type" was added
v3: just the asserts are added
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
ac_surface.c: gfx6_compute_surface says
/* DB doesn't support linear layouts. */
Now if we expose linear depth and create a linear depth image
and use CmdCopyImage to copy into it, we can't map the underlying
memory and read it linearly which I think should work.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Amber Lake uses the same gen graphics as Kaby Lake, including a id
that were previously marked as reserved on Kaby Lake, but that
now is moved to AML page.
This follows the ids and approach used on kernel's commit
e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform")
Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
According to internal docs, some gen9 platforms have a pixel shader push
constant synchronization issue. Although not listed among said
platforms, this issue seems to be present on the GeminiLake 2x6's we've
tested.
We consider the available workarounds to be too detrimental on
performance. Instead, we mitigate the issue by applying part of one of
the workarounds. Re-emit PUSH_CONSTANT_ALLOC at the top of every batch
(as suggested by Ken).
Fixes ext_framebuffer_multisample-accuracy piglit test failures with the
following options:
* 6 depth_draw small depthstencil
* 8 stencil_draw small depthstencil
* 6 stencil_draw small depthstencil
* 8 depth_resolve small
* 6 stencil_resolve small depthstencil
* 4 stencil_draw small depthstencil
* 16 stencil_draw small depthstencil
* 16 depth_draw small depthstencil
* 2 stencil_resolve small depthstencil
* 6 stencil_draw small
* all_samples stencil_draw small
* 2 depth_draw small depthstencil
* all_samples depth_draw small depthstencil
* all_samples stencil_resolve small
* 4 depth_draw small depthstencil
* all_samples depth_draw small
* all_samples stencil_draw small depthstencil
* 4 stencil_resolve small depthstencil
* 4 depth_resolve small depthstencil
* all_samples stencil_resolve small depthstencil
v2: Include more platforms in WA (Ken).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106865
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93355
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Though the SARGB8888 format is used internally through its FourCC value,
it is not a real format as defined by drm_fourcc.h; it cannot be used
with KMS or other interfaces expecting drm_fourcc.h format codes.
Ensure we don't advertise it through the dmabuf format/modifier query
interfaces, preventing us from tripping over an assert.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes: 8c1b9882b2 ("egl/dri2: Guard against invalid fourcc formats")
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
CTS doesn't test input clip/cull distances for the fragment
shader stage, which explains why this was totally broken. I
wrote a simple test locally that works now.
This fixes a crash with GTA V and DXVK.
Note that we are exporting unused parameters from the vertex
shader now, but this can't be optimized easily because we don't
keep the fragment shader info...
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107477
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If color buffer is locked, do not set its wayland buffer to NULL;
otherwise it can not be freed later.
Rather, flag it in order to destroy it later on the release event.
v2: instruct release event to unlock only or free wl_buffer too (Daniel)
This also fixes dEQP-EGL.functional.swap_buffers_with_damage.* tests.
CC: Daniel Stone <daniel@fooishbar.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>
While adding transfer queues to radv, I started writing some tests,
the first test I wrote fell over copying a buffer larger than this
limit.
Checked AMDVLK and found the correct limit.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes:
tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.shader_test
since I reworked the 64-bit swizzles.
Fixes: bb17ae49ee (gallivm: allow to pass two swizzles into fetches.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I have piglit results from my machine, but I must have messed up,
and not built mesa in between properly.
Fixes: bb17ae49ee (gallivm: allow to pass two swizzles into fetches.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit f8cfc77660.
This appears to break intel_dump_gpu for Gen9 systems - I can load them
in the simulator, but nothing happens. Reverting the patch makes the
simulator properly execute our commands and shaders again.
This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8
platforms where the lack of metadata dirtying was causing another pass
to accidentally delete a much needed loop.
https://bugs.freedesktop.org/show_bug.cgi?id=107745
Fixes: 37f7983bcc "intel/compiler: Do image load/store lowering..."
Jason Ekstrand <jason@jlekstrand.net> writes:
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This effectively reverts a266934935 which
was a misguided attempt at protecting intel_query_dma_buf_modifiers from
invalid formats. Unfortunately, in some internal EGL cases, we can get
an SRGB format validly in this function. Rejecting such formats caused
us to not allow CCS in some cases where we should have been allowing it.
This regressed the performance of some SynMark tests as well as GfxBench
ALU2, Tessellation and Manhattan 3.0 tests
There's some question of whether or not we really should be using SRGB
"fourcc" formats that aren't actually in drm_foucc.h but there's not
much harm in allowing them through here.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107223
Fixes: a266934935 "i965/screen: Return false for unsupported..."
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We already reject attempts to import images with invalid fourcc formats
but don't really guard the queries all that well. This makes us error
out in any calls to eglQueryDmaBufModifiersEXT if the given format is
not a valid fourcc format. We also add an assert to ensure that drivers
don't advertise any non-fourcc formats.
Cc: mesa-stable@lists.freedesktop.org
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Now that image load/store intrinsics are variable-width, we need to set
num_components accordingly. In 15d39f474b, both glsl_to_nir and
spirv_to_nir were updated to properly set num_components but radv meta
was left behind.
Fixes: 15d39f474b "nir: Make image load/store intrinsics..."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Previously gallivm would attempt to use VSX instructions on all systems
where it detected that Altivec is supported; however, VSX was added to
POWER long after Altivec, causing lots of crashes on older POWER/PPC
hardware, e.g. PPC Macs. By detecting VSX separately from Altivec we can
automatically disable it on hardware that supports Altivec but not VSX
Signed-off-by: Vicki Pfau <vi@endrift.com>
Passes the compat piglits. I'm sure that there will be odd issues that
aren't caught by them, but at least it should basically work.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
With compat creeping up to geometry and tess shaders, lowering texcoord
accesses/writes becomes more complicated. Since it's an optimization
anyways, just avoid the complication for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The spec seems clear this is not allowed but the Nvidia binary
forces apps to add layout qualifiers so this works around the
issue for No Mans Sky until the CTS can be sorted out.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The spec is quite clear this is not allowed:
From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers can appear in several forms of declaration.
They can appear as part of an interface block definition or
block member, as shown in the grammar in the previous section.
They can also appear with just an interface-qualifier to establish
layouts of other declarations made with that qualifier:
layout-qualifier interface-qualifier ;
Or, they can appear with an individual variable declared with
an interface qualifier:
layout-qualifier interface-qualifier declaration ;"
From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers cannot be used on formal function parameters,
and layout qualification is not included in parameter matching."
However on the Nvidia binary driver they actually fail to compile
if image function params don't have a layout qualifier. This results
in applications such as No Mans Sky using layout qualifiers on params.
I've submitted a CTS test to expose this problem in the Nvidia driver
but until that is resolved this patch will help Mesa drivers work
around the issue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes compilation of some "No Mans Sky" shaders where the stringification
happens in branches intended for DX12.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This hijacks the top 16-bits of swizzle, to pass in the swizzle
for the second channel.
This fixes handling .yx swizzles of 64-bit values.
This should fixup radeonsi and llvmpipe.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We could enable it for lower versions of GL but this allows us
to just use the existing version/extension checks that are already
used by the core profile.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that the drivers are lowering to surface indices themselves, we no
longer need to push the surface index into the shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, the back-end compiler turn image access into magic uniform
reads and there was a complex contract between back-end compiler and
driver about setting up and filling out those params. As of this
commit, both drivers now lower image_deref_load_param_intel intrinsics
to load_uniform intrinsics controlled by the driver and lower the other
image_deref_* intrinsics to image_* intrinsics which take an actual
binding table index. There are still "magic" uniforms but they are now
added and controlled entirely by the driver and that contract no longer
spans components.
This also has the side-effect of making most image use compile-time
binding table indices. Previously, all image access pulled the binding
table index from a uniform. Part of the reason for this was that the
magic uniforms made it difficult to decouple binding table indices from
the uniforms and, since they are indexed completely differently
(especially in Vulkan), it was hard to pull them apart. Now that the
driver is handling both, it's trivial to decouple the two and provide
actual binding table indices.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166872 -> 15164293 (-0.02%)
instructions in affected programs: 115834 -> 113255 (-2.23%)
helped: 191
HURT: 0
total cycles in shared programs: 571311495 -> 571196465 (-0.02%)
cycles in affected programs: 4757115 -> 4642085 (-2.42%)
helped: 73
HURT: 67
total spills in shared programs: 10951 -> 10926 (-0.23%)
spills in affected programs: 742 -> 717 (-3.37%)
helped: 7
HURT: 0
total fills in shared programs: 22226 -> 22201 (-0.11%)
fills in affected programs: 1146 -> 1121 (-2.18%)
helped: 7
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit expands the current memory access enum to contain the extra
two bits provided for images. We choose to follow the SPIR-V convention
of NonReadable and NonWriteable because readonly implies that you *can*
read so readonly + writeonly doesn't make as much sense as NonReadable +
NonWriteable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL spec allows you to set both the "readonly" and "writeonly"
qualifiers on images to indicate that it can only be used with
imageSize. However, we had no way of representing this int he linked
shader and flagged it as GL_READ_ONLY. This is good from a "does it use
this buffer?" perspective but not from a format and access lowering
perspective. By using GL_NONE for if "readonly" and "writeonly" are
both set, we can detect this case in the driver and handle it correctly.
Nothing currently relies on the type of surface in the "readonly" +
"writeonly" case but that's about to change. i965 is the only drier
which uses the ImageAccess field and gl_bindless_image::access is
currently unused.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Having the array length component stored in .z was a small convenience
for the ISL image param filling code and an annoyance in the NIR
lowering code. The only convenience of treating 1D arrays like 2D
arrays in the lowering code is in the address calculation code so let's
put all the complexity there as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end. This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down. In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end. On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0
Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dead code will get rid of them eventually but it's better if they're
just gone so we guarantee they won't trip up later passes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of requiring 4 components, this allows them to potentially use
fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so
drivers which assume 4 components should be safe. However, we want to
be able to shrink them for i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have a name for that, it's called a uvec. This just makes the
function name a bit shorter. While we're here, we also add an assert
for one of the assumptions this function makes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is nothing inherent about these opcodes that requires them to only
take scalars. It's very convenient if we let them take vectors as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Found by inspection. This doesn't help much now but we'll see this
pattern with images if you load UNORM and then store UNORM.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166916 -> 15166910 (<.01%)
instructions in affected programs: 761 -> 755 (-0.79%)
helped: 6
HURT: 0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds the "(a << N) >> M" family of mask or sign-extensions. Not a
huge win right now but this pattern will soon be generated by NIR format
lowering code.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15166918 -> 15166916 (<.01%)
instructions in affected programs: 36 -> 34 (-5.56%)
helped: 2
HURT: 0
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If it's not the right bit-size, it may not actually be the correct
extraction. For now, we'll only worry about 32-bit versions.
Fixes: 905ff86198 "nir: Recognize open-coded extract_u16"
Fixes: 76289fbfa8 "nir: Recognize open-coded extract_u8"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blending isn't valid for integer formats. Rather than having drivers
worry about this, just disable blending in this case. This hopefully
will increase hits in the CSO cache as well, by eliminating most of the
meaningless fields in this case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This doesn't seem to make any difference in testing, but it fixes a
failed assertion when dumping sm3 shaders.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for
vgpu9. We can use the rasterizer sprite_coord_enable bitfield as-is.
We need to index into it using the TGSI semantic index, not the
register index.
This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests.
Testing done: Piglit, Mesa sprite demos
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The flag_rect and flag_buffer fields didn't sufficiently capture
the state changes needed for those resource types. For example,
if a texture binding was changed from a 500x500 rect texture to a
400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS. But
we need to do that to emit the new texcoord scale factors to the
constant buffers. Rather than track the sizes of all bound
resources, just set the flag if the resource is a rect. Same
story with texture buffers.
Also, since rect/buffer textures are usable with VS/GS shaders,
add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting
VS/GS constants.
This seems to help with XFCE / xfwm4 desktop scaling.
VMware issue 2156696.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Add const qualifiers. Add 'f' suffix on floats to avoid double
promotion.
Remove unneeded shader type assertion since the switch statement
handled it already.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Move this now-unused function into the existing comment block, which was its only prior use.
../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)
Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Newer blit tests are enabling depth&stencils blits. We currently don't
support it but can do by iterating over the aspects masks (copy some
logic from the CopyImage function).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9f44745eca ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This adds support for unrolling the classic
do {
// ...
} while (false)
that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.
shader-db results IVB:
total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but its possible this we could also see the same problem with
the existing unrolling passes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.
This will be used in the following patch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will help later patches with unrolling loops that end with a
break i.e. loops the always exit on their first interation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it. Keep the parameter,
but mark it as unused.
Silences 37 warnings like:
In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
brw_pixel_interp_desc(const struct gen_device_info *devinfo,
^~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Gen >= 9 have ability to control clamping of depth values separately at
near and far plane.
z_w is clamped to the range [min(n,f), 0] if clamping at near plane is
enabled, [0, max(n,f)] if clamping at far plane is enabled and [min(n,f)
max(n,f)] if clamping at both plane is enabled.
v2: 1) Use better coding style (Ian Romanick)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_set_enable() and _mesa_IsEnabled() extended to accept new two
tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.
v2: Remove unnecessary parentheses (Marek Olsak)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.
Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.
Driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.
v2: 1) Remove unnecessary parentheses (Marek Olsak)
2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
GL_DEPTH_CLAMP only (Marek Olsak)
3) Clamp against near and far plane separately (Marek Olsak)
4) Clip point separately for near and far Z clipping plane (Marek
Olsak)
v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
and [0, max(n,f)] for far plane (Marek Olsak)
v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add some basic types and storage for the AMD_depth_clamp_separate
extension.
v2: 1) Drop unnecessary definition (Marek Olsak)
2) Expose extension in compatibility profile (Marek Olsak)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently we run the script but don't actually load any files, even in a
tarball where they exist.
Fixes: 3218056e0e
("meson: Build i965 and dri stack")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Adds suppport for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB() which is by issuing a memory
fence via sendc.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
When the SVBI Payload Enable is false I guess the register R1.4
which contains the Maximum Streamed Vertex Buffer Index is filled by zero
and GS stops to write transform feedback when the transform feedback
is not active.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This adds an environment-varaible that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.
While we're at it, quiet some overly chatty debug-output by default.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow in style here.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
INTEL_DEBUG=hex prints 32 bit hex value and due to endianness of CPU
byte order is reversed. In order to disassemble binary files, print
each byte instead of 32 bit hex value.
v2: Print blank spaces in order to vertically align output of compacted
instructions hex value with uncompacted instructions hex value.
(Matt Turner)
v3: Fix line wrap at correct length
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.
total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs : 669901 -> 669373 (-0.08%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21064 (-0.02%)
local shared gpr inst bytes
helped 1 0 152 274 274
hurt 0 0 0 0 0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Rather than the usual three that would be created.
total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs : 670103 -> 669968 (-0.02%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21068 (-0.45%)
local shared gpr inst bytes
helped 1 0 64 1040 1040
hurt 0 0 27 0 0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs : 670571 -> 670103 (-0.07%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21164 (0.00%)
local shared gpr inst bytes
helped 0 0 318 1758 1758
hurt 0 0 63 0 0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.
Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.
total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs : 670595 -> 670571 (-0.00%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21164 -> 21164 (0.00%)
local shared gpr inst bytes
helped 0 0 18 230 230
hurt 0 0 8 263 263
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This hits the shader-db numbers a good bit, though a few xmads is way
faster than an imul or imad and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register heavy instructions than the xmads.
total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs : 669919 -> 670595 (0.10%)
total shared used in shared programs : 548832 -> 548832 (0.00%)
total local used in shared programs : 21068 -> 21164 (0.46%)
local shared gpr inst bytes
helped 0 0 38 0 0
hurt 1 0 365 3076 3076
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
>From Section 4.3.4 (Inputs) of the GLSL 1.50 spec:
"Only the input variables that are actually read need to be written
by the previous stage; it is allowed to have superfluous
declarations of input variables."
Fixes:
* interstage-multiple-shader-objects.shader_test
v2:
Update comment in ir.h since the usage of "used" field
has been extended.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.
Fixes Vulkan CTS CL#2849.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The recent commit 4616639b49 introduced
the new function aubinator_error() which is a trivial wrapper around
fprintf() to STDERR. The call to fprintf() however is passed the message
msg directly:
fprintf(stderr, msg);
This is a format-security violation and leads to an FTBFS with
-Werror=format-security (GCC 8):
../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
fprintf(stderr, msg);
^~~~~~~
This patch fixes this trivially by introducing a catch-all "%s" format
argument.
Fixes: 4616639b49 ("intel: tools: split aub parsing from aubinator")
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address. Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
gen_decode_field in the one call-site. Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit. This is fixed by adding a
iter_start_field helper which sets the field and also sets up the other
bits we need. This fixes decoding of 3DSTATE_SBE_SWIZ.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.
Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.
This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).
The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently droid_probe_device, does not do any 'probing' but filtering
out a device if it doesn't match the vendor string given.
Rename the function, straighten the return type and call it only as
needed - an actual vendor string is provided.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
The function name is misleading - it effectively checks if
loader_get_driver_for_fd fails. Which can happen only only on strdup
error - a close to impossible scenario.
Drop the function - we call the loader API at at later stage.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
The name string is guaranteed to be NULL terminated. Drop the explicit
length check that comes with strncmp().
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Replace the manual handling of /dev/dri in favor of the drmDevice API.
The latter provides a consistent way of enumerating the devices,
providing device details as needed.
v2:
- Use ARRAY_SIZE (Frank)
- s/famour/favor/ typo (Frank)
- Make MAX_DRM_DEVICES a macro - fix vla errors (RobF)
- Remove left-over dev_path instance (RobF)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com> (v1)
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
This reverts commit ae7898dfdb.
Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.
Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)
https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
This reverts commit 855af9a5a2.
Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.
Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)
https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html
This reverts commit 095515e16c.
This breaks KHR-GL46.map_buffer_alignment.functional on i965.
This code was apparently not reviewed and I don't know why we would
move from a driver configurable constant to a hardcoded value for all
drivers. This really looks like an accidental hack push.
As far as I can tell, no one reviewed these changes, they made i965
assert fail on driver load, and I am not certain they are correct.
(Hopefully reverting these does not break radeonsi too badly...)
The uniform related changes seem fine and reasonable, but the texture
image units change is possibly incorrect. According to the
OES_tessellation_shader spec issue 5:
(5) How are aggregate shader limits computed?
RESOLVED: Following the GL 4.4 model, but we restrict uniform
buffer bindings to 12/stage instead of 14, this results in
MAX_UNIFORM_BUFFER_BINDINGS = 72
This is 12 bindings/stage * 6 shader stages, allowing a static
partitioning of the bindings even though at most 5 stages can
appear in a program object).
MAX_COMBINED_UNIFORM_BLOCKS = 60
This is 12 blocks/stage * 5 stages, since compute shaders can't
be mixed with other stages.
MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
This is 16 textures/stage * 6 stages.
which definitely is including compute shaders in that last limit.
Not including compute shaders breaks the following test:
dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger
There was enough breakage that I figured we should just send this back
to the drawing board.
Revert "i965: don't include compute resources in "Combined" limits"
Revert "st/mesa: don't include compute resources in "Combined" limits"
Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"
This reverts commit b03dcb1e5f.
This reverts commit cff290df4c.
This reverts commit 45f87a48f9.
These have been removed. Unfortunately auto-upgrade doesn't work for
jit. (Worse, it seems we don't get a compilation error anymore when
compiling the shader, rather llvm will just do a call to a null
function in the jitted shaders making it difficult to detect when
intrinsics vanish.)
Luckily the signed ones are still there, I helped convincing llvm
removing them is a bad idea for now, since while the unsigned ones have
sort of agreed-upon simplest patterns to replace them with, this is not
the case for the signed ones, and they require _significantly_ more
complex patterns - to the point that the recognition is IMHO probably
unlikely to ever work reliably in practice (due to other optimizations
interfering). (Even for the relatively trivial unsigned patterns, llvm
already added test cases where recognition doesn't work, unsaturated
add followed by saturated add may produce atrocious code.)
Nevertheless, it seems there's a serious quest to squash all
cpu-specific intrinsics going on, so I'd expect patches to nuke them as
well to resurface.
Adapt the existing fallback code to match the simple patterns llvm uses
and hope for the best. I've verified with lp_test_blend that it does
produce the expected saturated assembly instructions. Though our
cmp/select build helpers don't use boolean masks, but it doesn't seem
to interfere with llvm's ability to recognize the pattern.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
because the closed driver exposes it.
It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15176942 -> 15176942 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array. The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for a OpLoad of a large composite
followed by OpStore. It can always be improved later if needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177605 -> 15176765 (<.01%)
instructions in affected programs: 4259 -> 3419 (-19.72%)
helped: 1
HURT: 0
total spills in shared programs: 10954 -> 10855 (-0.90%)
spills in affected programs: 295 -> 196 (-33.56%)
helped: 1
HURT: 0
total fills in shared programs: 22222 -> 22117 (-0.47%)
fills in affected programs: 417 -> 312 (-25.18%)
helped: 1
HURT: 0
The helped shader is from the OglCSDof synmark test. On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass looks for variables with vector or array-of-vector types and
narrows the type to only the components used.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go. We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15177605 -> 15177605 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally. It doesn't actually
split the variables but it's ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass looks for array variables where at least one level of the
array is never indirected and splits it into multiple smaller variables.
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through arrays of arrays and can detect indirects on just
one level or even see that arr[i][0][5] does not alias arr[i][1][j].
This pass exists to help other passes more easily see through arrays of
arrays. If a back-end does implement arrays using scratch or indirects
on registers, having more smaller arrays is likely to have better memory
efficiency.
v2 (Jason Ekstrand):
- Better comments and naming (some from Caio)
- Rework to use one hash map instead of two
v2.1 (Jason Ekstrand):
- Fix a couple of bugs that were added in the rework including one
which basically prevented it from running
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split". This
pass exists to help other passes more easily see through structure
variables. If a back-end does implement arrays using scratch or
indirects on registers, having more smaller arrays is likely to have
better memory efficiency.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The combined limits should only include shader stages that can be active
at the same time. We don't need to include compute.
See also cff290df4c for st/mesa.
Unbreaks i965 from assert failing on driver load since Marek's
45f87a48f9, which dropped the core
Mesa capabilities before adjusting driver limits down to match.
Same as the closed driver.
This causes a failure in GL45-CTS.compute_shader.max, which has a trivial
bug.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
5 is the maximum number of shader stages that can be used by 1 execution
call at the same time (e.g. a draw call). The limit ensures that each
stage can use all of its binding points.
Compute is separate and doesn't need the 5x multiplier.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The extension was exposed but not the functions.
This fixes:
dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.readn_pixels
dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformfv
dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformiv
Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Pretty much all of the scripts are python2+3 compatible.
Check and allow using python3, while adjusting the PYTHON2 refs.
Note:
- python3.4 is used as it's the earliest supported version
- python3 chosen prior to python2
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The script is executed explicitly via the build system, that uses
PYTHON/prog_python and equivalent.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
It will be ignored by x11_swapchain_result() anyway (because reaching
the `fail` label without setting `result` means the swapchain status was
already a hard error), but the compiler still complains about reading
uninitialised memory.
While at it, drop the unused assignment right before returning.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Single-sample color and single-sample depth (not stencil)
are coherent with shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
Since the script is never executed directly, but launched by Meson as an
argument to the Python interpreter, those are not needed any more.
In addition, they are the reason this script was missed when I moved the
Meson buildsystem to Python 3, so removing them helps avoiding future
confusion.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Just like the rest of the tree - these should be run either as part of
the build system check target, or at the very least with an explicitly
versioned python executable.
Fixes: db8cd8e367 ("glcpp/tests: Convert shell scripts to a python script")
Fixes: 97c28cb082 ("glsl/tests: Convert optimization-test.sh to pure python")
Fixes: 3b52d29227 ("glsl/tests: reimplement warnings-test in python")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Handling the version comparison by hand is a bad idea. Python has a handy
module distutils for that - use it.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Currently we use AC_CHECK_PROGS looking for python2.7, python2 and
finally python. That is due to the varying names used across the
different OS.
Use the handy AM_PATH_PYTHON which finds the correct name and checks for
the version.
Note: python2.7 has been an unofficial requirement for quite some time.
Update the docs to reflect that.
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
This is a strictly alphabetic sort, as is done in extensions_table.h
There are other options. We should pick one and document it. Right
now, this file is chaos.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2: Split changes to the message type field to another patch. Suggested
by Caio.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This is necessary for a new Gen9 message type that will be added in the
next patch. There are also Gen8 message types that need the extra bit
(mostly for bindless).
v2: Split off from the next patch. Suggested by Caio.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2: Describe interactions with the capabilities added by
SPV_INTEL_shader_atomic_float_minmax
v3: Remove 64-bit float support.
v4: Explain NaN issues. Explain issues with atomicMin(-0, +0) and
atomicMax(-0, +0).
v5: Fix whitespace issues noticed by Caio.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Haven't tested this, but we do include loader.h
in platform_android.c
Fixes: c5ec155685 ("meson: wire up egl/android")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Since there's no particular reason for the index to be 0, choose an
index that is not used by other block. This is convenient when we
store "per-block" data in an array AND look for the successors
data (e.g. any kind of backwards data-flow analysis).
v2: Add a note about end_block's index. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Deref paths may share the same deref instructions in their chains,
e.g.
ssa_100 = deref_var A
ssa_101 = deref_struct "array_field" of ssa_100
ssa_102 = deref_array "[1]" of ssa_101
ssa_103 = deref_struct "field_a" of ssa_102
ssa_104 = deref_struct "field_a" of ssa_103
when comparing the two last deref instructions, their paths will share
a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next
iteration if the deref instructions are the same. Path[0] (the var)
is still handled specially, so in the case above, only ssa_101 and
ssa_102 will be skipped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Without this patch mesa doesn't compile:
In file included from ../mesa-9999/src/amd/addrlib/addrinterface.cpp:39:
../mesa-9999/src/util/macros.h:29:10: fatal error: c99_compat.h: No such file or directory
#include "c99_compat.h"
^~~~~~~~~~~~~~
compilation terminated.
Fixes: 15ca5ce99a
("amd/addrlib: mark returnCode as MAYBE_UNUSED in")
Signed-off-by: Mariusz Ceier <mceier+mesa-dev@gmail.com>
Acked-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Currently if 64bit and 32bit programs are used interchangeably, radv
will keep overwriting the cache. Use separate cache files to avoid
that.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The bsr instruction modifies flags, so that needs to be indicated to the
compiler. No effect on generated code, but still needed for correctness.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fix rendering issues on BDW and SKL.
Fixes: 0288fe8d04
("i965/miptree: Use the correct BLT pitch")
Fixes the following regressions seen
exclusively on SKL:
* KHR-GL46.texture_barrier_ARB.disjoint-texels
* KHR-GL46.texture_barrier_ARB.overlapping-texels
* KHR-GL46.texture_barrier.disjoint-texels
* KHR-GL46.texture_barrier.overlapping-texels
and both on BDW and SKL:
* GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners
* GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners
v2: Note the fixed tests (Andres).
Don't cause failures with multisampled buffers (Andres).
Don't hamper SKL GT4 (Ken).
v3: Fix the Fixes tag (Dylan).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107359
Cc: <mesa-stable@lists.freedesktop.org>
Tested-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Check the destination's row pitch against the BLT engine's row pitch
limitation as well.
Fixes: 0288fe8d04
("i965/miptree: Use the correct BLT pitch")
v2: Fix the Fixes tag (Dylan).
Check the destination row pitch (Chris).
Reported-by: Dylan Baker <dylan@pnwbakers.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It looks like we can't rely on the simulator to always translate virtual
addresses to physical ones correctly. So let's use physical everywhere.
Since our current GGTT maps virtual to physical addresses in a 1:1 way,
no further changes are required.
Additionally, we have other address spaces not in use right now. So
let's make it easier to switch which one we are using but putting the
default one into the aub_file struct.
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Only used, when asserts are enabled.
Fixes an unused-but-set-variable warning with GCC 8:
../../../src/amd/addrlib/r800/egbaddrlib.cpp: In member function 'virtual long long unsigned int Addr::V1::EgBasedLib::HwlGetSizeAdjustmentMicroTiled(unsigned int, unsigned int, ADDR_SURFACE_FLAGS, unsigned int, unsigned int, unsigned int, unsigned int*, unsigned int*) const':
../../../src/amd/addrlib/r800/egbaddrlib.cpp:4111:13: warning: variable 'physicalSliceSize' set but not used [-Wunused-but-set-variable]
UINT_64 physicalSliceSize;
^~~~~~~~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Only used, when asserts are enabled.
Fixes an unused-variable warning with GCC 8:
../../../src/amd/addrlib/r800/egbaddrlib.cpp: In member function 'int Addr::V1::EgBasedLib::SanityCheckMacroTiled(ADDR_TILEINFO*) const':
../../../src/amd/addrlib/r800/egbaddrlib.cpp:982:13: warning: unused variable 'numPipes' [-Wunused-variable]
UINT_32 numPipes = HwlGetPipes(pTileInfo);
^~~~~~~~
v2: Don't realign other variable definitions, to keep in line with file
style (Marek)
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Only used, when asserts are enabled.
Fixes an unused-variable warning with GCC 8:
../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp: In member function 'ADDR_E_RETURNCODE Addr::V2::Gfx9Lib::ComputeStereoInfo(const ADDR2_COMPUTE_SURFACE_INFO_INPUT*, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*, unsigned int*) const':
../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp:3879:34: warning: unused variable 'pEqToCheck' [-Wunused-variable]
const ADDR_EQUATION *pEqToCheck = &m_equationTable[eqIndex];
^~~~~~~~~~
v2: Don't realign other variable definitions, to keep in line with file
style (Marek)
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Only used, when asserts are enabled.
Fixes an unused-but-set-variable warning with GCC 8:
../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp: In member function 'virtual ADDR_E_RETURNCODE Addr::V2::Gfx9Lib::HwlComputeBlock256Equation(AddrResourceType, AddrSwizzleMode, unsigned int, ADDR_EQUATION*) const':
../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp:2473:15: warning: variable 'microBlockDim' set but not used [-Wunused-but-set-variable]
Dim2d microBlockDim = Block256_2d[elementBytesLog2];
^~~~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Only used, when asserts are enabled.
Fixes an unused-but-set-variable warning with GCC 8:
../../../src/amd/addrlib/addrinterface.cpp: In function 'int ElemGetExportNorm(ADDR_HANDLE, const ELEM_GETEXPORTNORM_INPUT*)':
../../../src/amd/addrlib/addrinterface.cpp:835:23: warning: variable 'returnCode' set but not used [-Wunused-but-set-variable]
ADDR_E_RETURNCODE returnCode = ADDR_OK;
^~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is available through a "Show URB" button on the 3DPRIMITIVE
instructions.
v2: Fix urb allocation end value in tooltip (Rafael)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
A graphical user interface version of aubinator.
Allows you to :
- simultaneously look at multiple points in the aub file (using all
the goodness of the existing decoding in aubinator)
- edit an aub file
v2: Switch from GLFW to GTK+3
v3: Fix warning when exiting
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
We want to add a new UI tool to decode aub files. This will use the
Dear ImGui library to render its interface. The build of this UI
toolkit is conditional to -Dwith_tools=intel-ui which superseeds
-Dwith_tools=intel.
The main way to use ImGui is to embed its source code at a particular
revision. Most embedding projects have to do a bit of integration
which is really specific to one's project. In our case the only
modification is to include libepoxy. We also choose to use Gtk+3 for
the window system integration. As oppose to the previous previous
version of this patch using GLFW, Gtk+ is able to handle X11/Wayland
session as well as property DPI scaling on retina monitors.
The import was done at this commit (https://github.com/ocornut/imgui) :
commit 6211f40f3d903dd9df961256e044029c49793aa3
Author: omar <omarcornut@gmail.com>
Date: Fri Jul 27 12:29:33 2018 +0200
Internals: Drag and Drop: default drop preview use a narrower clipping rectangle (no effect here, but other branches uses a narrow clipping rectangle that was too small so this is a fix for it) + Comments
v2: Switch from GLFW to GTK+ (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
When we map a PPGTT buffer into a continous address space of aubinator
to be able to inspect it, we currently add it to the list of BOs to
unmap once we're finished. An optimization we can apply it to look up
that list before trying to remap PPGTT buffers again (we already do
this for GGTT buffers).
We need to take some care before doing this because the list also
contains GGTT BOs. As GGTT & PPGTT are 2 different address spaces, we
can have matching addresses in both that point to different physical
locations.
This changes adds a flag on the elements of the list of mapped BOs to
differenciate between GGTT & PPGTT, which allows use to reuse that
list when looking up both address spaces.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
This will allow the aubinator viewer tool to modify the aub data that
was loaded at a particular gtt address.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We are testing the behaviour of a tool, for different input files, each
one using a different newline sequence. ('\n' on UNIX, '\r\n' on
Windows, …)
Unfortunately, when opening a file in text mode, Python 3 will by
default enable the "universal newlines" mode, which means it replaces
all the known newline sequences by '\n'.
This (usually useful) behaviour breaks the tests, which are specifically
trying to handle files with newline sequences different from '\n'.
Disabling the universal newlines mode fixes the tests.
However, to keep the script compatible with both Python 2 and 3, we must
use the io.open() function instead of the open() builtin, as the latter
only knows about the `newline` argument on Python 3.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Python 3 does not automatically convert from bytes to unicode strings
like Python 2 used to do.
This commit makes sure we pass unicode strings to difflib.unified_diff,
so that the script works on both Python 2 and 3.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
v2: - explicitly decode the output of subprocesses
- handle bytes and string types consistently rather than relying on
python 2's coercion for bytes and ignoring them in python 3
v3: - explicitly set encode as well as decode
- python 2.7 and 3.x `bytes` instead of defining an alias
Reviewed-by: Mathieu Bridon <bochecha@daitauha.fr>
This extension can be supported on SKL+. With this patch,
all corresponding tests (6K+) in CTS can pass. No test fails.
I verified CTS with the command below:
deqp-vk --deqp-case=dEQP-VK.pipeline.sampler.view_type.*reduce*
v2: 1) support all depth formats, not depth-only formats, 2) fix
a wrong indention (Jason).
v3: fix a few nits (Lionel).
v4: fix failures in CI: disable sampler reduction when sampler
reduction mode is not specified via this extension (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It fixes simulator warnings in vulkancts tests complaining about missing
support for headerless sampler messages for pre-emptable contexts.
Bit 5 in SAMPLER MODE register is newly introduced for ICLLP.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to
disable prefetching of binding tables for ICLLP A0 and B0
steppings. We have a similar patch for i965 driver in Mesa
commit a5889d70.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It fixes simulator warnings in piglit tests complaining about missing
support for headerless sampler messages for pre-emptable contexts.
Bit 5 in SAMPLER MODE register is newly introduced for ICLLP.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the current code, we didn't do the space checks prior
to atomic counter setup emission, but we also didn't add
atomic counters to the space check so we could get a flush
later as well.
These flushes would be bad, and lead to problems with
parallel tests. We have to ensure the atomic counter copy in,
draw emits and counter copy out are kept in the same command
submission unit.
This reworks the code to drop some useless masks, make the
counting separate to the emits, and make the space checker
handle atomic counter space.
[airlied: want this in 18.2]
Fixes: 06993e4ee (r600: add support for hw atomic counters. (v3))
We need to handle the gaps in the streamout bindings on the guest
side and enable if it the host has the rest enabled.
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
Testing:
- Manually tested a low-latency handwriting demo that toggles
EGL_RENDER_BUFFER. Toggling changed the display latency as expected.
Used Android on Chrome OS, Kabylake GT2.
- No change in dEQP-EGL.functional.* on Fedora 27, Wayland, Skylake
GT2. Used deqp at tag android-p-preview-5.
- No regressions in dEQP-EGL.functional.*, ran on Android on Chrome
OS, Kabylake GT2. Some dEQP-EGL.functional.mutable_render_buffer.*
test change from NotSupported to Pass.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Specifically, implement the extension DRI_MutableRenderBufferLoader.
However, the loader enables EGL_KHR_mutable_render_buffer only if the
DRI driver implements its half of the extension,
DRI_MutableRenderBufferDriver.
Testing:
- No change in dEQP-EGL.functional.* on Fedora 27, Wayland, Skylake
GT2. Used deqp at tag android-p-preview-5.
- No change in dEQP-EGL.functional.*, ran on Android on Chrome OS,
Kabylake GT2.
- Manually inspected Android apps on same Chrome OS device.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Use a helper to avoid the common issues of upcasting after the right shift
(losing the upper bits) and shifting signed values (sign gets shifted too).
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This file is regenerated at build time anyway, so this would just get
overwritten anyway. No reason to ship it in the tarball.
Fixes: 44df06211c "autotools: include git_sha1.h in dist tarball"
Fixes: 471f708ed6 "git_sha1: simplify logic"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The git core.autocrlf setting defaults to true (ie, all text files get
checked out as CRLF on Windows), except on Appveyor where's set to
"input" (ie, all text files get checked out with the upstream
repository's line endings, which for us typically means LF.)
And this was masking on Appveyor a regression in gen_xmlpool.py
processing t_options.h with CRLF line endings.
This change makes core.autocrlf to be true, which would have enabled to
immediately catch the issue, as seen in
https://ci.appveyor.com/project/jrfonseca/mesa/build/51
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This was added as a workaround for Heaven 3.0 but was later removed
by 5ead448719 to allow Heaven 4.0 to work correctly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The main purpose for having NV_fragment_shader_interlock
extension is because that extension is also for GLES31 while
the ARB extension is for GL only.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
We could still have batches queued up to flush, so fd_context_destroy()
(which will kill and sync on the flush_queue) before deleting buffers
that might be referenced from fdN_gmem() from context of flush_queue.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Only used, when asserts are enabled.
Fixes an unused-variable warning with GCC 8:
../../../src/intel/common/gen_decoder.c: In function 'gen_spec_load':
../../../src/intel/common/gen_decoder.c:535:47: warning: variable 'total_length' set but not used [-Wunused-but-set-variable]
uint32_t text_offset = 0, text_length = 0, total_length;
^~~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Only used, when asserts are enabled.
Fixes an unused-variable warning with GCC 8:
../../../src/intel/tools/aubinator.c: In function 'ensure_phys_mem':
../../../src/intel/tools/aubinator.c:209:11: warning: unused variable 'ftruncate_res' [-Wunused-variable]
int ftruncate_res = ftruncate(mem_fd, mem_fd_len += 4096);
^~~~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Only used, when asserts are enabled.
Fixes an unused-but-set-variable warning with GCC 8:
../../../src/intel/tools/aubinator_error_decode.c: In function 'main':
../../../src/intel/tools/aubinator_error_decode.c:759:11: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This fixes a regression with some Unity demos. Not sure
what the root cause of the problem is, especially because
the driver doesn't perform any fast color clears. So, it
shouldn't be needed to decompress DCC. RadeonSI says that
the decompression is relatively cheap if the surface has
been decompressed already.
One possible improvement is to two use predicates, one for
DCC and one for FCE that could be cleared when DCC, FMASK
or CMASK are performed by the driver. That might skip some
unnecessary decompression passes (not DCC though).
Fixes: ff7daadca1 ("radv: enable/disable predication for the DCC decompression pass")
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107563
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Patch implements common bits for EXT_surface_SMPTE2086_metadata
and EXT_surface_CTA861_3_metadata extensions by adding new required
attributes and eglQuerySurface + eglSurfaceAttrib changes.
Currently none of the drivers are utilizing this data but this patch
is enabler in getting there.
v2: don't enable extension globally, should be only enabled by
EGL drivers that can transfer metadata to the window system (Jason)
use EGLint instead of uint16_t (Eric)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Supresses a maybe-uninitialized warning with GCC 8.
Note: image_index should always be initialised due to the result check,
but the compiler doesn't see that.
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Only used, when asserts are enabled.
Fixes an unused-variable warning with gcc-8:
../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if':
../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable]
nir_block *prev_block =
^~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Only used, when asserts are enabled.
Fixes an unused-variable warning with gcc-8:
../../../src/util/half_float.c: In function '_mesa_half_to_unorm8':
../../../src/util/half_float.c:189:14: warning: unused variable 's' [-Wunused-variable]
const int s = (val >> 15) & 0x1;
^
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For some reason wine will sometimes give us a windows style path
for an application. For example when running the 64bit version
of Rage wine gives a Unix style path, but when running the 32bit
version is gives a windows style path.
If we detect no '/' in the path at all it should be safe to
assume we have a wine application and instead look for a '\'.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The innermost check was added to stop us from unrolling multiple
loops in a single pass, and to stop outer loops from unrolling.
When we successfully unroll a loop we need to run the analysis
pass again before deciding if we want to go ahead an unroll a
second loop.
However the logic was flawed because it never tried to unroll any
nested loops other than the first innermost loop it found.
If this innermost loop is not unrolled we end up skipping all
other nested loops.
This unrolls a loop in a Deus Ex: MD shader on ultra settings and
also unrolls a loop in a shader from the game Prey when running
on DXVK.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At the moment, depending on pipe transfer flags, the dumb
buffer map address can end up at either kms_sw_dt->ro_mapped
or kms_sw_dt->mapped.
When it's time to unmap the dumb buffer, both locations get unmapped,
even though one is probably initialized to 0.
That leads to the code segment getting unmapped at runtime and
crashes when trying to call into unrelated code.
This commit addresses the problem by using MAP_FAILED instead of
NULL for ro_mapped and mapped when the dumb buffer is unmapped,
and only unmapping mapped addresses at unmap time.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107098
Signed-off-by: Ray Strode <rstrode@redhat.com>
Fixes: d891f28df9 ("gallium/winsys/kms: Fix possible leak in map/unmap.")
Cc: Lepton Wu <lepton@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
drirc implementation of MESA_LOADER_DRIVER_OVERRIDE which can be
used to override dri driver to load.
Usage:
override dri driver for device with spec kernel driver name:
<device kernel_driver="kernel_driver_name">
<option name="dri_driver" value="new_dri_driver" />
</device>
or
<device driver="loader" kernel_driver="kernel_driver_name">
<option name="dri_driver" value="new_dri_driver" />
</device>
v2:
add kernel_driver device attribute to specify kernel
driver name instead of reuse driver attribute
v3:
seperate loader_get_kernel_driver_name into another patch
seperate add kernel_driver attribute into another patch
Suggested-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[v4 Emil: add HAVE_LIBDRM guard around __driConfigOptionsLoader and
loader_get_dri_config_driver]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Driver and application can put their drirc files in
${datadir}/drirc.d/ with name xxx.conf. Config files
will be read and applied in file name alphabetic order.
So there are three places for drirc listed in order:
1. /usr/share/drirc.d/
2. /etc/drirc
3. ~/.drirc
v4:
fix meson build
v3:
1. seperate driParseConfigFiles refine into another patch
2. fix entries[i] mem leak
v2:
drop /etc/drirc.d
Signed-off-by: Qiang Yu <Qiang.Yu@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This allows us to use the link-optimized shader for determining binding
table layouts and, more importantly, URB layouts. For apps running on
DXVK, this is extremely important as DXVK likes to declare max-size
inputs and outputs and this lets is massively shrink our URB space
requirements.
VkPipeline-db results (Batman pipelines only) on KBL:
total instructions in shared programs: 820403 -> 790008 (-3.70%)
instructions in affected programs: 273759 -> 243364 (-11.10%)
helped: 622
HURT: 42
total spills in shared programs: 8449 -> 5212 (-38.31%)
spills in affected programs: 3427 -> 190 (-94.46%)
helped: 607
HURT: 2
total fills in shared programs: 11638 -> 6067 (-47.87%)
fills in affected programs: 5879 -> 308 (-94.76%)
helped: 606
HURT: 3
Looking at shaders by hand, it makes the URB between TCS and TES go from
containing 32 per-vertex varyings per tessellation shader pair to a more
reasonable 8-12. For a 3-vertex patch, that's at least half the URB
space no matter how big the patch section is.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We want these to be set as close to the final compile as possible so
that they are guaranteed to happen after nir_shader_gather_info is
called. The next commit is going to move nir_shader_gather_info to
after the linking step which makes this necessary.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit makes three changes. One is to only walk the descriptors once
and set bind map sizes at the same time as filling out the entries. The
second is to make the pass additive so that we can put stuff in the bind
map before applying the pipeline layout. Third, we switch to using
designated initializers.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Because lower_ycbcr gets called before apply_pipeline_layout, the
indices are all logical and the binding layout HW size is actually too
big for the bounds check. We should just use the regular logical array
size instead.
Fixes: f3e91e78a3 "anv: add nir lowering pass for ycbcr textures"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In commit bd27203f4d we changed this to
open in binary mode, to then explicitly decode the lines with the right
encoding.
Unfortunately, that broke the build on Windows, where the template file
can have '\r\n' as line terminators: opening in binary mode would keep
those terminators and break the regexp.
We need to go back to text mode, where the "universal newlines" mode
takes care of this.
However, to fix the initial issue, let's specify the encoding explicitly
when opening the file, and make sure it is open in text mode, so we only
get unicode strings.
Reviewed-by: Jose Fonseca <jfonseca@vmware>
When the number of unique BO is 0, we optimize the list creation
by copying all buffers of the current CS directly into it. But
this is only valid if the CS doesn't have virtual buffers,
otherwise they are not added and hw might report VM faults.
This fixes VM faults with:
dEQP-VK.sparse_resources.image_sparse_binding.2d.rgba8ui.1024_128_1
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds a freedreno backend for the a6xx generation GPUs, which at
the time of this commit is about 98% GLES2 conformant. Much remains to
be done - both performance work and feature work towards more recent
GLES versions, but this is a good start.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
less than 2.7 is not supported.
v2: - Remove check for python >= 2.0, since we've already enforced 2.7
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This handy helper is nice for OSes that are not linux or BSD like (mac
and windows) as it knows how to find python3 in odd places.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It's what autotools has required for a long time.
v3: - Use distutils.version.StrictVersion instead of comparing strings
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was missing when VK_EXT_conditional_rendering has been
implemented. The predication type should be -1 to avoid
restoring previous state when performing a decompression pass
with DCC enabled.
Note that we don't have to handle secondary command buffers
because we don't support this feature currently.
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Seems like DXVK depends on that and it might get reverted
upstream. Since apps are not supposed to use 0 in v2 anyway,
we should be safe implementing the old behavior there.
Fixes: 66e12451ac "radv: Update to new VK_EXT_vertex_attribute_divisor to version 2."
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Seems that in a single case we use the renderpass before checking
the pipeline, so check the renderpass before we use it.
Fixes: fbcd167314 "radv: Add on-demand compilation of built-in shaders."
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When moving the array sizes from the old list to the new one it was
not taken into account that the array indices start with one, but the
array_size array started at index zero, which resulted in incorrect array
sizes when arrays were merged. Correct this by copying the array_size
values of the retained arrays with an offset of -1.
Also fix whitespaces for the replaced lines.
Fixes: d8c2119f9b
mesa/st/glsl_to_tgsi: Expose array live range tracking and merging
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Commit 4434591bf5 caused substantially more URB messages in
geometry and tessellation shaders. Before we can really enable this
sort of optimization, We either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
If called with an empty size, brw_emit_buffer_surface_state asserts.
We already have a dedicated helper for uploading nothing, so let's use
that instead.
Avoids an assert in
dEQP-GLES31.functional.shaders.opaque_type_indexing.ssbo.const_literal_vertex
when running a debug build of i965.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Kernel (for ppgtt) requires memory address to be
aligned to page size (4096).
-v2: added marking that also fixes initial commit 01058a5522.
-v3: numbers replaced by PAGE_SIZE; buffer-object size is aligned
instead of alignment of offsets (Chris Wilson).
-v4: changes related to PAGE_SIZE moved to separate commit
-v5: restored alignment to page-size for 0-size.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106997
Fixes: a363bb2cd0 (i965: Allocate VMA in userspace for full-PPGTT systems.)
Fixes: 01058a5522 (i965: Add virtual memory allocator infrastructure to brw_bufmgr.)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of doing conservative guesses, we should report the max levels
based on the max sizes we get from GL on the host.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
These macro-names are also used for softpipe, so let's avoid confusion
by avoiding them. Besides, they are just used in one place in virgl, so
let's just inline them into the place they are used instead.
While we're at it, fixup an error in the comment for the 3D version.
Mesa subtracts computes max-size by doing by 2^(n-1), which means this
should be 256 cubed, not 512 cubed. The other comments are correct.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>
The last parameter of radeon_set_sh_reg_seq() is the number of
dwords to emit. We were lucky because WAVES_PER_SH(0x3) is 3 but
it was initialized to 0.
COMPUTE_RESOURCE_LIMITS is correctly set when generating
compute pipelines, so we don't need to initialize it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This patch fixes a regression in mesa 18.2 and mesa-dev branches
for HAVE_DRM_GRALLOC code path which is causing black screen on Android
and prevents boot due to SIGSEGV MAPERR crash related to unproper handling
of drm_gralloc drm FD in new droid_open_device() path.
Problem is due to c7bb82136b ("egl/android: Add DRM node probing and filtering")
To avoid the crash the former existing working droid_open_device() is restored,
renamed droid_open_device_drm_gralloc() and kept within HAVE_DRM_GRALLOC braces.
Tested with mesa-dev and mesa 18.2 branch and oreo-x86 bootanimation
and Androdi GUI booting is fixed with i965, nouveau, radeon.
The changes are compatible with gbm_gralloc, I've tested build with hwc too.
(v2) remove indentation from HAVE_DRM_GRALLOC pre-processor directive
NOTE: Definition of enum{} for GRALLOC_MODULE_PERFORM_GET_DRM_FD
is not necessary and it's actually causing a redefinition building error,
because in HAVE_DRM_GRALLOC path gralloc_drm.h is already exported
by libgralloc_drm which is currently still a dependency.
Fixes: c7bb82136b ("egl/android: Add DRM node probing and filtering")
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Starting with a6xx, half and full precision registers conflict. Which
makes things a bit more efficient, ie. if some parts of the shader are
heavy on half-precision and others on full precision, you don't have to
allocate the worst case for both. But it means we need to setup some
additional conflicts.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Collapse is_temp() into it's only callsite, and pass compiler object as
struct rather than void. Just cleanups to reduce noise in next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We originally did this because at the time we didn't know all the
bitfields to configure where various frag shader sysval's went. But
we do.
So switch to using sysvals for all the frag shader inputs.
Signed-off-by: Rob Clark <robdclark@gmail.com>
create_input()/create_input_compmask() should take the ctx as arg,
rather than block, to enforce that all inputs are created in the first
block, so that RA sees them as live at the start of the shader.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Make it more clear that this is varying fetch related. Also fixup some
comments. Just cleanup for next patches.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Used internally in freedreno/ir3 for the vec2 value that hw passes to
shader to use as coordinate for bary.f (varying fetch) instruction.
This is not the same as SYSTEM_VALUE_FRAG_COORD.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Behavior wrt firstInstance got changed, and a divisor of 0 has been
disallowed.
The new version of the ext got published in specification 1.1.81.
Sending to stable since the only known user is DXVK, which needs
this for correctness.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.2 <mesa-stable@lists.freedesktop.org>
Following patches will be doing further cleanup after calling
fd_context_destroy() so it is easier if we move the free() into
the per-gen backend code.
Signed-off-by: Rob Clark <robdclark@gmail.com>
this patch brings a number of changes to ir2:
-ir2 now generates CF clauses as necessary during assembly. this simplifies
fd2_program/fd2_compiler and is necessary to implement optimization passes
-ir2 now has separate vector/scalar instructions. this will make it easier
to implementing scheduling of scalar+vector instructions together. dst_reg
is also now seperate from src registers instead of a single list
-ir2 now implements register allocation. this makes it possible to compile
shaders which have more than 64 TGSI registers
-ir2 now implements the following optimizations: removal of IN/OUT MOV
instructions generated by TGSI and removal of unused instructions when
some exports are disabled
-ir2 now allows full 8-bit index for constants
-ir2_alloc no longer allocates 4 times too many bytes
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
In environments where we cannot cache, e.g. Android (no homedir),
ChromeOS (readonly rootfs) or sandboxes (cannot open cache), the
startup cost of creating a device in radv is rather high, due
to compiling all possible built-in pipelines up front. This meant
depending on the CPU a 1-4 sec cost of creating a Device.
For CTS this cost is unacceptable, and likely for starting random
apps too.
So if there is no cache, with this patch radv will compile shaders
on demand. Once there is a cache from the first run, even if
incomplete, the driver knows that it can likely write the cache
and precompiles everything.
Note that I did not switch the buffer and itob/btoi compute pipelines
to on-demand, since you cannot really do anything in Vulkan without
them and there are only a few.
This reduces the CTS runtime for the no caches scenario on my
threadripper from 32 minutes to 8 minutes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
From AppVeyor #8582, it seems that MSVC doesn't like uint, so this
patch replaces it with unsigned.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Meson now uses python3, so let's add a block for Autotools, move that
line into the buildsys-specific blocks, and set the correct version for
Meson.
Fixes: 2ee1c86d71 "meson: Build with Python 3"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
This is clearly a copy-paste error; if we validate the reladdr2-pointer,
we don't want to traverse to the reladdr-pointer. Especially since the
check above shows that reladdr could be NULL here.
Noticed by Coverity.
CID: 1438389, 1438390
Fixes: 568bda2f2d ("mesa/st/glsl_to_tgsi: Split arrays whose elements are only accessed directly")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
Instead of using the copy of shader_info stored in gl_program, it now
uses the one in nir_shader. This is needed for SPIR-V because the
info.tess.tcs_vertices_out is filled in via _mesa_spirv_to_nir which
happens much later than with a GLSL shader. The copy of shader_data in
gl_program is only updated later via brw_shader_gather_info but that
is too late.
For GLSL this shouldn't create any problems because the nir copy of
the shader_info is immediately copied from the gl_program in
glsl_to_nir.
v2: updated after commit "i965: Combine both gl_PatchVerticesIn
lowering passes." (488972) (Alejandro Piñeiro)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The value is copied from the gl_program. If we don’t do this then it
will get reset back to zero in brw_shader_gather_info. This isn’t a
problem for GLSL because in that case the nir_shader is initialised
with a copy of the shader_info from the gl_program.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is the same we do for vulkan drivers
This is needed to pass the following CTS test:
KHR-GL45.gl_spirv.spirv_modules_shader_binary_multiple_shader_objects_test
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
input locations used by input attributes are not handled in the same
way in OpenGL vs Vulkan. There is a detailed explanation of such
differences on the following commit:
c2acf97fcc
So with this commit, the same adjustment that is done after
glsl_to_nir, is being done after spirv_to_nir, when it is used on
OpenGL (ARB_gl_spirv).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
After commit "nir: Use derefs in nir_lower_samplers"
(75286c2d08) assumes one deref for both
the texture and the sampler. However there are cases (on OpenGL, using
ARB_gl_spirv) where SPIR-V is not providing a sampler, like for
texture query levels ops. Although we could make spirv_to_nir to
provide a sampler deref for those cases, it is not really needed, and
wrong from the Vulkan point of view.
This patch fixes the following (borrowed) tests run on SPIR-V mode:
arb_compute_shader/execution/basic-texelFetch.shader_test
arb_gpu_shader5/execution/sampler_array_indexing/fs-simple-texture-size.shader_test
arb_texture_query_levels/execution/fs-baselevel.shader_test
arb_texture_query_levels/execution/fs-maxlevel.shader_test
arb_texture_query_levels/execution/fs-miptree.shader_test
arb_texture_query_levels/execution/fs-nomips.shader_test
arb_texture_query_levels/execution/vs-baselevel.shader_test
arb_texture_query_levels/execution/vs-maxlevel.shader_test
arb_texture_query_levels/execution/vs-miptree.shader_test
arb_texture_query_levels/execution/vs-nomips.shader_test
glsl-1.30/execution/fs-textureSize-compare.shader_test
v2: merge lower_tex_src_to_offset and calc_sampler_offsets together,
update texture/sampler index and texture_array_size directly on
lower_tex_src_to_offset (Jason)
v3: clarify one comment (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So they are not exposed through the introspection API.
It is worth to note that the number of hidden uniforms of GLSL linking
vs SPIR-V linking would be somewhat different due the differen order
of the nir lowerings/optimizations.
For example: gl_FbWposYTransform. This is introduced as part of
nir_lower_wpos_ytransform. On GLSL that is executed after the IR-based
linking. So that means that on GLSL the UniformStorage will not
include this uniform. With the SPIR-V linking, that uniform is already
present, but marked as hidden. So it will be included on the
UniformStorage, but as hidden.
One alternative would create a special how_declared for that case, but
seemed an overkill. Using hidden should be ok as far as it is used
properly.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Equivalent to the already existing how_declared at GLSL IR. The only
difference is that we are not adding all the declaration_type
available on GLSL, only the one that we will use on the short term. We
would add more mode if needed on the future.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
GLSL has gl_VertexID which is supposed to be non-zero-based.
SPIR-V has both VertexIndex and VertexId builtins whose meanings are
defined by the APIs.
Vulkan defines VertexIndex as being non-zero-based. In Vulkan VertexId
and InstanceId have no meaning and are pretty much just reserved for
OpenGL at this point.
GL_ARB_spirv removes VertexIndex and defines VertexId to be the same
as gl_VertexId (which is also non-zero-based).
Previously in Mesa it was treating VertexIndex as non-zero-based and
VertexId as zero-based, so it was breaking for GL. This behaviour was
apparently based on Khronos bug 14255. However that bug doesn’t seem
to have made a final decision for VertexId.
Assuming there really is no other definition for VertexId for Vulkan
it seems better to just make them both have the same value.
v2: update comment and commit descriptions, based on Jason Ekstrand
explanation of the meaning/rationale behind all those builtins
(Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
info.gs.output_primitive was already being filled. Not sure why this
is not needed on Vulkan, but we found to be needed for
ARB_gl_spirv. Specifically, this is needed to get the following test
passing:
KHR-GL45.gl_spirv.spirv_validation_builtin_variable_decorations_test
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Patch sets additional formats renderable and enables the extension
when OpenGL ES 3.1 is supported.
v2: instead of dummy_true, have a separate toggle for extension
(Eric Anholt)
v3: add missing checks, simplify some existing checks and fix
glCopyTexImage2D check (Nanley Chery)
add SHORT and BYTE support in read_pixels_es3_error_check
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
One of the reasons we didn't notice that R24_UNORM_X8_TYPELESS
destinations were broken was that an earlier layer was swapping it
out for B8G8R8A8_UNORM. That made Z24X8 -> Z24X8 blits work.
However, R32_FLOAT -> R24_UNORM_X8_TYPELESS was still totally broken.
The old code only considered one format at a time, without thinking
that format conversion may need to occur.
This patch moves the translation out to a place where it can consider
both formats. If both are Z24X8, we continue using B8G8R8A8_UNORM to
avoid having to do shader math workarounds. If we have a Z24X8
destination, but a non-matching source, we use our shader hacks to
actually render to it properly.
Fixes: 804856fa57 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The hardware doesn't support rendering to R24_UNORM_X8_TYPELESS, so
Jason decided to fake it with a bit of shader math and R32_UNORM RTs.
The only problem is that R32_UNORM isn't renderable either...so we've
just traded one bad format for another.
This patch makes us use R32_UINT instead.
Fixes: 804856fa57 (intel/blorp: Handle more exotic destination formats)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This patch ties in the array split, merge, and interleave code.
shader-db changes in the TGSI code are:
original code | array-merge | change
mean max | mean max | best mean % worst
-----------------------------------------------------------
arrays 0.05 2 | 0.00 0 | -2 -100 0
total temps 5.05 21 | 4.92 20 | -15 -2.59 1
instr 55.33 988 | 55.20 988 | -15 -0.24 0
Evaluation:
Run shader-db in single thread mode (otherwise the output is
not ordered and the best and worst column don't make sense) to
get results pre-stats.txt and post-stats.txt. Then using
python pandas:
import pandas as pd
old_stats = pd.read_csv('pre-stats.txt')
new_stats = pd.read_csv('post-stats.txt')
omean = old_stats.mean()
omax = old_stats.max()
nmean = new_stats.mean()
nmax = new_stats.max()
delta = new_stats - old_stats
pd.concat([omean, omax, nmean, nmax, delta.min(),
delta.mean()/old_stats.mean()*100, delta.max()],
axis=1, keys=['mean', 'max', 'mean', 'max', 'best',
'avg change %', 'worst'])
v4: - Correct typo and add bugs that are fixed by this series.
- Update stats and describe stats evaluation
Bugzilla:
https://bugs.freedesktop.org/show_bug.cgi?id=105371https://bugs.freedesktop.org/show_bug.cgi?id=100200
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
v4: Also track the register given in inst->resource. (thanks: Benedikt Schemmer
for testing the patches on radeonsi, which revealed that I was missing
tracking this)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Because of the indirect access it is impossible to obtain an accurate per
component and array element tracking. Therefore, the tracking is simplified
to only track whether any element was accessed, whether this happend
conditionally in a loop. In addition, while tracking of temporaries requires
a per-componet tracking that is later fused, for arrays only the components
access mask is neede. The resulting tracking code and evaluation of the array
live range is sufficiently different from the evaluation of the live range of
temporaries to justify implementing this in a different class instead of
adding more complexity to the already existing code for temporary life
range evaluation.
v4: Update commit message to make it clearer why this class is seperate from
the tracking of temporaries.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
In preparation of the array live range tracking the evaluation of the read
mask is moved out the register live range tracking to the enclosing call
of the generalized read access tracking.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
In preparartion of adding the tracking of the live range the classes that refer
to temporary registers are renamed.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
v2: - Define tests also in the meson.build file.
v4: - Check no-op mapping of all bits.
- Convert tests to the new class layout used in the merge evaulation.
- remove dependency on llvm in meson build (Thanks Dylan Baker for pointing
out that this might not needed)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
v4: - Update the code to use the new merge logic.
- Use a cleaner, class-based approach for the evaluation of merges.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
v4: - Remove logic for evaluation of swizzles and merges since this
was moved to array_live_range. This class now only handles the
actual remapping.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
This class holds the array length, live range, and accessed components, and
it implements the logic for evaluating how arrays are merged and interleaved.
v4: - Add logic to evaluate merge and interleave of a pair of arrays to
the class array_live_range.
- document class
- update commit message
Thanks Nicolai Hähnle for the pointers given.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
On one hand "live range" is the term used in the literature, and on the
other hand a distinction is needed from the array live ranges.
v4: Fix indentions and white spaces
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v3)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
in constructs like below, currently the live range estimation extends the live range
of t unecessarily to the whole loop because it was not detected that t is
unconditional written and later read only in the "if (a)" scope.
while (foo) {
...
if (a) {
...
if (b)
t = ...
else
t = ...
x = t;
...
}
...
}
This patch adds a unit test for this case and corrects the minimal live range estimation
accordingly.
v4: update comments
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Array whose elements are only accessed directly are replaced by the
according number of temporary registers. By doing so the otherwise
reserved register range becomes subject to further optimizations like
copy propagation and register merging.
Thanks to the resulting reduced register pressure this patch makes
the piglits
spec/glsl-1.50/execution -
variable-indexing/vs-output-array-vec3-index-wr-before-gs
geometry/max-input-components
pass on r600 (barts) where they would fail before with a "GPR limit exceeded"
error (even with the spilling that was recently added).
v2: * rename method dissolve_arrays to split_arrays
* unify the tracking and remapping methods for src and dst registers
* also track access to arrays via reladdr*
v3: * enable this optimization only if the driver requests register merge
v4: * Correct comments
* Also update inst->resource if it is an array element
(thanks: Benedikt Schemmer for testing the patches on radeonsi, which
revealed that I was missing tracking this)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
When mesa is compiled in debug mode then this adds the possibility
to print out some statistics about the translated and optimized TGSI
shaders to a file.
The functionality is enabled by setting the environment variable
GLSL_TO_TGSI_PRINT_STATS
to the file name where the statistics should be collected. The file is
opened in append mode so that statistics from various runs will be
accumulated.
v4: Make accress to log file thread save (thanks for pointing this out Nicolai
Hähnle)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
The GLSL operations findLSB, findMSB, and countBits always return
a signed integer type. Let TGSI reflect this.
v2: Properly set values in infer_(src|dst)_type (Thanks Roland
Schneidegger for pointing out problems with my 1st approach)
v2: Set values in the common infer_type code path, and only add
the correct source type for UMSB (Roland Schneidegger)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.
Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In both Python 2 and 3, opening a file without specifying the mode will
open it for reading in text mode ('r').
On Python 2, the read() method of a file object opened in mode 'r' will
return byte strings, while on Python 3 it will return unicode strings.
Explicitly specifying the binary mode ('rb') then decoding the byte
string means we always handle unicode strings on both Python 2 and 3.
Which in turns means all re.match(line) will return unicode strings as
well.
If we also make expandCString return unicode strings, we don't need the
call to the unicode() constructor any more.
We were using the ugettext() method because it always returns unicode
strings in Python 2, contrarily to the gettext() one which returns
byte strings. The ugettext() method doesn't exist on Python 3, so we
must use the right method on each version of Python.
The last hurdles are that Python 3 doesn't let us concatenate unicode
and byte strings directly, and that Python 2's stdout wants encoded byte
strings while Python 3's want unicode strings.
With these changes, the script gives the same output on both Python 2
and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
On Python 3, executing `foo != bar` will first try to call
foo.__ne__(bar), and fallback on the opposite result of foo.__eq__(bar).
Python 2 does not do that.
As a result, those __eq__ methods were never called, when we were
testing for inequality.
Expliclty adding the __ne__ methods fixes this issue, in a way that is
compatible with both Python 2 and 3.
However, this means the __eq__ methods are now called when testing for
`foo != None`, so they need to be guarded correctly.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The check for ETC2 compatibility was not updated when the fallback
format was changed.
Fixes: 71867a0a61
st/mesa: Fall back to R8G8B8A8_SRGB for ETC2
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of copying the list, then sorting the copy in-place, we can just
get a new sorted copy directly.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In Python 2, the traditional way to sort containers was to use a
comparison function (which returned either -1, 0 or 1 when passed two
objects) and pass that as the "cmp" argument to the container's sort()
method.
Python 2.4 introduced key-functions, which instead only operate on a
given item, and return a sorting key for this item.
In general, this runs faster, because the cmp-function has to get run
multiple times for each item of the container.
Python 3 removed the cmp-function, enforcing usage of key-functions
instead.
This change makes the script compatible with Python 2 and Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Python 3 lost the long type: now everything is an int, with the right
size.
This commit makes the script compatible with Python 2 (where we check
for both int and long) and Python 3 (where we only check for int).
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Mixing the two is a long-standing recipe for errors in Python 2, so much
so that Python 3 now completely separates them.
This commit stops treating both as if they were the same, and in the
process makes the script compatible with both Python 2 and 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
On Python 2, the builtin functions filter() returns a list.
On Python 3, it returns an iterator.
Since we want to use those objects in contexts where we need lists, we
need to explicitly turn them into lists.
This makes the code compatible with both Python 2 and Python 3.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The code was just reimplementing itertools.combinations_with_replacement
in a less efficient way.
This does change the order of the results slightly, but it should be ok.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This is basically copied from the DRI2 destroy path. Without this,
Raspberry Pi would quickly run out of CMA during the EGL tests in the CTS
due to all the pixmaps laying around.
Fixes: f35198bade ("egl/x11: Implement dri3 support with loader's dri3 helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.
We were generating this instruction in the RT write payload setup:
mov(16) m14<1>F g5<8,8,1>F { align1 compr };
which is illegal, instructions with a source region spanning more than
one register need to be aligned to even registers. This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.
I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.
Normally, we rely on the register allocator to even-align our virtual
GRFs. But we don't control the payload, so we need to lower SIMD widths
to make it work. To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).
Fixes: 8aee87fe4c (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Diego Viola <diego.viola@gmail.com>
According to the G45 PRM Volume 2 Page 265 we're supposed to only set
these signals when there is an actual depth buffer. Note that we
already do this for the stencil buffer by virtue of brw->stencil_enabled
invoking _mesa_is_stencil_enabled(ctx) which checks whether the current
drawbuffer's visual has stencil bits (which is updated based on what
buffers are bound). We just need to do it for depth as well.
Not observed to fix anything.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This extension is not defined for indirect contexts. Marking it as
"client only", as the old code did here, would make the extension
available in indirect contexts, even though the server would certainly
not have it in its extension list.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Previously, we would load out the tile-aligned area, update the raster
copy, and store it back. This was a huge cost for XPutImage calls to the
screen under glamor.
Instead, implement a general load/store path that walks over the source
x/y writing into the corresponding pixel of the destination (using clever
math from
https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/).
If things are aligned, we go through the previous utile-at-a-time loop.
Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)
For the partial load/store support I'm about to add, we want the memcpy to
be compiled out to a single load/store. This should also eliminate the
calls to vc4_utile_width/height().
Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)
According to EGL 1.5 spec, section 3.10.1.1 ("Native Window Resizing"):
"If the native window corresponding to _surface_ has been resized
prior to the swap, _surface_ must be resized to match. _surface_ will
normally be resized by the EGL implementation at the time the native
window is resized. If the implementation cannot do this transparently
to the client, then *eglSwapBuffers* must detect the change and
resize surface prior to copying its pixels to the native window."
So far, resizing a native window in Wayland/EGL was interpreted in Mesa
as a request to resize, which is not executed until the first draw call.
And hence, surface size is not updated until executing it. Thus,
querying the surface size with eglQuerySurface() after a window resize
still returns the old values.
This commit updates the surface size values as soon as the resize is
done, even when the real resize is done in the draw call. This makes the
semantics that any native window resize request take effect inmediately,
and if user calls eglQuerySurface() it will return the new resized
values.
v2: update surface size if there isn't a back surface (Daniel)
CC: Daniel Stone <daniel@fooishbar.org>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
When creating a windows surface with eglCreateWindowSurface(), the
width and height returned by eglQuerySurface(EGL_{WIDTH,HEIGHT}) is
invalid until buffers are updated (like calling glClear()).
But according to EGL 1.5 spec, section 3.5.6 ("Surface Attributes"):
"Querying EGL_WIDTH and EGL_HEIGHT returns respectively the width and
height, in pixels, of the surface. For a window or pixmap surface,
these values are initially equal to the width and height of the
native window or pixmap with respect to which the surface was
created"
This fixes dEQP-EGL.functional.color_clears.* CTS tests
v2:
- Do not modify attached_{width,height} (Daniel)
- Do not update size on resizing window (Brendan)
CC: Daniel Stone <daniel@fooishbar.org>
CC: Brendan King <brendan.king@imgtec.com>
CC: mesa-stable@lists.freedesktop.org
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Like in the autotools target, make the list of drivers to be built in
each of the Meson targets explicit.
This will help to identify missing dependencies and other issues more
easily.
CC: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
instead of the underlying texture's target. This fixes an issue
where the TGSI sampler type was not agreeing with the sampler view
target/type. In particular, this fixes a Mint 19 XFCE desktop
scaling issue because the TGSI code was using a RECT sampler but
the sampler view's underlying texture was PIPE_TEXTURE_2D.
We want to use the sampler view's type rather than the underlying
resource, as we do for the view's surface format.
No piglit regressions.
VMware issue 2156696.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
I'm not sure why we didn't support this in the past, but fillmode
is supported by all renderers nowadays.
Also fix the logic in svga_create_rasterizer_state() to avoid a few
swtnl case.
No piglit regressions
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes failed assertion running Piglit polygon-mode-face test.
Though, the test still does not pass.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Add -fno-math-errno and -fno-trapping-math to the build.
Mesa does not depend on the functionality provided, thus this should
result in slightly faster code and smaller binaries.
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
The toggles were broken with the introduction of --enable-mangling.
Fixing that up might be possible, but it's not worth the complexity
since one can rename the libraries at any point.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The respective drivers have been updated and the helpers are no longer
needed.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In LLVM <6.0 we added explicitly libedit-dev, as it was required to
satisfy apt dependencies.
In LLVM 6.0, this is not required anymore, so let's remove it.
CC: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The virgl driver cares about the writable-flag on image definitions,
because it re-emits GLSL from the TGSI. However, so far it was hardcoded
to true in glsl_to_tgsi, which cause problems when virglrenderer is
running on top of GLES 3.1, where not all formats are supported for
writable images.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We won't have an FD if we're just having the server wait on a fence
created by eglCreateSyncKHR(). Our seqno fences will happen in order, so
server-side waits are no-ops in that case. Fixes
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete
Fixes: b0acc3a562 ("broadcom/vc4: Native fence fd support")
Reduces the size of vc4_uniforms.o by about 10%. We would basically
always end up loading the cachline of uinfo->data[i] anyway, so it should
be good for performance as well as making the code a bit cleaner.
The HW only executes a load once the tile coordinates packet happens, and
only tracks one at a time, so by emitting our two MSAA loads back to back
we would end up with an undefined color or Z buffer. The simulator
doesn't seem to care, but sync up the RCL generation with the kernel
anyway.
Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window
Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture
Cc: mesa-stable@lists.freedesktop.org
We're trying to write a unicode string (i.e decoded) to a file opened
in binary (i.e encoded) mode.
In Python 2 this works, because of the automatic conversion between
byte and unicode strings.
In Python 3 this fails though, as no automatic conversion is attempted.
This change makes the scripts compatible with both versions of Python.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Python 3 doesn't call objects __cmp__() methods any more to compare
them. Instead, it requires implementing the rich comparison methods
explicitly: __eq__(), __ne(), __lt__(), __le__(), __gt__() and __ge__().
Fortunately Python 2 also supports those.
This commit only implements the comparison methods which are actually
used by the build scripts.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In Python 2, divisions of integers return an integer:
>>> 32 / 4
8
In Python 3 though, they return floats:
>>> 32 / 4
8.0
However, Python 3 has an explicit integer division operator:
>>> 32 // 4
8
That operator exists on Python >= 2.2, so let's use it everywhere to
make the scripts compatible with both Python 2 and 3.
In addition, using __future__.division tells Python 2 to behave the same
way as Python 3, which helps ensure the scripts produce the same output
in both versions of Python.
Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
A follow-up patch enables EGL_KHR_mutable_render_buffer for Android.
This patch is separate from the Android patch because I think it's
easier to review the platform-independent bits separately.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER,
which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR.
Not used yet.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Define extensions DRI_MutableRenderBufferDriver and
DRI_MutableRenderBufferLoader. These are the two halves for
EGL_KHR_mutable_render_buffer.
Outside the DRI code there is one additional change. Add
gl_config::mutableRenderBuffer to match
__DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither are used yet.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This pulls an 'else' block into the function's main body, making the
code easier to follow.
Without this change, the upcoming EGL_KHR_mutable_render_buffer patch
transforms dri2_make_current() into spaghetti.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There exist *two* queryable EGL_RENDER_BUFFER states in EGL:
eglQuerySurface(EGL_RENDER_BUFFER) and
eglQueryContext(EGL_RENDER_BUFFER).
These changes eliminate potentially very fragile code in the upcoming
EGL_KHR_mutable_render_buffer implementation.
* eglQuerySurface(EGL_RENDER_BUFFER)
The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained
abstruse logic which required comprehending the specification
complexities of how the two EGL_RENDER_BUFFER states interact. The
function sometimes returned _EGLContext::WindowRenderBuffer, sometimes
_EGLSurface::RenderBuffer. Why? The function tried to encode the
actual logic from the EGL spec. When did the function return which
variable? Go study the EGL spec, hope you understand it, then hope
Mesa mutated the EGL_RENDER_BUFFER state in all the correct places.
Have fun.
To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve
confidence in its correctness, flatten its indirect logic. For pixmap
and pbuffer surfaces, simply return a hard-coded literal value, as the
spec suggests. For window surfaces, simply return
_EGLSurface::RequestedRenderBuffer. Nothing difficult here.
* eglQueryContext(EGL_RENDER_BUFFER)
The implementation of this suffered from the same issues as
eglQuerySurface, and the solution is the same. confidence in its
correctness, flatten its indirect logic. For pixmap and pbuffer
surfaces, simply return a hard-coded literal value, as the spec
suggests. For window surfaces, simply return
_EGLSurface::ActiveRenderBuffer.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
As the spec says:
EGL_BAD_NATIVE_PIXMAP is generated if the implementation
does not support native pixmaps.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently dri2_copy_buffers is used for swrast, which depends on the
DRI2_FLUSH extension. Since that's not a thing on software based
drivers we crash out.
Do the slightly more graceful, thing of returning EGL_FALSE.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
There's little point in calling _eglGetNativePlatform() in
eglCopyBuffers. The platform returned should be identical to the one
already stored in our _EGLDisplay.
In the following corner case, the check is incorrect.
The function _eglGetNativePlatform effectively invokes the old-style
eglGetDisplay platform selection. Thus if the EGL_PLATFORM platform does
not match with the EGL_EXT_platform_* used to create the display we'll
error out.
Addresses the egl-copy-buffers piglit test.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
With version v1.15 the "code" option was deprecated in favour of
"private-code" or "public-code".
Before the interface symbol generated was exported (which is a bad idea
since it's internal implementation detail) and others may misuse it.
That was the case with libva approx. 1 year ago. Since then libva was
fixed, so we can finally hide it by using "private-code"
Inspired by similar xserver patch by Adam Jackson.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
With version v1.15 the "code" option was deprecated in favour of
"private-code" or "public-code".
Before the interface symbol generated was exported (which is a bad idea
since it's internal implementation detail) and others may misuse it.
That was the case with libva approx. 1 year ago. Since then libva was
fixed, so we can finally hide it by using "private-code"
Inspired by similar xserver patch by Adam Jackson.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
This matches what this field is called in virglrenderer's copy of
this.
This reduces the diff between the two different versions of
virgl_hw.h, and should make it easier to upgrade the file in
the future.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment. In
fact, we were likely to to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
The simulator complained that we had write responses outstanding at shader
end. It seems that a TMU read does not guarantee that previous TMU writes
by the thread have completed, which surprised me.
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
This is useful for periodically testing out register spilling to see how
it goes on simple shaders, rather than only failing on insanely
complicated ones.
In commit cf54bd5e8, dri_sw_winsys.c began using <sys/shm.h> to support
the new functions putImageShm, getImageShm in DRI_SWRastLoader. But
Android began supporting System V shared memory only in Oreo. Nougat has
no shm headers.
Fix the build by ifdef'ing out the shm code on Nougat.
Fixes: cf54bd5e8 "drisw: use shared memory when possible"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@gmail.com>
os.path.exists doesn't return True for stale symlinks, but they are in
the way later, when a link/file with the same name is to be created.
For instance it is conceivable that the pointed to file is replaced by
a file with a new name, and then the symlink is dead.
To handle this check specifically for all existing symlinks to be
removed. (This bugged me for some time with a link libXvMCr600.so
always being in the way of installing this file)
v2: use only os.lexist and replace all instances of os.exist (Dylan Baker)
v3: handle directory check correctly (Eric Engestrom)
Fixes: f7f1b30f81
("meson: extend install_megadrivers script to handle symmlinking")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>(v2 minus dir check)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
This change helps with some of the dEQP-VK.wsi.android.* tests that
try to create swapchain with using such formats.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We already guarded all OP_SULDP against out of bound accesses, but we
ended up just reusing whatever value was stored in the dest registers.
Fixes CTS test shader_image_load_store.incomplete_textures
v2: fix for loads not ending up with predicates (bindless_texture)
v3: fix replacing the def
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
mitigates hurt shaders after adding sqrt:
total instructions in shared programs : 5456166 -> 5454825 (-0.02%)
total gprs used in shared programs : 647522 -> 647551 (0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58288696 -> 58274448 (-0.02%)
local shared gpr inst bytes
helped 0 0 0 516 516
hurt 0 0 27 2 2
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
./GpuTest /test=pixmark_piano 1024x640 30sec:
301 -> 327 points
shader-db:
total instructions in shared programs : 5472103 -> 5456166 (-0.29%)
total gprs used in shared programs : 647530 -> 647522 (-0.00%)
total shared used in shared programs : 389120 -> 389120 (0.00%)
total local used in shared programs : 21064 -> 21064 (0.00%)
total bytes used in shared programs : 58459304 -> 58288696 (-0.29%)
local shared gpr inst bytes
helped 0 0 27 8281 8281
hurt 0 0 21 431 431
v2: use NVISA_GM200_CHIPSET
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Since we don't support streaming an aub file, we can drop the decoding
status enum.
v2: include stdbool (Eric)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Up to now we've been lucky that the buffer returned was always exactly
at the address we requested.
Fixes: 144b40db54 ("intel: aubinator: drop the 1Tb GTT mapping")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14276892 -> 14276886 (<.01%)
instructions in affected programs: 484 -> 478 (-1.24%)
helped: 2
HURT: 0
total cycles in shared programs: 532578397 -> 532578395 (<.01%)
cycles in affected programs: 3522 -> 3520 (-0.06%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
All Gen platforms had pretty similar results. (Skylake shown)
total instructions in shared programs: 14277220 -> 14277216 (<.01%)
instructions in affected programs: 422 -> 418 (-0.95%)
helped: 2
HURT: 0
total cycles in shared programs: 532577908 -> 532577848 (<.01%)
cycles in affected programs: 2800 -> 2740 (-2.14%)
helped: 2
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Unlike the much older -abs(a) >= 0.0 transformation, this is not
precise. The behavior changes if the source is NaN.
No shader-db changes on any platform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
A problem was reported with arm,arm64 targets build due to missing
libLLVM shared library dependency with AOSP; to avoid this issue vulkan.radv
is built conditionally only when radeonsi is in BOARD_GPU_DRIVERS
Fixes: 0ca153f869 ("android: radv: enable build of vulkan.radv HAL module")
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
d3d10 requires NaNs to get converted to 0 for float->unorm conversions
(and float->int etc.). GL spec probably doesn't care in general, but it
would make sense to have reasonable behavior in any case imho - the
old code was converting negative NaNs to 0, and positive NaNs to 255.
(Note that using float comparison isn't actually all that much more
effort in any case, at least with sse2 it's just float comparison
(ucommiss) instead of int one - I converted the second comparison
to float too simply because it saves the probably somewhat expensive
transfer of the float from simd to int domain (with sse2 via stack),
so the generated code actually has 2 less instructions, although float
comparisons are more expensive than int ones.)
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a forward port of a patch from the AOSP/master tree:
bd633f11de%5E%21/
Which replaces HOST_OUT_release with HOST_OUT
As per Dan's explanation, the current code was incorrect to use
$(HOST_OUT_release) as $(HOST_OUT) will be set properly for
whether the current build that's being cleaned during
incrementals is using host debug or release builds.
Additionally Dan noted it was incredibly uncommon to use a debug
host build, as there was never a shortcut and one had to set an
environment variable manually. Thus it was rarely if ever tested.
Change-Id: I7972c0a50fa3520dcfa962d6dd7e602bfe22368d
Cc: Rob Herring <rob.herring@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Marissa Wall <marissaw@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
This is a partial cherry-pick from AOSP's mesa3d tree:
a88dcf769e%5E%21/
"We're deprecating make implicit rules, preferring static pattern
rules, or just regular rules."
Without this patch, the freedesktop/master branch won't build in
the AOSP environment, and this patch corrects that, as tested
on the Dragonboard 820c.
The i965 portion of the patch this is based on collided badly,
and I'm not sure how to best forward port it. However, so far
we don't see build issues without that portion.
Comments or feedback would be appreciated!
Change-Id: Id6dfd0d018cbd665fa19d80c14abd5f75fa10b8a
Cc: Rob Herring <rob.herring@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Marissa Wall <marissaw@google.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Previously, I added a switch case for GL 2.1 (ed7a0770b881791dd697f3).
I don't know of any driver which only supports GL 2.0, but adding
this switch case avoids a failure if the app queries
GL_SHADING_LANGUAGE_VERSION.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This leaves us with a series of little anv_pipeline_compile_* functions
which each take a compiler object, a mem_ctx, the stage to compile, and
the previous stage for VUE linking purposes. Some of them do
interesting things but most are little more than wrappers around
brw_compile_*.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This breaks compilation up a bit into "link" and "compile". In the
"link" stage, new anv_pipeline_link_* helpers are called which are
responsible for setting up the binding table and doing anything needed
to properly link with the next stage in the pipeline if one exists.
They are called in reverse order starting with the fragment shader so
you can assume linking in later stages is already done.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We can set active_stages much more directly and then it's just candy
around setting pipeline->stages[stage].
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of hashing each stage separately (and TES and TCS together), we
hash the entire pipeline. This means we'll get fewer cache hits if
they, for instance, re-use the same VS over and over again but it also
means we can now safely do cross-stage optimizations.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of having each anv_pipeline_compile_* function populate the
shader key, make it part of the anv_pipeline_stage struct and fill it
out up-front.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With a sufficently recent meson, the following warning is produced:
WARNING: Passed invalid keyword argument "extra_args".
WARNING: This will become a hard error in the future.
It seems that compiler.links(args:) is meant here.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Zeroing memory after calloc is not necessary. This also allows to avoid
possible crash when allocation fails, because memset is called before
checking screen for NULL.
Fixes: a29d63ecf7 "swr: refactor swr_create_screen to allow
for proper cleanup on error"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-08-02 11:13:40 +01:00
2796 changed files with 383557 additions and 116098 deletions
# Appveyor defaults core.autocrlf to input instead of the default (true), but
# that can hide problems processing CRLF text on Windows
- git config --global core.autocrlf true
environment:
WINFLEXBISON_ARCHIVE:win_flex_bison-2.5.9.zip
LLVM_ARCHIVE:llvm-5.0.1-msvc2015-mtd.7z
WINFLEXBISON_VERSION:2.5.15
LLVM_ARCHIVE:llvm-5.0.1-msvc2017-mtd.7z
install:
# Check git config
- git config core.autocrlf
# Check pip
- python --version
- python -m pip --version
# Install Mako
- python -m pip install Mako==1.0.6
- python -m pip install Mako==1.0.7
# Install pywin32 extensions, needed by SCons
- python -m pip install pypiwin32
# Install python wheels, necessary to install SCons via pip
- python -m pip install wheel
# Install SCons
- python -m pip install scons==2.5.1
- python -m pip install scons==3.0.1
- scons --version
# Install flex/bison
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"
- set WINFLEXBISON_ARCHIVE=win_flex_bison-%WINFLEXBISON_VERSION%.zip
- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://github.com/lexxmark/winflexbison/releases/download/v%WINFLEXBISON_VERSION%/%WINFLEXBISON_ARCHIVE%"
- 7z x -y -owinflexbison\ "%WINFLEXBISON_ARCHIVE%" > nul
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=13728">Bug 13728</a> - [G965] Some objects in Neverwinter Nights Linux version not displayed correctly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107117">Bug 107117</a> - mesa-18.1: regression with TFP on intel with modesettings and glamor acceleration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107212">Bug 107212</a> - Dual-Core CPU E5500 / G45: RetroArch with reicast core results in corrupted graphics</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>glx: GLX_MESA_multithread_makecurrent is direct-only</li>
</ul>
<p>Andres Gomez (3):</p>
<ul>
<li>ddebug: use util_snprintf() in dd_get_debug_filename_and_mkdir</li>
<li>gallium/aux/util: use util_snprintf() in test_texture_barrier</li>
<li>glsl: use util_snprintf()</li>
</ul>
<p>Christian Gmeiner (1):</p>
<ul>
<li>etnaviv: fix typo in query names</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>r600: reduce num compute threads to 1024.</li>
</ul>
<p>Dylan Baker (6):</p>
<ul>
<li>docs: Add sha-256 sums for 18.1.5</li>
<li>nir/meson: fix c vs cpp args for nir test</li>
<li>gallium: fix ddebug on windows</li>
<li>cherry-ignore: add patches that get-pick-list is finding in error</li>
<li>cherry-ignore: Add some additional patches that are for 18.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101247">Bug 101247</a> - Mesa fails to link GLSL programs with unused output blocks</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104809">Bug 104809</a> - anv: DOOM 2016 and Wolfenstein II:The New Colossus crash due to not having depthBoundsTest</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105904">Bug 105904</a> - Needed to delete mesa shader cache after driver upgrade for 32 bit wine vulkan programs to work.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106738">Bug 106738</a> - No test for miptrees with DRI modifiers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107359">Bug 107359</a> - [Regression] [bisected] [OpenGL CTS] [SKL,BDW] KHR-GL46.texture_barrier*-texels, GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners, and GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners fail with some configuration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107477">Bug 107477</a> - [DXVK] Setting high shader quality in GTA V results in LLVM error</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107579">Bug 107579</a> - [SNB] The graphic corruption when we reuse the GS compiled and used for TFB when statebuffer contain magic trash in the unused space</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107601">Bug 107601</a> - Rise of the Tomb Raider Segmentation Fault when the game starts</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107760">Bug 107760</a> - GPU Hang when Playing DiRT 3 Complete Edition using Steam Play with DXVK</li>
</ul>
<h2>Changes</h2>
<p>Andrii Simiklit (1):</p>
<ul>
<li>i965/gen6/xfb: handle case where transform feedback is not active</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Add missing checks in radv_get_image_format_properties.</li>
<li>radv: Fix CMASK dimensions.</li>
<li>radv: Use a lower max offchip buffer count.</li>
</ul>
<p>Christian Gmeiner (1):</p>
<ul>
<li>tegra: fix memory leak</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>st/dri: Don't expose sRGB formats to clients</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>ac/radeonsi: fix CIK copy max size</li>
</ul>
<p>Dylan Baker (10):</p>
<ul>
<li>docs: Add mesa 18.1.7 notes</li>
<li>cherry-ignore: add a patch</li>
<li>cherry-ignore: Add more 18.2 only patches</li>
<li>meson: Actually load translation files</li>
<li>cherry-ignore: Add more 18.2 patches</li>
<li>cherry-ignore: Add additional patch</li>
<li>cherry-ignore: Add patch that doesn't apply to 18.1</li>
<li>cherry-ignore: Add a couple of two fixes warning patches</li>
<li>cherry-ignore: Add patch that needs more significant patches to function</li>
<li>Bump version to 18.1.8</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: update required mako version</li>
</ul>
<p>Grazvydas Ignotas (1):</p>
<ul>
<li>radv: place pointer length into cache uuid</li>
</ul>
<p>Gurchetan Singh (2):</p>
<ul>
<li>meson: fix egl build for surfaceless</li>
<li>meson: fix egl build for android</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>i965/vec4: Clamp indirect tes input array reads with 0x0fffffff</li>
<li>i965/vec4: Correctly handle uniform sources in generate_tes_add_indirect_urb_offset</li>
</ul>
<p>Jason Ekstrand (5):</p>
<ul>
<li>anv: Fill holes in the VF VUE to zero</li>
<li>nir/algebraic: Be more careful converting ushr to extract_u8/16</li>
<li>egl/dri2: Add a helper for the number of planes for a FOURCC format</li>
<li>egl/dri2: Guard against invalid fourcc formats</li>
<li>anv/blorp: Do more flushing around HiZ clears</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>egl/wayland: do not leak wl_buffer when it is locked</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: blorp: support multiple aspect blits</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>glapi: actually implement GL_EXT_robustness for GLES</li>
</ul>
<p>Nanley Chery (7):</p>
<ul>
<li>intel/isl: Avoid tiling some 16K-wide render targets</li>
<li>i965: Make blt_pitch public</li>
<li>i965/miptree: Drop an if case from retile_as_linear</li>
<li>i965/miptree: Use the correct BLT pitch</li>
<li>i965/miptree: Use miptree_map in map_blit functions</li>
<li>i965/miptree: Fix can_blit_slice()</li>
<li>i965/gen7_urb: Re-emit PUSH_CONSTANT_ALLOC on some gen9</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: fix passing clip/cull distances from VS to PS</li>
</ul>
<p>vadym.shovkoplias (1):</p>
<ul>
<li>glsl/linker: Allow unused in blocks which are not declated on previous stage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103241">Bug 103241</a> - Anv crashes when using 64-bit vertex inputs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104926">Bug 104926</a> - swrast: Mesa 17.3.3 produces: HW cursor for format 875713089 not supported</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107280">Bug 107280</a> - [DXVK] Batman: Arkham City with tessellation enabled hangs on SKL GT4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107772">Bug 107772</a> - Mesa preprocessor matches if(def)s & endifs incorrectly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107779">Bug 107779</a> - Access violation with some games</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107810">Bug 107810</a> - The 'va_end' call is missed after 'va_copy' in 'util_vsnprintf' function under windows</li>
</ul>
<h2>Changes</h2>
<p>Andrii Simiklit (4):</p>
<ul>
<li>apple/glx/log: added missing va_end() after va_copy()</li>
<li>mesa/util: don't use the same 'va_list' instance twice</li>
<li>mesa/util: don't ignore NULL returned from 'malloc'</li>
<li>mesa/util: add missing va_end() after va_copy()</li>
</ul>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Use build ID if available for cache UUID.</li>
<li>radv: Only allow 16 user SGPRs for compute on GFX9+.</li>
<li>radv: Set the user SGPR MSB for Vega.</li>
<li>radv: Fix driver UUID SHA1 init.</li>
</ul>
<p>Christopher Egert (1):</p>
<ul>
<li>radeon: fix ColorMask</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>virgl: don't send a shader create with no data. (v2)</li>
</ul>
<p>Dylan Baker (10):</p>
<ul>
<li>docs/relnotes: Add sha256 sums for mesa 18.1.8</li>
<li>cherry-ignore: Add additional 18.2 patch</li>
<li>meson: Print a message about why a libdrm version was selected</li>
<li>cherry-ignore: add another 18.2 patch</li>
<li>cherry-ignore: Add patches that don't apply cleanly and are for developer tools</li>
<li>cherry-ignore: Add more 18.2 patches</li>
<li>cherry-ignore: add 18.2 patchs</li>
<li>cherry-ignore: add a patch that was reverted on master</li>
<li>cherry-ignore: one final update</li>
<li>Bump version to 18.1.9</li>
</ul>
<p>Erik Faye-Lund (2):</p>
<ul>
<li>winsys/virgl: avoid unintended behavior</li>
<li>virgl: adjust strides when mapping temp-resources</li>
</ul>
<p>Gert Wollny (1):</p>
<ul>
<li>winsys/virgl: correct resource and handle allocation (v2)</li>
</ul>
<p>Jason Ekstrand (6):</p>
<ul>
<li>anv/pipeline: Only consider double elements which actually exist</li>
<li>i965: Workaround the gen9 hw astc5x5 sampler bug</li>
<li>anv: Re-emit vertex buffers when the pipeline changes</li>
<li>anv: Disable the vertex cache when tessellating on SKL GT4</li>
<li>anv: Clamp scissors to the framebuffer boundary</li>
<li>anv/query: Write both dwords in emit_zero_queries</li>
</ul>
<p>Josh Pieper (1):</p>
<ul>
<li>st/mesa: Validate the result of pipe_transfer_map in make_texture (v2)</li>
</ul>
<p>Kenneth Feng (1):</p>
<ul>
<li>amd: Add Picasso device id</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>st/mesa: help fix stencil border color for GL_DEPTH_STENCIL textures</li>
<li>radeonsi: fix HTILE for NPOT textures with mipmapping on SI/CI</li>
<li>r600: fix HTILE for NPOT textures with mipmapping</li>
<li>radeonsi: fix printing a BO list into ddebug reports</li>
</ul>
<p>Mathias Fröhlich (1):</p>
<ul>
<li>tnl: Fix green gun regression in xonotic.</li>
</ul>
<p>Mauro Rossi (3):</p>
<ul>
<li>android: broadcom/genxml: fix collision with intel/genxml header-gen macro</li>
<li>android: broadcom/cle: add gallium include path</li>
<li>android: broadcom/cle: export the broadcom top level path headers</li>
</ul>
<p>Michal Srb (1):</p>
<ul>
<li>st/dri: don't set queryDmaBufFormats/queryDmaBufModifiers if the driver does not implement it</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>loader/dri3: Only wait for back buffer fences in dri3_get_buffer</li>
@@ -59,9 +60,217 @@ Note: some of the new features are only available with certain drivers.
<li>GL_ARB_sample_locations and GL_NV_sample_locations on nvc0 (GM200+)</li>
<li>GL_ANDROID_extension_pack_es31a on radeonsi.</li>
<li>GL_KHR_texture_compression_astc_ldr on radeonsi</li>
<li>GL_NV_conservative_raster and GL_NV_conservative_raster_dilate on nvc0 (GM200+)</li>
<li>GL_NV_conservative_raster_pre_snap_triangles on nvc0 (GP102+)</li>
<li>multisampled images on nvc0 (GM107+) (now supported on GF100+)</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=13728">Bug 13728</a> - [G965] Some objects in Neverwinter Nights Linux version not displayed correctly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=61761">Bug 61761</a> - glPolygonOffsetEXT, OFFSET_BIAS incorrectly set to a huge number</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=65422">Bug 65422</a> - Rename api_validate.[ch] to draw_validate.[ch]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78097">Bug 78097</a> - glUniform1ui and friends not supported by display lists</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99116">Bug 99116</a> - Wine DirectDraw programs showing only a blackscreen when using Mesa Gallium drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99730">Bug 99730</a> - Metro Redux game(s) needs override for midshader extension declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100177">Bug 100177</a> - [GM206] Misrendering in XCOM Ennemy Within</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=100430">Bug 100430</a> - [radv] graphical glitches on dolphin emulator</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101247">Bug 101247</a> - Mesa fails to link GLSL programs with unused output blocks</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102678">Bug 102678</a> - gl_BaseVertex should always be zero when the draw command has no <basevertex> parameter</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103274">Bug 103274</a> - BRW allocates too much heap memory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104388">Bug 104388</a> - [snb] GPU HANG: ecode 6:0:0x85fffff8 in fgfs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104809">Bug 104809</a> - anv: DOOM 2016 and Wolfenstein II:The New Colossus crash due to not having depthBoundsTest</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105351">Bug 105351</a> - [Gen6+] piglit's arb_shader_image_load_store-host-mem-barrier fails with a glGetTexSubImage fallback path</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105374">Bug 105374</a> - texture3d, a SaschaWillems demo, assert fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105396">Bug 105396</a> - tc compatible htile sets depth of htiles of discarded fragments to 1.0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105399">Bug 105399</a> - [snb] GPU hang: after geometry shader emits no geometry, the program hangs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105497">Bug 105497</a> - shader-db crashes on 72 core system after ast_type_qualifier bitset change</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105613">Bug 105613</a> - Compute shader locks up within nested "for" loop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105731">Bug 105731</a> - linker error "fragment shader input ... has no matching output in the previous stage" when previous stage's output declaration in a separate shader object</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105904">Bug 105904</a> - Needed to delete mesa shader cache after driver upgrade for 32 bit wine vulkan programs to work.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106133">Bug 106133</a> - make check "OSError: [Errno 24] Too many open files"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106163">Bug 106163</a> - r600/sb: optimizer tries to schedule access to different array elements in one instruction group</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106174">Bug 106174</a> - vulkan dota2 broken (segfaulting), found bug commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106180">Bug 106180</a> - [bisected] radv vulkan smoke test black screen (Add support for DRI3 v1.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106232">Bug 106232</a> - LLVM unit tests have error in random number handling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106243">Bug 106243</a> - [kbl] GPU HANG: 9:0:0x85dffffb, in Cinnamon</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106315">Bug 106315</a> - The witness + dxvk suffers flickering garbage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106331">Bug 106331</a> - radv doesnt support VK_FORMAT_R32G32B32_SFLOAT</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106499">Bug 106499</a> - [regression, bisected] Several games crash on start</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106504">Bug 106504</a> - vulkan SPIR-V parsing failed at ../src/compiler/spirv/vtn_cfg.c:381</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106511">Bug 106511</a> - radv: MSAA broken on SI (assertion failure in vkCreateImage)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106587">Bug 106587</a> - Dota2 is very dark when using vulkan render on a Intel << AMD prime setup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106594">Bug 106594</a> - [regression,apitrace,bisected] Prison Architect rendered unplayable by multicoloured flickering triangles and overlayed triangles when performing certain actions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106642">Bug 106642</a> - X server crashes in i965 on desktop startup when DRI3 v1.2 / modifier support is enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106643">Bug 106643</a> - double free when exporting a temporarily imported semaphore</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106673">Bug 106673</a> - [bisected] Steam is unusable since commit 5c33e8c7</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106687">Bug 106687</a> - radv: Fast color clears use incorrect format</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106708">Bug 106708</a> - [SKL/KBL/GLK] 2-3% performance drop in SynMark DrvState and 5-9% drop on SynMark Multithread</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106748">Bug 106748</a> - st/mesa: use PIPE_CAP_GLSL_FEATURE_LEVEL_COMPATIBILITY broke qemu -display sdl,gl=on</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106756">Bug 106756</a> - Wine 3.9 crashes with DXVK on Just Cause 3 and Quantum Break on VEGA but works ON POLARIS</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106774">Bug 106774</a> - GLSL IR copy propagates loads of SSBOs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106776">Bug 106776</a> - vma_random unrecognized command line option "-std=c++11"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106778">Bug 106778</a> - Files missing from tarball - intel_sanitize_gpu.*</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106779">Bug 106779</a> - Files missing from tarball - u_debug_stack_android.cpp</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106784">Bug 106784</a> - 18.1.1 autotools build fail without mako</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106801">Bug 106801</a> - vma_random_test.cpp:239:18: error: non-constant-expression cannot be narrowed from type 'unsigned long' to 'uint_fast32_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106810">Bug 106810</a> - ProgramBinary does not switch program correctly when using transform feedback</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106823">Bug 106823</a> - Failed to recongnize keyword of shader code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106830">Bug 106830</a> - [bisected] 32 bit tests (deqp, piglit, glcts, vulkancts) crashing on all platforms</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106861">Bug 106861</a> - fatal error: wayland-egl-backend.h: No such file or directory compilation terminated.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106903">Bug 106903</a> - radv: Fragment shader output goes to wrong attachments when render targets are sparse</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106906">Bug 106906</a> - Failed to recongnize keyword “sampler2DRect” and "sampler2DRectShadow"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106907">Bug 106907</a> - Correct Transform Feedback Varyings information is expected after using ProgramBinary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106928">Bug 106928</a> - When starting a match Rocket League crashes on "Go"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106941">Bug 106941</a> - Intel ANV vulkan driver exposing version 1.1.0 which is incorrect</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106986">Bug 106986</a> - glGetQueryiv error when querying number of result bits for GL_ANY_SAMPLES_PASSED_CONSERVATIVE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106997">Bug 106997</a> - [Regression]. Dying light game is crashing on latest mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107098">Bug 107098</a> - Segfault after munmap(kms_sw_dt->ro_mapped)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107117">Bug 107117</a> - mesa-18.1: regression with TFP on intel with modesettings and glamor acceleration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107190">Bug 107190</a> - Got seg fault on snb when use INTEL_DEBUG=bat</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107212">Bug 107212</a> - Dual-Core CPU E5500 / G45: RetroArch with reicast core results in corrupted graphics</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107223">Bug 107223</a> - [GEN9+] 50% perf drop in SynMark Fill* tests (E2E RBC gets disabled?)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107248">Bug 107248</a> - [G45 ILK G965] Texture handling broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107275">Bug 107275</a> - NIR segfaults after spirv-opt</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107276">Bug 107276</a> - radv: OpBitfieldUExtract returns incorrect result when count is zero</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107295">Bug 107295</a> - Access violation on glDrawArrays with count >= 2048</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107305">Bug 107305</a> - glsl/opt_copy_propagation_elements.cpp:72:9: error: delegating constructors are permitted only in C++11</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107312">Bug 107312</a> - Mesa-git RPM build fails after commit 8cacf38f527d42e41441ef8c25d95d4b2f4e8602</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107359">Bug 107359</a> - [Regression] [bisected] [OpenGL CTS] [SKL,BDW] KHR-GL46.texture_barrier*-texels, GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners, and GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners fail with some configuration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107366">Bug 107366</a> - NIR verification crashes on piglit tests</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107443">Bug 107443</a> - Build error on arm64: v3d_decoder.c:837:17: error: format not a string literal and no format arguments [-Werror=format-security]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107460">Bug 107460</a> - radv: OpControlBarrier does not always work correctly (bisected)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107477">Bug 107477</a> - [DXVK] Setting high shader quality in GTA V results in LLVM error</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107510">Bug 107510</a> - [GEN8+] up to 10% perf drop on several 3D benchmarks</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107544">Bug 107544</a> - intel/decoder: out of bounds group_iter</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107550">Bug 107550</a> - "0[2]" as function parameter hits assert</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107579">Bug 107579</a> - [SNB] The graphic corruption when we reuse the GS compiled and used for TFB when statebuffer contain magic trash in the unused space</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107601">Bug 107601</a> - Rise of the Tomb Raider Segmentation Fault when the game starts</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107610">Bug 107610</a> - Dolphin emulator mis-renders shadow overlay in Super Mario Sunshine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103241">Bug 103241</a> - Anv crashes when using 64-bit vertex inputs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107280">Bug 107280</a> - [DXVK] Batman: Arkham City with tessellation enabled hangs on SKL GT4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107772">Bug 107772</a> - Mesa preprocessor matches if(def)s & endifs incorrectly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107779">Bug 107779</a> - Access violation with some games</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107810">Bug 107810</a> - The 'va_end' call is missed after 'va_copy' in 'util_vsnprintf' function under windows</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107832">Bug 107832</a> - Gallium picking A16L16 formats when emulating INTENSITY16 conflicts with mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107843">Bug 107843</a> - 32bit Mesa build failes with meson.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107879">Bug 107879</a> - crash happens when link program</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107891">Bug 107891</a> - [wine, regression, bisected] RAGE, Wolfenstein The New Order hangs in menu</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (3):</p>
<ul>
<li>docs: add sha256 checksums for 18.2.0</li>
<li>Revert "Revert "glsl: skip stringification in preprocessor if in unreachable branch""</li>
<li>cherry-ignore: i965/tools: 32bit compilation with meson</li>
</ul>
<p>Andrii Simiklit (4):</p>
<ul>
<li>apple/glx/log: added missing va_end() after va_copy()</li>
<li>mesa/util: don't use the same 'va_list' instance twice</li>
<li>mesa/util: don't ignore NULL returned from 'malloc'</li>
<li>mesa/util: add missing va_end() after va_copy()</li>
</ul>
<p>Bas Nieuwenhuizen (5):</p>
<ul>
<li>radv: Support v3 of VK_EXT_vertex_attribute_divisor.</li>
<li>radv: Set the user SGPR MSB for Vega.</li>
<li>radv: Only allow 16 user SGPRs for compute on GFX9+.</li>
<li>radv: Use build ID if available for cache UUID.</li>
<li>radv: Fix driver UUID SHA1 init.</li>
</ul>
<p>Christopher Egert (1):</p>
<ul>
<li>radeon: fix ColorMask</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>virgl: don't send a shader create with no data. (v2)</li>
</ul>
<p>Dylan Baker (1):</p>
<ul>
<li>meson: Print a message about why a libdrm version was selected</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>v3d: Fix SRC_ALPHA_SATURATE blending for RTs without alpha.</li>
<li>v3d: Fix setup of the VCM cache size.</li>
</ul>
<p>Erik Faye-Lund (2):</p>
<ul>
<li>winsys/virgl: avoid unintended behavior</li>
<li>virgl: adjust strides when mapping temp-resources</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104602">Bug 104602</a> - [apitrace] Graphical artifacts in Civilization VI on RX Vega</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104926">Bug 104926</a> - swrast: Mesa 17.3.3 produces: HW cursor for format 875713089 not supported</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107276">Bug 107276</a> - radv: OpBitfieldUExtract returns incorrect result when count is zero</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107786">Bug 107786</a> - [DXVK] MSAA reflections are broken in GTA V</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108024">Bug 108024</a> - [Debian Stretch]Fail to build because "xcb_randr_lease_t"</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>pci_ids: add new polaris pci id</li>
</ul>
<p>Andres Rodriguez (1):</p>
<ul>
<li>radv: only emit ZPASS_DONE for timestamp queries on gfx queues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99507">Bug 99507</a> - Corrupted frame contents with Vulkan version of DOTA2, Talos Principle and Sascha Willems' demos when they're run Vsynched in fullscreen</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107857">Bug 107857</a> - GPU hang - GS_EMIT without shader outputs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107926">Bug 107926</a> - [anv] Rise of the Tomb Raider always misrendering, segfault and gpu hang.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108012">Bug 108012</a> - Compiler crashes on access of non-existent member incremental operations</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>st/va: use provided sizes and coords for vlVaGetImage</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>anv: add missing unlock in error path.</li>
</ul>
<p>Dylan Baker (1):</p>
<ul>
<li>meson: Don't allow building EGL on Windows or MacOS</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>st/nine: do not double-close the fd on teardown</li>
<li>egl: make eglSwapInterval a no-op for !window surfaces</li>
<li>egl: make eglSwapBuffers* a no-op for !window surfaces</li>
<li>vl/dri3: do full teardown on screen_destroy</li>
<li>Revert "mesa: remove unnecessary 'sort by year' for the GL extensions"</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>radv: add missing meson c++ visibility arguments</li>
</ul>
<p>Fritz Koenig (1):</p>
<ul>
<li>i965: Replace checks for rb->Name with FlipY (v2)</li>
</ul>
<p>Gert Wollny (1):</p>
<ul>
<li>virgl, vtest: Correct the transfer size calculation</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>glsl: fix array assignments of a swizzled vector</li>
<li>nv50,nvc0: mark RGBX_UINT formats as renderable</li>
<li>nv50,nvc0: guard against zero-size blits</li>
<li>nvc0: fix blitting red to srgb8_alpha</li>
</ul>
<p>Jason Ekstrand (7):</p>
<ul>
<li>nir/cf: Remove phi sources if needed in nir_handle_add_jump</li>
<li>anv: Use separate MOCS settings for external BOs</li>
<li>intel/fs: Fix a typo in need_matching_subreg_offset</li>
<li>nir/from_ssa: Don't rewrite derefs destinations to registers</li>
<li>anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START</li>
<li>nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions</li>
<li>intel: Don't propagate conditional modifiers if a UD source is negated</li>
</ul>
<p>Juan A. Suarez Romero (2):</p>
<ul>
<li>docs: add sha256 checksums for 18.2.2</li>
<li>Update version to 18.2.3</li>
</ul>
<p>Józef Kucia (1):</p>
<ul>
<li>radeonsi: avoid sending GS_EMIT in shaders without outputs</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>drirc: add a workaround for ARMA 3</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: add a workaround for a VGT hang with prim restart and strips</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>glsl: do not attempt assignment if operand type not parsed correctly</li>
</ul>
<p>Timothy Arceri (11):</p>
<ul>
<li>glsl: ignore trailing whitespace when define redefined</li>
<li>util: disable cache if we have no build-id and timestamp is zero</li>
<li>util: rename timestamp param in disk_cache_create()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105731">Bug 105731</a> - linker error "fragment shader input ... has no matching output in the previous stage" when previous stage's output declaration in a separate shader object</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107511">Bug 107511</a> - KHR/khrplatform.h not always installed when needed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107626">Bug 107626</a> - [SNB] The graphical corruption and GPU hang occur sometimes on the piglit test "arb_texture_multisample-large-float-texture" with parameter --fp16</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107626">Bug 107626</a> - [SNB] The graphical corruption and GPU hang occur sometimes on the piglit test "arb_texture_multisample-large-float-texture" with parameter --fp16</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107856">Bug 107856</a> - i965 incorrectly calculates the number of layers for texture views (assert)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106577">Bug 106577</a> - broken rendering with nine and nouveau (GM107)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108245">Bug 108245</a> - RADV/Vega: Low mip levels of large BCn textures get corrupted by vkCmdCopyBufferToImage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108311">Bug 108311</a> - Query buffer object support is broken on r600.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108894">Bug 108894</a> - [anv] vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() write-after-write hazard</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108909">Bug 108909</a> - Vkd3d test failure test_resolve_non_issued_query_data()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108914">Bug 108914</a> - blocky shadow artifacts in The Forest with DXVK, RADV_DEBUG=nohiz fixes this</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108925">Bug 108925</a> - vkCmdCopyQueryPoolResults(VK_QUERY_RESULT_WAIT_BIT) for timestamps with large query count hangs</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Flush before vkCmdWriteTimestamp() if needed</li>
</ul>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Align large buffers to the fragment size.</li>
<li>radv: Clamp gfx9 image view extents to the allocated image extents.</li>
<li>radv/android: Mark android WSI image as shareable.</li>
<li>radv/android: Use buffer metadata to determine scanout compat.</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>r600: make suballocator 256-bytes align</li>
<li>radv: use 3d shader for gfx9 copies if dst is 3d</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>egl/wayland: bail out when drmGetMagic fails</li>
<li>egl/wayland: plug memory leak in drm_handle_device()</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>v3d: Fix a leak of the transfer helper on screen destroy.</li>
<li>vc4: Fix a leak of the transfer helper on screen destroy.</li>
<li>v3d: Fix a leak of the disassembled instruction string during debug dumps.</li>
</ul>
<p>Eric Engestrom (3):</p>
<ul>
<li>anv: correctly use vulkan 1.0 by default</li>
<li>wsi/display: fix mem leak when freeing swapchains</li>
<li>vulkan/wsi: fix s/,/;/ typo</li>
</ul>
<p>Gurchetan Singh (3):</p>
<ul>
<li>virgl: quadruple command buffer size</li>
<li>virgl: avoid large inline transfers</li>
<li>virgl: don't mark buffers as unclean after a write</li>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_AMD_depth_clamp_separate on r600, radeonsi.</li>
<li>GL_AMD_framebuffer_multisample_advanced on radeonsi.</li>
<li>GL_AMD_gpu_shader_int64 on i965, nvc0, radeonsi.</li>
<li>GL_AMD_multi_draw_indirect on all GL 4.x drivers.</li>
<li>GL_AMD_query_buffer_object on i965, nvc0, r600, radeonsi.</li>
<li>GL_EXT_disjoint_timer_query on radeonsi and most other Gallium drivers (ES extension)</li>
<li>GL_EXT_texture_compression_s3tc on all drivers (ES extension)</li>
<li>GL_EXT_vertex_attrib_64bit on i965, nvc0, radeonsi.</li>
<li>GL_EXT_window_rectangles on radeonsi.</li>
<li>GL_KHR_texture_compression_astc_sliced_3d on radeonsi.</li>
<li>GL_NV_fragment_shader_interlock on i965.</li>
<li>EGL_EXT_device_base for all drivers.</li>
<li>EGL_EXT_device_drm for all drivers.</li>
<li>EGL_MESA_device_software for all drivers.</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=13728">Bug 13728</a> - [G965] Some objects in Neverwinter Nights Linux version not displayed correctly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99507">Bug 99507</a> - Corrupted frame contents with Vulkan version of DOTA2, Talos Principle and Sascha Willems' demos when they're run Vsynched in fullscreen</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=99730">Bug 99730</a> - Metro Redux game(s) needs override for midshader extension declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101247">Bug 101247</a> - Mesa fails to link GLSL programs with unused output blocks</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102597">Bug 102597</a> - [Regression] mpv, high rendering times (two to three times higher)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103241">Bug 103241</a> - Anv crashes when using 64-bit vertex inputs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104602">Bug 104602</a> - [apitrace] Graphical artifacts in Civilization VI on RX Vega</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104809">Bug 104809</a> - anv: DOOM 2016 and Wolfenstein II:The New Colossus crash due to not having depthBoundsTest</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104926">Bug 104926</a> - swrast: Mesa 17.3.3 produces: HW cursor for format 875713089 not supported</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105333">Bug 105333</a> - [gallium-nine] missing geometry after commit ac: replace ac_build_kill with ac_build_kill_if_false</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105731">Bug 105731</a> - linker error "fragment shader input ... has no matching output in the previous stage" when previous stage's output declaration in a separate shader object</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105904">Bug 105904</a> - Needed to delete mesa shader cache after driver upgrade for 32 bit wine vulkan programs to work.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106231">Bug 106231</a> - llvmpipe blends produce bad code after llvm patch https://reviews.llvm.org/D44785</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106283">Bug 106283</a> - Shader replacements works only for limited use cases</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106577">Bug 106577</a> - broken rendering with nine and nouveau (GM107)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106833">Bug 106833</a> - glLinkProgram is expected to fail when vertex attribute aliasing happens on ES3.0 context or later</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106980">Bug 106980</a> - Basemark GPU vulkan benchmark hangs on GFX9</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106997">Bug 106997</a> - [Regression]. Dying light game is crashing on latest mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107088">Bug 107088</a> - [GEN8+] Hang when discarding a fragment if dual source blending is enabled but shader doesn't support it</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107098">Bug 107098</a> - Segfault after munmap(kms_sw_dt->ro_mapped)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107212">Bug 107212</a> - Dual-Core CPU E5500 / G45: RetroArch with reicast core results in corrupted graphics</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107223">Bug 107223</a> - [GEN9+] 50% perf drop in SynMark Fill* tests (E2E RBC gets disabled?)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107276">Bug 107276</a> - radv: OpBitfieldUExtract returns incorrect result when count is zero</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107280">Bug 107280</a> - [DXVK] Batman: Arkham City with tessellation enabled hangs on SKL GT4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107313">Bug 107313</a> - Meson instructions on web site are non-optimal</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107359">Bug 107359</a> - [Regression] [bisected] [OpenGL CTS] [SKL,BDW] KHR-GL46.texture_barrier*-texels, GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners, and GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners fail with some configuration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107460">Bug 107460</a> - radv: OpControlBarrier does not always work correctly (bisected)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107477">Bug 107477</a> - [DXVK] Setting high shader quality in GTA V results in LLVM error</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107579">Bug 107579</a> - [SNB] The graphic corruption when we reuse the GS compiled and used for TFB when statebuffer contain magic trash in the unused space</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107601">Bug 107601</a> - Rise of the Tomb Raider Segmentation Fault when the game starts</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107610">Bug 107610</a> - Dolphin emulator mis-renders shadow overlay in Super Mario Sunshine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107626">Bug 107626</a> - [SNB] The graphical corruption and GPU hang occur sometimes on the piglit test "arb_texture_multisample-large-float-texture" with parameter --fp16</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107734">Bug 107734</a> - [GLSL] glsl-fface-invariant, glsl-fcoord-invariant and glsl-pcoord-invariant should fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107760">Bug 107760</a> - GPU Hang when Playing DiRT 3 Complete Edition using Steam Play with DXVK</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107765">Bug 107765</a> - [regression] Batman Arkham City crashes with DXVK under wine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107772">Bug 107772</a> - Mesa preprocessor matches if(def)s & endifs incorrectly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107779">Bug 107779</a> - Access violation with some games</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107786">Bug 107786</a> - [DXVK] MSAA reflections are broken in GTA V</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107806">Bug 107806</a> - glsl_get_natural_size_align_bytes() ABORT with GfxBench Vulkan AztecRuins</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107810">Bug 107810</a> - The 'va_end' call is missed after 'va_copy' in 'util_vsnprintf' function under windows</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107832">Bug 107832</a> - Gallium picking A16L16 formats when emulating INTENSITY16 conflicts with mesa</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107843">Bug 107843</a> - 32bit Mesa build failes with meson.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107856">Bug 107856</a> - i965 incorrectly calculates the number of layers for texture views (assert)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107857">Bug 107857</a> - GPU hang - GS_EMIT without shader outputs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107865">Bug 107865</a> - swr fail to build with llvm-libs 6.0.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107869">Bug 107869</a> - u_thread.h:87:4: error: use of undeclared identifier 'cpu_set_t'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107870">Bug 107870</a> - Undefined symbols for architecture x86_64: "_util_cpu_caps"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107879">Bug 107879</a> - crash happens when link program</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107891">Bug 107891</a> - [wine, regression, bisected] RAGE, Wolfenstein The New Order hangs in menu</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107923">Bug 107923</a> - build_id.c:126: multiple definition of `build_id_length'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107926">Bug 107926</a> - [anv] Rise of the Tomb Raider always misrendering, segfault and gpu hang.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107941">Bug 107941</a> - GPU hang and system crash with Dota 2 using Vulkan</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108012">Bug 108012</a> - Compiler crashes on access of non-existent member incremental operations</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108024">Bug 108024</a> - [Debian Stretch]Fail to build because "xcb_randr_lease_t"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108164">Bug 108164</a> - [radv] VM faults since 5d6a560a2986c9ab421b3c7904d29bb7bc35e36f</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108245">Bug 108245</a> - RADV/Vega: Low mip levels of large BCn textures get corrupted by vkCmdCopyBufferToImage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108894">Bug 108894</a> - [anv] vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() write-after-write hazard</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108909">Bug 108909</a> - Vkd3d test failure test_resolve_non_issued_query_data()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108914">Bug 108914</a> - blocky shadow artifacts in The Forest with DXVK, RADV_DEBUG=nohiz fixes this</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108877">Bug 108877</a> - OpenGL CTS gl43 test cases were interrupted due to segment fault</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109023">Bug 109023</a> - error: inlining failed in call to always_inline ‘__m512 _mm512_and_ps(__m512, __m512)’: target specific option mismatch</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109129">Bug 109129</a> - format_types.h:1220: undefined reference to `_mm256_cvtps_ph'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109229">Bug 109229</a> - glLinkProgram locks up for ~30 seconds</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109242">Bug 109242</a> - [RADV] The Witcher 3 system freeze</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109488">Bug 109488</a> - Mesa 18.3.2 crash on a specific fragment shader (assert triggered) / already fixed on the master branch.</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (2):</p>
<ul>
<li>bin/get-pick-list.sh: fix the oneline printing</li>
<li>bin/get-pick-list.sh: fix redirection in sh</li>
</ul>
<p>Axel Davy (1):</p>
<ul>
<li>st/nine: Immediately upload user provided textures</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Only use 32 KiB per threadgroup on Stoney.</li>
<li>radv: Set partial_vs_wave for pipelines with just GS, not tess.</li>
<li>nir: Account for atomics in copy propagation.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109107">Bug 109107</a> - gallium/st/va: change va max_profiles when using Radeon VCN Hardware</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109543">Bug 109543</a> - After upgrade mesa to 19.0.0~rc1 all vulkan based application stop working ["vulkan-cube" received SIGSEGV in radv_pipeline_init_blend_state at ../src/amd/vulkan/radv_pipeline.c:699]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104297">Bug 104297</a> - [i965] Downward causes GPU hangs and misrendering on Haswell</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104602">Bug 104602</a> - [apitrace] Graphical artifacts in Civilization VI on RX Vega</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107052">Bug 107052</a> - [Regression][bisected]. Crookz - The Big Heist Demo can't be launched despite the "true" flag in "drirc"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107563">Bug 107563</a> - [RADV] Broken rendering in Unity demos</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108999">Bug 108999</a> - Calculating the scissors fields when the y is flipped (0 on top) can generate negative numbers that will cause assertion failure later on.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109443">Bug 109443</a> - Build failure with MSVC when using Scons >= 3.0.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109451">Bug 109451</a> - [IVB,SNB] LINE_STRIPs following a TRIANGLE_FAN fail to use primitive restart</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109594">Bug 109594</a> - totem assert failure: totem: src/intel/genxml/gen9_pack.h:72: __gen_uint: La declaración `v <= max' no se cumple.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109597">Bug 109597</a> - wreckfest issues with transparent objects & skybox</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109601">Bug 109601</a> - [Regression] RuneLite GPU rendering broken on 18.3.x</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109698">Bug 109698</a> - dri.pc contents invalid when built with meson</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109735">Bug 109735</a> - [Regression] broken font with mesa_vulkan_overlay</li>
</ul>
<h2>Changes</h2>
<p>Alok Hota (1):</p>
<ul>
<li>swr/rast: bypass size limit for non-sampled textures</li>
</ul>
<p>Andrii Simiklit (1):</p>
<ul>
<li>i965: re-emit index buffer state on a reset option change.</li>
</ul>
<p>Axel Davy (2):</p>
<ul>
<li>st/nine: Ignore window size if error</li>
<li>st/nine: Ignore multisample quality level if no ms</li>
</ul>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Sync ETC2 whitelisted devices.</li>
<li>radv: Fix float16 interpolation set up.</li>
<li>radv: Allow interpolation on non-float types.</li>
<li>radv: Interpolate less aggressively.</li>
</ul>
<p>Carlos Garnacho (1):</p>
<ul>
<li>wayland/egl: Ensure EGL surface is resized on DRI update_buffers()</li>
</ul>
<p>Danylo Piliaiev (1):</p>
<ul>
<li>glsl/linker: Fix unmatched TCS outputs being reduced to local variable</li>
</ul>
<p>David Shao (1):</p>
<ul>
<li>meson: ensure that xmlpool_options.h is generated for gallium targets that need it</li>
</ul>
<p>Eleni Maria Stea (1):</p>
<ul>
<li>i965: fixed clamping in set_scissor_bits when the y is flipped</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110211">Bug 110211</a> - If DESTDIR is set to an empty string, the dri drivers are not installed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110221">Bug 110221</a> - build error with meson</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110259">Bug 110259</a> - radv: Sampling depth-stencil image in GENERAL layout returns nothing but zero (regression, bisected)</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (4):</p>
<ul>
<li>glsl: correctly validate component layout qualifier for dvec{3,4}</li>
<li>glsl/linker: don't fail non static used inputs without matching outputs</li>
<li>glsl/linker: simplify xfb_offset vs xfb_stride overflow check</li>
<li>Revert "glsl: relax input->output validation for SSO programs"</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>radv: Use correct image view comparison for fast clears.</li>
<li>ac/nir: Return frag_coord as integer.</li>
</ul>
<p>Danylo Piliaiev (1):</p>
<ul>
<li>glsl: Cross validate variable's invariance by explicit invariance only</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110211">Bug 110211</a> - If DESTDIR is set to an empty string, the dri drivers are not installed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110221">Bug 110221</a> - build error with meson</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (4):</p>
<ul>
<li>glsl: correctly validate component layout qualifier for dvec{3,4}</li>
<li>glsl/linker: don't fail non static used inputs without matching outputs</li>
<li>glsl/linker: simplify xfb_offset vs xfb_stride overflow check</li>
<li>Revert "glsl: relax input->output validation for SSO programs"</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>radv: Use correct image view comparison for fast clears.</li>
<li>ac/nir: Return frag_coord as integer.</li>
</ul>
<p>Danylo Piliaiev (2):</p>
<ul>
<li>anv: Treat zero size XFB buffer as disabled</li>
<li>glsl: Cross validate variable's invariance by explicit invariance only</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>softpipe: fix texture view crashes</li>
</ul>
<p>Dylan Baker (5):</p>
<ul>
<li>docs: Add SHA256 sums for 19.0.0</li>
<li>cherry-ignore: Add commit that doesn't apply</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108766">Bug 108766</a> - Mesa built with meson has RPATH entries</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=109648">Bug 109648</a> - AMD Raven hang during va-api decoding</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110257">Bug 110257</a> - Major artifacts in mpeg2 vaapi hw decoding</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=110259">Bug 110259</a> - radv: Sampling depth-stencil image in GENERAL layout returns nothing but zero (regression, bisected)</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>st/va: reverse qt matrix back to its original order</li>
</ul>
<p>Caio Marcelo de Oliveira Filho (1):</p>
<ul>
<li>nir: Take if_uses into account when repairing SSA</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>docs: Add SHA256 sums for mesa 19.0.1</li>
<li>VERSION: bump version for 19.0.2</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>dri3: Return the current swap interval from glXGetSwapIntervalMESA().</li>
<li>v3d: Bump the maximum texture size to 4k for V3D 4.x.</li>
<li>v3d: Don't try to use the TFU blit path if a scissor is enabled.</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>meson: strip rpath from megadrivers</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nir/print: fix printing the image_array intrinsic index</li>
With the August 2015 Workstation 12 / Fusion 8 releases, OpenGL 3.3
is supported in the guest.
This requires:
</p>
<ul>
<li>The VM is configured for virtual hardware version 12.
<li>The host OS, GPU and graphics driver supports DX11 (Windows) or
@@ -37,12 +38,28 @@ This requires:
<li>On Linux, the vmwgfx kernel module must be version 2.9.0 or later.
<li>A recent version of Mesa with the updated svga gallium driver.
</ul>
</p>
<p>
Otherwise, OpenGL 2.1 is supported.
</p>
<p>
With the Fall 2018 Workstation 15 / Fusion 11 releases, additional
features are supported in the driver:
<ul>
<li>Multisample antialiasing (2x, 4x)
<li>GL_ARB/AMD_draw_buffers_blend
<li>GL_ARB_sample_shading
<li>GL_ARB_texture_cube_map_array
<li>GL_ARB_texture_gather
<li>GL_ARB_texture_query_lod
<li>GL_EXT/OES_draw_buffers_indexed
</ul>
<p>
This requires version 2.15.0 or later of the vmwgfx kernel module and
the VM must be configured for hardware version 16 or later.
</p>
<p>
OpenGL 3.3 support can be disabled by setting the environment variable
SVGA_VGPU10=0.
@@ -126,7 +143,7 @@ Begin by saving your current directory location:
<ul>
<li>Mesa/Gallium master branch. This code is used to build libGL, and the direct rendering svga driver for libGL, vmwgfx_dri.so, and the X acceleration library libxatracker.so.x.x.x.
<li>VMware Linux guest kernel module. Note that this repo contains the complete DRM and TTM code. The vmware-specific driver is really only the files prefixed with vmwgfx.
<pre>
@@ -136,7 +153,7 @@ Begin by saving your current directory location:
Most distros ship with this but it's safest to install a newer version.
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.