Shader stats from VERDE:
Default scheduler:
Totals:
SGPRS: 491272 -> 488672 (-0.53 %)
VGPRS: 289980 -> 311093 (7.28 %)
Code Size: 11091656 -> 11219948 (1.16 %) bytes
LDS: 97 -> 97 (0.00 %) blocks
Scratch: 1732608 -> 2246656 (29.67 %) bytes per wave
Max Waves: 78063 -> 77352 (-0.91 %)
Wait states: 0 -> 0 (0.00 %)
Looking at some of the worst regressions, I get:
- The VGPR increase seems to be caused by the fact that if a PS used fewer
than 16 VGPRs, it will now always use 16 VGPRs and sometimes even 20.
However, the wave count remains at 10 if VGPRs <= 24, so no harm there.
- The scratch increase seems to be caused by SGPR spilling.
The unnecessary SGPR spilling has been an ongoing issue with the compiler
and it's completely fixable by rematerializing s_loads or reordering
instructions.
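As a rough illustration of why the extra VGPRs are harmless, the wave count can be sketched as below. The register file size (256), the allocation granularity (4) and the 10-wave cap are assumptions about SI-class hardware, not values taken from these stats, and max_waves_for_vgprs() is a hypothetical helper.

```c
/* Assumed SI-class limits; only the "10 waves at <= 24 VGPRs"
 * observation comes from the text above. */
#define VGPRS_PER_SIMD   256u  /* assumed VGPR budget per SIMD lane */
#define VGPR_ALLOC_GRAN  4u    /* assumed allocation granularity */
#define MAX_WAVES        10u   /* assumed per-SIMD wave limit */

/* Hypothetical helper: waves per SIMD for a given VGPR count. */
unsigned max_waves_for_vgprs(unsigned vgprs)
{
   if (vgprs == 0)
      return MAX_WAVES;

   /* Round the request up to the allocation granularity. */
   unsigned alloc = (vgprs + VGPR_ALLOC_GRAN - 1) / VGPR_ALLOC_GRAN
                    * VGPR_ALLOC_GRAN;
   unsigned waves = VGPRS_PER_SIMD / alloc;

   return waves < MAX_WAVES ? waves : MAX_WAVES;
}
```

Under these assumptions 12, 16, 20 and 24 VGPRs all map to 10 waves, and the count only drops once a shader needs more than 24.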
SI scheduler:
Totals:
SGPRS: 374848 -> 374576 (-0.07 %)
VGPRS: 284456 -> 307515 (8.11 %)
Code Size: 11433068 -> 11535452 (0.90 %) bytes
LDS: 97 -> 97 (0.00 %) blocks
Scratch: 509952 -> 522240 (2.41 %) bytes per wave
Max Waves: 79456 -> 78217 (-1.56 %)
Wait states: 0 -> 0 (0.00 %)
VGPRs - same story as before. The SI scheduler doesn't spill SGPRs so much
and generally spills way less than the default scheduler.
(522240 vs 2246656 bytes of scratch per wave)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Still disabled.
Only prologs & epilogs are compiled in draw calls, but each variant of those
is compiled only once per process.
VS is always compiled as hw VS.
TES is always compiled as hw VS.
LS and ES stages are always compiled on demand.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It only exports the primitive ID.
Also used by TES when it's compiled as VS.
The VS input location of the primitive ID input is v2.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Prologs (shader binaries inserted before the API shader binary) need to
know this, so that they won't change the input registers unintentionally.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Kepler compute support is really different than Fermi and it's not
ready yet.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Changes from v3:
- move the previous OP_SELP change to the previous commit
Changes from v2:
- make sure the op is OP_SELP when emitting the predicate and add one
assert
- use bld.getSSA() for mkOp2()
- add cross edge between tryLockAndSetBB and joinBB
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This OP_SELP insn will be used to handle compare and swap subops.
Changes from v2:
- fix logic for GK110+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Shared memory address space (FILE_MEMORY_SHARED) must be used instead
of global memory when a shared memory area is declared.
Changes from v2:
- oops, do not remove TGSI_FILE_BUFFER in a switch in
nv50_ir_from_tgsi.cpp
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.
Suggested by Ilia Mirkin.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
When indirect compute is used, the size of the grid (in blocks) is
stored as three integers inside a buffer. This requires a macro to
set up GRIDDIM_YX and GRIDDIM_Z.
Changes from v2:
- do not launch the grid if the number of groups for a dimension is 0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Textures and samplers don't seem to be aliased between COMPUTE and 3D.
Changes from v2:
- refactor the code to share (almost) the same logic between 3d and
compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is loosely based on 3D. Shader buffers are bound on c15 (the
driver constbuf) at offset 0x200.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Changes from v3:
- add new validation state for COMPUTE driver constbuf
Changes from v2:
- always bind the driver consts even if user params come in via clover
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This will be used to invalidate 3D driver constbuf when using COMPUTE
and vice-versa. This is needed because this CB contains a bunch of
useful information like the addrs of shader buffers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Loosely based on 3D.
Changes from v3:
- invalidate COMPUTE CBs after validating 3D CBs because they are
aliased
Changes from v2:
- get rid of the 's' param to nvc0_cb_bo_push() because it doesn't
matter to upload constbufs for compute using the 3d chan
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compute shaders are totally unsupported. This prevents Clover from
reporting that OpenCL is supported on Tesla, because it's a lie.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A return value of '-1' means that there was an error during the swap with
a window drawable; in this case we set the error to EGL_BAD_NATIVE_WINDOW.
v2: coding style cleanup, better commit message
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Null-check on "*value" is currently done in _eglGetSyncAttrib, which is
after eglGetSyncAttribKHR dereferences it.
Move the check a layer up (in the beginning of eglGetSyncAttribKHR) to
avoid segfaults.
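A minimal sketch of the ordering problem, using hypothetical names rather than the real EGL entry points: the outer function touches *value itself, so the NULL check has to come before that access, not inside the helper it calls afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lower layer; it assumes value is valid. */
static bool query_attrib(int attribute, int *value)
{
   *value = attribute * 2;   /* placeholder result, for illustration */
   return true;
}

/* Hypothetical entry point: it reads *value on its own, so the NULL
 * check must sit at the very top of this function. */
bool get_attrib_checked(int attribute, int *value)
{
   if (value == NULL)
      return false;          /* reject before any dereference */

   int previous = *value;    /* safe: value is known to be non-NULL */
   (void)previous;

   return query_attrib(attribute, value);
}
```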
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Emil Velikov: tweak commit message, add stable tag]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It's basically the same thing as GL_ARB_texture_stencil8 except that
glCopyTexImage isn't supported, so add STENCIL_INDEX to the list of
invalid GLES formats for glCopyTexImage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
- LOD must be provided in .w for TXF (even for buffer textures)
- User buffer must be valid at draw time
- Must have a sampler associated with the sampler view
This makes PBO uploads work again on nouveau.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The base format is a function of the user-requested format, while the
driver format is not. So we should use the base format instead.
The driver format can be anything. Specifically in the stencil-only
case, it might be a depth/stencil format. However we still want to
refuse such an attachment when bound to GL_DEPTH.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
This reduces a glTexImage(GL_RGBA, GL_UNSIGNED_BYTE) hot spot when
storing the texture as BGRA.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Instead of discarding the texture we created, keep it around in case
the next glDrawPixels draws the same image again. This is intended
to help applications which draw the same image several times in a row,
either within a frame or subsequent frames.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Specifically, for the case where we initialize a dmat with a source
matrix that has fewer columns/rows.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Need to set some non-zero limits for MaxCombinedUniformComponents,
otherwise we hit a "Too many <type> shader uniform components" error
in the linker.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This fixes a glDrawPixels regression since b63fe0552b. The new
quad-drawing utility code uses 3 vertex attributes (xyz, rgba, st).
For the glDrawPixels path we don't use the rgba attribute so there's a
gap in the TGSI VS input declarations (INPUT[0] = pos, INPUT[2] =
texcoord). The TGSI->VGPU10 translation code did not handle this
correctly. I missed this because my VM was configured for HWv11
while testing.
Another way to fix this would be to change the tgsi_scan.c code so
that the tgsi_shader_info::num_inputs (and num_outputs) included
the unused inputs/outputs. These counts would then actually be
"max input register index + 1" rather than "number of used inputs".
But that change could impact all drivers so put it off for now.
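To make the difference between the two counts concrete, here is a small sketch over a bitmask of used input registers. Both helpers and the mask representation are hypothetical; tgsi_scan.c does not necessarily track inputs this way.

```c
#include <stdint.h>

/* "Number of used inputs": one per set bit in the usage mask. */
unsigned count_used_inputs(uint32_t used_mask)
{
   unsigned n = 0;
   for (; used_mask; used_mask &= used_mask - 1)
      n++;
   return n;
}

/* "Max input register index + 1": position of the highest set bit + 1. */
unsigned input_register_range(uint32_t used_mask)
{
   unsigned n = 0;
   for (; used_mask; used_mask >>= 1)
      n++;
   return n;
}
```

With INPUT[0] and INPUT[2] declared (mask 0x5), the first helper returns 2 and the second returns 3, which is the gap the translation code has to account for.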
No regressions found with piglit or typical GL apps.
v2: also update alloc_system_value_index() to use info.file_max[]
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Because the if statement that checks whether we have a return
statement is valid only on x86, surround it with X86 or X86-64
arch defines
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Currently, disassemble() directly prints to stdout. This has broken the
profiling support for llvmpipe JIT code.
This patch redirects the output to an sstream object, which then
either gets printed to stdout (for assembly debugging) or written
to a file in /tmp/ (for profiling support).
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
src/gallium/drivers/trace/tr_context.c:1713:39: warning: ‘rbug_blocker_flags’ defined but not used [-Wunused-const-variable]
static const struct debug_named_value rbug_blocker_flags[] = {
^~~~~~~~~~~~~~~~~~
Note that use of rbug_blocker_flags was removed in:
commit 5494332128
Author: Jakob Bornecrantz <jakob@vmware.com>
Date: Wed May 12 19:26:19 2010 +0100
trace: Remove rbug from trace
Signed-off-by: Rob Clark <robdclark@gmail.com>
src/gallium/auxiliary/pipebuffer/pb_bufmgr_mm.c: In function ‘mm_bufmgr_create_from_buffer’:
src/gallium/auxiliary/pipebuffer/pb_bufmgr_mm.c:288:4:
warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
if(mm->map)
^~
src/gallium/auxiliary/pipebuffer/pb_bufmgr_mm.c:286:1: note:
...this ‘if’ clause, but it is not
if(mm->heap)
^~
Signed-off-by: Rob Clark <robdclark@gmail.com>
src/gallium/auxiliary/hud/font.c:234:22: warning: ‘Fixed8x13_Character_159’ defined but not used [-Wunused-const-variable]
static const GLubyte Fixed8x13_Character_159[] = { 9, 0, 0, 0, 0, 0, 0,170, 0, 0, 0,130, 0, 0, 0,130, 0, 0, 0,130, 0, 0, 0,170, 0, 0, 0, 0, 0};
^~~~~~~~~~~~~~~~~~~~~~~
.... many more..
These are simply unused, just #if 0 them out for now, in case someone
wants to use them in the future.
Signed-off-by: Rob Clark <robdclark@gmail.com>
src/mesa/main/texstore.c:92:22: warning: ‘map_1032’ defined but not used [-Wunused-const-variable]
static const GLubyte map_1032[6] = { 1, 0, 3, 2, ZERO, ONE };
^~~~~~~~
src/mesa/main/texstore.c:91:22: warning: ‘map_3210’ defined but not used [-Wunused-const-variable]
static const GLubyte map_3210[6] = { 3, 2, 1, 0, ZERO, ONE };
^~~~~~~~
src/mesa/main/texstore.c:90:22: warning: ‘map_identity’ defined but not used [-Wunused-const-variable]
static const GLubyte map_identity[6] = { 0, 1, 2, 3, ZERO, ONE };
^~~~~~~~~~~~
These appear to be unused since:
commit 8ec6534b26
Author: Iago Toral Quiroga <itoral@igalia.com>
AuthorDate: Wed Oct 15 13:42:11 2014 +0200
mesa: Use _mesa_format_convert to implement texstore_rgba.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
src/compiler/glsl/lower_discard_flow.cpp:79:1: warning: ‘ir_visitor_status {anonymous}::lower_discard_flow_visitor::visit_enter(ir_loop_jump*)’ defined but not used [-Wunused-function]
lower_discard_flow_visitor::visit_enter(ir_loop_jump *ir)
^~~~~~~~~~~~~~~~~~~~~~~~~~
The base class method that was intended to be overridden was
'visit(ir_loop_jump *ir)', not visit_enter().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
src/compiler/glsl/ast_to_hir.cpp: In function ‘unsigned int ast_process_struct_or_iface_block_members(exec_list*, _mesa_glsl_parse_state*, exec_list*, glsl_struct_field**, bool, glsl_matrix_layout, bool, ir_variable_mode, ast_type_qualifier*,
unsigned int, unsigned int)’:
src/compiler/glsl/ast_to_hir.cpp:6339:52: warning: ‘first_member_has_explicit_location’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (!layout->flags.q.explicit_location &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
((first_member_has_explicit_location &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
!qual->flags.q.explicit_location) ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(!first_member_has_explicit_location &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
qual->flags.q.explicit_location))) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning:
‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function]
fs_copy_prop_dataflow::dump_block_data() const
^~~~~~~~~~~~~~~~~~~~~
From looking at git history, it looks like this is intended to be unused
(ie. just for adding on-demand debug prints)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
src/util/hash_table.h:111:23: warning: ‘_mesa_fnv32_1a_offset_bias’ defined but not used [-Wunused-const-variable]
static const uint32_t _mesa_fnv32_1a_offset_bias = 2166136261u;
^~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Android builds with -Wunused-parameter enabled which results in spewing
lots of warnings. Disable it so more meaningful warnings are more visible.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use the LOCAL_CFLAGS_{32/64} instead of arch specific variants to define
the DEFAULT_DRIVER_DIR. This enables building for arm64.
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
TARGET_CC is not defined for the secondary arch on combined 32/64-bit
builds. The build system uses 2ND_TARGET_CC instead and it is not meant
to be used in module makefiles. LOCAL_CC was used to provide C only
flags as -std=c99 is not valid for C++ files. Since Android 4.4,
LOCAL_CONLYFLAGS was added to set compiler flags on C files only, so it
can be used now instead of LOCAL_CC.
This will break on pre-4.4 versions of Android, but it is unlikely anyone
is using current Mesa with such an old version of Android.
Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Pass the additional config attributes to dri2_add_config to set them
instead of open coding them. This is in preparation to add more attributes.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
From ARB_sample_shading:
"gl_NumSamples is the total number of samples in the framebuffer,
or one if rendering to a non-multisample framebuffer"
So make sure to always pass in at least 1.
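In other words, the value handed to the shader is clamped from below. A minimal sketch, with num_samples_for_shader() being a hypothetical name:

```c
/* Hypothetical helper: value to expose as gl_NumSamples. A
 * non-multisample framebuffer may report 0 samples, which is
 * mapped to 1 as the spec text requires. */
unsigned num_samples_for_shader(unsigned fb_samples)
{
   return fb_samples > 1 ? fb_samples : 1;
}
```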
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Edward O`Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This patch moves the calculation of current uniforms to
link_uniforms, which makes use of UniformRemapTable which
stores all the reserved uniform locations.
Location assignment for implicit uniforms now tries to use
any gaps left in the table after the location assignment
for explicit uniforms. This gives us more space to store more
uniforms.
Patch is based on earlier patch with following changes/additions:
1: Move the counting of explicit locations to
check_explicit_uniform_locations and then pass
the number to link_assign_uniform_locations.
2: Count the number of empty slots in UniformRemapTable
and store them in a list_head.
3: Try to find an empty slot for implicit locations from
the list, if that fails resize UniformRemapTable.
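The gap-filling idea in step 3 can be sketched as follows. This is a simplified illustration, not the linker code: it uses a plain array of booleans and a linear scan instead of a list of empty slots, and struct remap_table and assign_implicit_location() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified remap table: used[i] marks a reserved location. */
struct remap_table {
   bool *used;
   unsigned size;
};

/* Find `count` consecutive free locations, growing the table when no gap
 * is large enough. Returns the first location, or -1 if growing fails. */
int assign_implicit_location(struct remap_table *t, unsigned count)
{
   unsigned run = 0;

   assert(count > 0);

   for (unsigned i = 0; i < t->size; i++) {
      run = t->used[i] ? 0 : run + 1;
      if (run == count) {
         unsigned first = i + 1 - count;
         for (unsigned j = first; j <= i; j++)
            t->used[j] = true;
         return (int)first;
      }
   }

   /* No suitable gap: resize the table and append at the end. */
   bool *grown = realloc(t->used, (t->size + count) * sizeof(bool));
   if (!grown)
      return -1;
   for (unsigned j = 0; j < count; j++)
      grown[t->size + j] = true;
   t->used = grown;

   int first = (int)t->size;
   t->size += count;
   return first;
}
```

With explicit locations 0 and 3 reserved in a four-entry table, a two-element implicit uniform lands at location 1, and the next implicit uniform forces the table to grow.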
Fixes following CTS tests:
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93696
Just like the rest of the msaa "implementation" it's just fake for now...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This basically saves the current pipeline state, sets up state for
rendering, constructs a set of textured quads, renders, then restores
the previous pipeline state.
It shouldn't be hard to implement a similar function for non-gallium
drivers. With some code refactoring, the vertex definition code could
probably be shared.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This improves the performance of applications which use glXUseXFont()
or wglUseFontBitmaps() and glCallLists() to draw bitmap text.
Basically, we collect all the glBitmap images from the display lists
and put them into a texture atlas. To render the bitmaps for a
glCallLists() command, we render a set of textured quads where each
quad is textured with one bitmap image. Actually, the rendering part
has to be done by the Mesa driver or Mesa/gallium state tracker.
Note that GLUT demos that use glutBitmapCharacter() don't benefit
from this.
v2, per Nicolai Hähnle:
- check the max tex rect size is at least 1024.
- add comment in dd.h that texture_rectangle is required.
- in _mesa_DeleteLists(), try to delete the atlas before the list(s)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Gallium doesn't present these as GL_RED-style. A swizzle is necessary to
present the proper data in the unused components.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Also adds some of the Iris/Pro parts which we previously didn't have named.
v2: 0x192d is gt3, not gt4
Adding some 'e' tags for eDRAM parts
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Michał Winiarski <michal.winiarski@intel.com>
The restriction on multisampled integer texture formats only applies to
GLES 3.0, so don't apply it to GLES 3.1 contexts. This fixes a slew of
dEQP-GLES31.functional.state_query.internal_format.*
tests, which now all pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Every stage has a corresponding 3DSTATE_CONSTANT_XS packet, so having
the code to create and emit push constant buffers in genX_vs_state.c
is a little strange. Moving it to a separate file seems more logical.
v2 [Ken]: Rebase on master, explain motivation in the commit message.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen4/5's SEL instruction cannot use conditional modifiers, so min/max
are implemented as CMP + SEL. Handling that after optimization lets us
CSE more.
On Ironlake:
total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
instructions in affected programs: 326604 -> 323322 (-1.00%)
helped: 1411
total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
cycles in affected programs: 18950290 -> 18867176 (-0.44%)
helped: 2419
HURT: 328
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This will prevent optimization passes from introducing unsupported
library calls.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Even though it's a no-op, it's important to keep track of the type so
that we can pick the properly-signed op later on.
This fixes dEQP-GLES3.functional.shaders.precision.uint.highp_div_fragment,
which ended up using IDIV instead of UDIV.
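The reason the type matters: the same 32-bit pattern divides differently depending on signedness. A small sketch with hypothetical helper names (the divisor is assumed to be non-zero, and the conversion to int32_t is assumed to wrap as two's complement):

```c
#include <stdint.h>

/* Divide treating both operands as unsigned (what UDIV would compute). */
uint32_t div_as_unsigned(uint32_t a, uint32_t b)
{
   return a / b;
}

/* Divide treating the same bits as signed (what IDIV would compute).
 * The uint32_t -> int32_t conversion is assumed to wrap. */
uint32_t div_as_signed(uint32_t a, uint32_t b)
{
   return (uint32_t)((int32_t)a / (int32_t)b);
}
```

For a = 0xFFFFFFFE and b = 2, the unsigned result is 0x7FFFFFFF while the signed one is 0xFFFFFFFF (-2 / 2 = -1), so picking the wrong op silently corrupts large uint values.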
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Note that this results in a different transformation for the viewport's
Z axis (depth range), but that doesn't matter for this case.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On gen7 (Ivy Bridge, Haswell), we will get a GPU hang if an indirect
dispatch is used, but one of the dimensions is 0.
Therefore we use predicated rendering on the GPGPU_WALKER command to
handle this case.
Fixes piglit test: spec/arb_compute_shader/zero-dispatch-size
From the ARB_compute_shader spec, under DispatchCompute:
"If the work group count in any dimension is zero, no work groups are
dispatched."
And then for DispatchComputeIndirect:
... "is equivalent (assuming no errors are generated) to calling
DispatchCompute with <num_groups_x>, <num_groups_y> and
<num_groups_z>" ...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94100
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
This seems to give more reliable results. More similar to what we do on
a3xx, although I think it breaks the a3xx theory that the four sets of
results map to each MRT (since we appear to still only have four sets on
a4xx). The divide-by-two is a bit odd, but seems to be needed for some
reason.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some hw queries need their sample memory locations to have certain
alignment. At the moment that isn't an issue, since the only hw query
is occlusion, so all samples have the same size. But when others are
added with different sample sizes, this starts to be a problem.
All current and immediately upcoming hw queries simply need their
sample address aligned to their size, so let's use that for now.
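A minimal sketch of that alignment rule, assuming sample sizes are powers of two (an assumption, not something stated above); align_sample_offset() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to the next multiple of size.
 * Assumes size is a non-zero power of two. */
uint32_t align_sample_offset(uint32_t offset, uint32_t size)
{
   assert(size != 0 && (size & (size - 1)) == 0);
   return (offset + size - 1) & ~(size - 1);
}
```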
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add enable hook for hw query providers. Some will need to configure
perfctr selector registers, which we want to do at the start of the
submit.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will be needed to support converting from cycle counts to time for
performance related queries (initially time-elapsed, but there are some
additional performance counters that could be wired up).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c: In function ‘emit_tex’:
src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c:1368:26: warning: unused variable ‘const_off’ [-Wunused-variable]
struct ir3_instruction *const_off[4];
^~~~~~~~~
unused since:
commit 8750299a42
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Tue Feb 9 14:51:28 2016 -0800
nir: Remove the const_offset from nir_tex_instr
Signed-off-by: Rob Clark <robdclark@gmail.com>
Helps shaders in Saints Row IV, BioShock Infinite and Shadow Warrior.
total instructions in shared programs : 1914931 -> 1903900 (-0.58%)
total gprs used in shared programs : 247920 -> 247785 (-0.05%)
total local used in shared programs : 5673 -> 5673 (0.00%)
total bytes used in shared programs : 17558272 -> 17457320 (-0.57%)
              local   gpr  inst  bytes
helped            0   137   719    719
hurt              0    12     0      0
v2: remove this opt for OP_SLCT and check against float for OP_SET
v3: simplified the code
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
When there's a predicate, it just goes onto the sources list. If the
quadop only has a single regular source, we will end up thinking that
the predicate is the second source. Check explicitly for the predSrc so
that we don't accidentally emit the wrong thing.
This fixes a bunch of dEQP-GLES3.functional.shaders.derivate.* tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
In a situation where the seamless setting isn't available on a
per-texture basis (G200+ Teslas, and all Fermis), assume that all
samplers will have it identically set, and enable accordingly.
This fixes arb_seamless_cubemap piglit test on Fermi and Tesla.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Noticed by Ilia when I was trying to figure out why some app was failing
to use ETC2.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Without this NVF0_COMPUTE environment variable, compute support is
initialized by default and this is not what we want for now because
it might break 3D. It will be enabled by default once we are sure it
won't break anything.
Please note that compute support on GM200+ is not enabled yet because
it needs to be double-checked.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fortunately, compute support on GM107 is very close to GK110, except
the GK110_COMPUTE.UNK02C4 which is invalid and should not be used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Because our firmware doesn't support the GK110_COMPUTE.FIRMWARE[0x6]
method, the GPU hangs when it is used. Removing it fixes the issue and
allows compute shaders to be launched on GK110+.
Tested on GK208 and GM107.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We already have one in the IR code that can be used everywhere it's
needed in the AST code, so remove the one from the AST.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is usually handled by the backends in order to handle the
various interactions with the gl_*Color built-ins.
The problem is this means linking will fail if one side of the
interface adds the smooth qualifier to the varying and the other
side just uses the default even though they match.
This fixes various deqp tests. The spec is not clear what to do for
desktop GL so leave it as is for now.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92743
This fixes the following dEQP test and the other compswap variants.
dEQP-GLES31.functional.ssbo.atomic.compswap.highp_int
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
When the number of uniform blocks is less than 12,
ARB_uniform_buffer_object can't be enabled and the maximum GL version
is not even 3.1...
This fixes a regression introduced in 7c79c1e (st/mesa: add compute
shader state) if the maximum number of uniform blocks allowed for
compute shaders is less than 12. This happens on Kepler but this might
also affect other Gallium drivers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
The ARB_compute_shader spec says:
"If the work group count in any dimension is zero, no work groups
are dispatched."
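The spec rule boils down to a simple early-out before launching anything. A sketch, with grid_has_work() being a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true only if every dimension of the work group
 * count is non-zero, i.e. there is actually something to dispatch. */
bool grid_has_work(uint32_t groups_x, uint32_t groups_y, uint32_t groups_z)
{
   return groups_x != 0 && groups_y != 0 && groups_z != 0;
}
```

A caller would skip the dispatch entirely when this returns false.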
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
See Ivy Bridge PRM, Volume 2, Part 2, 1.8.4 INTERFACE_DESCRIPTOR_DATA:
DWORD 5, bits 20:16: "This field indicates how much shared local
memory the thread group requires. The amount is specified in 4k
blocks, but only powers of 2 are allowed: 0, 4k, 8k, 16k, 32k and 64k
per half-slice."
For Haswell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:
DWORD 5, bits 20:16: With text identical to the Ivy Bridge PRM.
For Broadwell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:
DWORD 6, bits 20:16: With text identical to the Ivy Bridge PRM.
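Based on the PRM text quoted above, the field value can be computed roughly as below. This is a sketch under the assumption that the field holds the number of 4k blocks (0, 1, 2, 4, 8 or 16); encode_slm_size() is a hypothetical name and the caller is assumed to have limited the request to 64k.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a shared local memory size in bytes as a
 * count of 4k blocks, rounded up to a power of two. */
uint32_t encode_slm_size(uint32_t bytes)
{
   assert(bytes <= 64 * 1024);          /* assumed hardware maximum */

   if (bytes == 0)
      return 0;

   uint32_t blocks = 1;                 /* minimum allocation is 4k */
   while (blocks * 4096u < bytes)
      blocks *= 2;

   return blocks;
}
```

A 12k request, for example, is rounded up to 16k and encoded as 4.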
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
cso_save_state() takes a bitmask of state items to save. Calling
cso_restore_state() restores those states.
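The pattern can be sketched generically as below. Everything here is hypothetical (the state bits, struct example_ctx and both functions); it only illustrates the bitmask save/restore idea, not the actual cso_context implementation.

```c
#include <stdint.h>

/* Hypothetical state bits, for illustration only. */
enum {
   STATE_BIT_BLEND    = 1u << 0,
   STATE_BIT_VIEWPORT = 1u << 1,
   STATE_COUNT        = 2
};

/* Hypothetical context holding the current and saved state items. */
struct example_ctx {
   int current[STATE_COUNT];
   int saved[STATE_COUNT];
   uint32_t saved_mask;
};

/* Save only the state items selected by mask. */
void save_state(struct example_ctx *c, uint32_t mask)
{
   c->saved_mask = mask;
   for (unsigned i = 0; i < STATE_COUNT; i++) {
      if (mask & (1u << i))
         c->saved[i] = c->current[i];
   }
}

/* Restore exactly the items that the last save_state() call saved. */
void restore_state(struct example_ctx *c)
{
   for (unsigned i = 0; i < STATE_COUNT; i++) {
      if (c->saved_mask & (1u << i))
         c->current[i] = c->saved[i];
   }
   c->saved_mask = 0;
}
```

State items that were not in the mask keep whatever value they had when restore_state() is called.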
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Define a new st_util_vertex structure which is a bit smaller (9 floats
versus the previous 12 floats per vertex). Clean up the glClear,
glDrawPixels and glBitmap code that sets up the vertex data and does the
drawing so it's all very similar. This can lead to more consolidation.
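For illustration, a 9-float vertex could look like the sketch below. The member layout (position, color, texcoord) is an assumption and struct example_util_vertex is a hypothetical name, not necessarily how st_util_vertex is declared.

```c
/* Hypothetical 9-float vertex layout; the member split is assumed. */
struct example_util_vertex {
   float x, y, z;      /* position */
   float r, g, b, a;   /* color */
   float s, t;         /* texture coordinate */
};

/* Nine floats per vertex, with no padding expected between members. */
_Static_assert(sizeof(struct example_util_vertex) == 9 * sizeof(float),
               "example_util_vertex should be exactly 9 floats");
```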
v2: add assertion that vertex buffer slot == 0 to catch possible future
change in cso_get_aux_vertex_buffer_slot() behavior.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
LLVM removed LLVMAddTargetData for the 3.9 release in r260919. For the two
places in mesa where this is called, only enable the lines when compiling
for less than 3.9.
For the radeon driver, I'm not sure how to check if any other LLVM calls need
to be adjusted. I think since the target data used is extracted from the
LLVMModule, it isn't necessary to pass it back to LLVM again.
The code does compile, and at least for radeonsi does run OpenGL games.
[ Michel Dänzer: Move #if closer to LLVMAddTargetData in lp_bld_init.c,
and add HAVE_LLVM < 0x0309 guards around now unused occurrences of TD
and data_layout ]
Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Also use the opportunity to mark inputs constant. (Context has to be
given as read-write to intel_miptree_supports_non_msrt_fast_clear()
to support debug output).
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression.
v3 (Ben): Squash with "i965: Resolve color buffer also in
lossless compression case" and clarify simple
non-compressed fast clear case.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
This got pushed accidentally in the first place but wasn't reverted
as it didn't regress piglit but instead fixed one newly introduced
test exercising a corner case in the i965 driver. However, saving and
restoring vertex buffer context is complicated and requires more
thought.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94150
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Adds support for the new TIC layout that's present on Maxwell GPUs,
heavily based on the code for the existing layout.
This code is required for GM20x support. While GM10x supports the older
layout still, this commit switches it to use the updated version instead.
Piglit testing shows zero regressions on GM107.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
We previously stored texture format information as it would appear in
the TIC.
We're about to support the new TIC layout that appeared with Maxwell,
so it makes more sense to store the data in a split-out format.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
We've previously had identical naming between vertex and texture
formats, so it mostly made sense to define these together.
However, upcoming patches are going to transition the driver over to
using updated texture header definitions using NVIDIA's naming, and this
will no longer be the case.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This enables ARB_shader_image_load_store and ARB_shader_image_size when
the backend claims support for these. It will also implicitly enable the
image component of ARB_shader_texture_image_samples.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Make them akin to shader buffers, with no refcounting/etc. Just used to
pass data about the bound image in ->set_shader_images.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Silences the following GCC warning:
mesa/src/gallium/drivers/vc4/vc4_qir_schedule.c: In function 'qir_schedule_instructions':
mesa/src/gallium/drivers/vc4/vc4_qir_schedule.c:578:16: warning: missing braces around initializer [-Wmissing-braces]
struct schedule_state state = { 0 };
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Variable was previously always set to true. Accordingly, the later
assert() served no active purpose.
Found with GCC warning and code inspection:
mesa/src/gallium/drivers/vc4/vc4_qpu_emit.c: In function 'vc4_generate_code':
mesa/src/gallium/drivers/vc4/vc4_qpu_emit.c:315:22: warning: variable 'handled_qinst_cond' set but not used [-Wunused-but-set-variable]
bool handled_qinst_cond = true;
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
The two consumers want to know that the destination will be exactly the
source, which is not true if we might not set the destination.
Signed-off-by: Eric Anholt <eric@anholt.net>
This fixes a number of dEQP tests, such as:
dEQP-GLES31.functional.program_interface_query.buffer_limited_query.resource_query
It was expecting the length to be set even in the bufSize == 0 case.
Also, _mesa_get_program_resourceiv does some error checking on the
resource, which should probably happen in the bufSize == 0 case as
well, although there's no dEQP test for that.
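As a rough illustration of the expected behavior (not the Mesa code; the
helper name and signature below are made up for this sketch), a query that
always reports the written length, even when bufSize is 0, could look like:

```c
#include <stddef.h>

/* Hypothetical helper, not _mesa_get_program_resourceiv(). Copies at most
 * buf_size values into params and always reports how many were written,
 * including the buf_size == 0 case, where params may be NULL. */
static void
sketch_get_resource_values(const int *values, int value_count,
                           int buf_size, int *length, int *params)
{
   int written = 0;

   for (int i = 0; i < value_count && written < buf_size; i++)
      params[written++] = values[i];

   /* The length is set unconditionally; the caller may pass NULL. */
   if (length)
      *length = written;
}
```

With buf_size == 0 the loop body never runs, so params is not touched and
*length ends up as 0 instead of being left uninitialized.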
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We already have this logic in the gallium/util functions, so
let's reduce some entropy while here.
V.2:
Apply change to nv50 also as suggested by Samuel Pitoiset.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
brw_draw_upload.c uploads VertexID/InstanceID first, then DrawID.
So we need to assign the attribute mapping in that order as well.
Fixes the following Piglit tests with the vec4 backend:
- arb_shader_draw_parameters-drawid vertexid
- arb_shader_draw_parameters-drawid-indirect basevertex
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The ImageAccess array is statically sized to MAX_IMAGE_UNIFORMS:
GLenum ImageAccess[MAX_IMAGE_UNIFORMS];
There was no bounds checking ensuring we don't overflow. Passing in a
shader with too many uniforms would cause writes to extend into other
fields, such as sh->NumImages.
Later linker checks already handle reporting an error when there are too
many images, so just avoid corrupting structures here.
This rearranges the logic a bit to look more like the sampler case.
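A minimal sketch of the kind of bounds check described above. The struct,
the limit and the function names are stand-ins invented for illustration;
the real Mesa structures and the actual patch differ.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for MAX_IMAGE_UNIFORMS and the shader struct. */
#define SKETCH_MAX_IMAGE_UNIFORMS 32

struct sketch_shader {
   unsigned image_access[SKETCH_MAX_IMAGE_UNIFORMS];
   unsigned num_images;
};

/* Record the access qualifier for `count` image uniforms. Entries that
 * would land past the end of the array are skipped rather than written,
 * but num_images still counts them (an assumption of this sketch) so a
 * later limit check can report the error. */
static void
sketch_add_images(struct sketch_shader *sh, unsigned count, unsigned access)
{
   for (unsigned i = 0; i < count; i++) {
      unsigned index = sh->num_images + i;
      if (index < SKETCH_MAX_IMAGE_UNIFORMS)
         sh->image_access[index] = access;
   }
   sh->num_images += count;
}

/* Stand-in for the later linker check that reports too many images. */
static bool
sketch_too_many_images(const struct sketch_shader *sh)
{
   return sh->num_images > SKETCH_MAX_IMAGE_UNIFORMS;
}
```

The point is only that the write is guarded; error reporting stays where
it already was.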
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
GL_ARB_texture_multisample and GLES 3.1 expect the initial value to be
GL_TRUE. This fixes
dEQP-GLES31.functional.state_query.texture_level.texture_2d_multisample_array.fixed_sample_locations_integer
and a few related tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
SPIR-V has a concept of a function type that's used fairly heavily. We
could special-case function types in SPIR-V -> NIR but it's easier if we
just add support to glsl_types.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is to be used by SPIR-V for representing a sampler that isn't attached
to any particular image. In SPIR-V, all of the interesting bits such as
dimensionality, sampled type, etc. come from the image; the bare "sampler"
type simply uses a sampled type of VOID and 0 values for the rest.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's a bit more descriptive since it is the base type that you get when you
sample from it. Also, the next commit adds a bare "sampler" type and we
need glsl_type::sampler_type available for a public static member.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixes several of the
"dEQP-GLES31.functional.image_load_store*load_store*single_layer" dEQP
tests that use image formats we implement using untyped surface
messages.
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Changes from v3:
- dump the TGSI compute program
Changes from v2:
- remove use of MALLOC()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to the spec, this also increases the following minimum values:
- MAX_COMBINED_TEXTURE_IMAGE_UNITS 96 (6*16), was 80
- MAX_UNIFORM_BUFFER_BINDINGS 72 (6*12), was 60
ARB_compute_shader is not enabled by default because image support is
not implemented yet. If you want to use it, you need to set
MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader.
Changes from v2:
- make use of the new PIPE_CAP_SHADER_SUPPORTED_IRS cap instead of
enabling the extension when PIPE_CAP_COMPUTE is enabled.
- query for PIPE_CAP_COMPUTE first
- s/shader_supported_irs/compute_supported_irs/
- disable ARB_compute_shader and add a comment which explains why
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
LOCAL_INVOCATION_ID, WORK_GROUP_ID and NUM_WORK_GROUPS are respectively
mapped to THREAD_ID, BLOCK_ID and GRID_SIZE.
Changes from v2:
- add assertions in st_translate_program()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This adds GLSL intrinsics for load/store and atomic operations.
Changes from v2:
- use PROGRAM_MEMORY instead of PROGRAM_BUFFER
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compute needs a new and different validation path.
Changes from v2:
- make use of unreachable() instead of assert() when the pipeline is
invalid
- move the st_pipeline enumeration to st_context.h instead of st_api.h
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This introduces TGSI_FILE_MEMORY for shared, global and local memory.
Only shared memory is currently supported.
Changes from v2:
- introduce TGSI_FILE_MEMORY
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This cap indicates the supported representations of programs. It should
be a mask of pipe_shader_ir bits. It will allow enabling
ARB_compute_shader if the underlying driver supports TGSI.
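For illustration only, a bitmask check of this shape might look as follows.
The enum values and function names are hypothetical and do not claim to
match the gallium definitions.

```c
#include <stdbool.h>

/* Hypothetical IR identifiers; each one is used as a bit position. */
enum sketch_shader_ir {
   SKETCH_IR_TGSI,
   SKETCH_IR_LLVM,
   SKETCH_IR_NATIVE
};

/* True when the cap's mask has the bit for `ir` set. */
static bool
sketch_ir_supported(unsigned supported_irs, enum sketch_shader_ir ir)
{
   return (supported_irs & (1u << ir)) != 0;
}

/* Compute is only worth enabling when the driver both does compute at
 * all and accepts TGSI as the program representation. */
static bool
sketch_can_enable_compute(bool has_compute, unsigned supported_irs)
{
   return has_compute && sketch_ir_supported(supported_irs, SKETCH_IR_TGSI);
}
```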
Changes from v2:
- improve description of PIPE_SHADER_CAP_SUPPORTED_IRS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Like indirect draw, we need to store a resource and an offset that
needs to be 4 byte aligned. When indirect is used, the size of the
grid (in blocks) is stored with three 32-bit integers.
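A small sketch of reading such an indirect grid size, assuming the buffer
is plain CPU-visible memory. The struct and function names are invented
for this example and are not the gallium interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three consecutive 32-bit block counts. */
struct sketch_grid {
   uint32_t x, y, z;
};

/* Read the grid size from `buf` at `offset`. Returns false when the
 * offset is not 4-byte aligned or the three integers do not fit. */
static bool
sketch_read_indirect_grid(const uint8_t *buf, size_t size, size_t offset,
                          struct sketch_grid *out)
{
   uint32_t v[3];

   if (offset % 4 != 0 || offset > size || size - offset < sizeof(v))
      return false;

   memcpy(v, buf + offset, sizeof(v));
   out->x = v[0];
   out->y = v[1];
   out->z = v[2];
   return true;
}
```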
Changes from v2:
- s/most values/block sizes/
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This introduces pipe_grid_info which contains all information to
describe a launch_grid call. This will be used to implement indirect
compute in the same fashion as indirect draw.
Changes from v2:
- correctly initialize pipe_grid_info for nv50/nvc0
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Changes from v2:
- removed cso_{save,restore}_compute_shader() functions and the
compute_shader_saved variable because disabling compute shaders for
meta ops is not currently needed
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The size of shared variables needs to be stored in gl_compute_program
in order to set up pipe_compute_state::req_local_mem.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows querying the underlying drivers for the maximum
total storage size of all variables declared as <shared> with
PIPE_COMPUTE_CAP_MAX_LOCAL_SIZE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Until now there has been only one type of color buffer that needs
to be resolved, namely single-sampled fast clear. As even the
sampler engine in the GPU doesn't understand the associated metadata,
the color values always need to be resolved prior to reading them.
From SKL onwards a new scheme is supported: lossless
compression of single-sampled color buffers. This is something that
is understood by the sampling engine, and therefore resolving
these types of buffers is not necessary before sampling.
This patch adds a means to make the distinction when considering
whether a resolve is needed.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
In addition to simply calling miptree_create() the higher level
call intel_miptree_create() also considers if the buffer should
be associated with an auxiliary buffer based on the given format.
Here we are allocating an auxiliary buffer whose format would
later mislead intel_miptree_create_layout() into trying to
associate the auxiliary buffer with an auxiliary buffer of its own.
To prevent this, the actual buffer creation logic was split out
into its own function. Let's invoke that instead.
v2 (Ben): Do not signal msaa layout with explicit argument but
using layout_flags instead.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
This allows ls and scripts to get the file names in the correct order of
optimization.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The L3 partitioning code tries to look at all programs - both render
programs (VS/TCS/TES/GS/FS) and compute (CS).
After calling brw_clear_cache, all prog_data pointers are invalid and
point to freed data. The intention was that flagging the dirty bits for
all programs would cause the next draw call to re-run the atoms for each
program stage, uploading new programs and installing new, valid pointers.
However, this doesn't quite work in our new multi-pipeline world. When
drawing or dispatching a compute workload, we only consider the programs
for the appropriate pipeline: drawing sets up VS/TCS/TES/GS/FS, but not
CS, and vice versa. This leaves pointers dangling a bit longer than
intended.
The L3 configuration code tries to inspect the prog_data for all shader
stages, so that we avoid having to reconfigure it when swapping back and
forth between render and compute workloads. So we can't have dangling
pointers.
The fix is simple: have brw_clear_cache NULL out stale prog_data
pointers, making it safe to inspect. The next L3 configuration pass
will see either the render shaders or compute shader as missing for
one go around, but will pick them up when both pipelines have run.
In other words, we'll simply reconfigure L3 twice, which is safe,
if a tiny bit wasteful - but then again, we just threw every compiled
shader we had on the floor and started recompiling them from scratch,
which is massively more wasteful, so it's not much of a concern.
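To make the idea concrete, here is a reduced sketch with invented types.
The per-stage `weight` field and the function names are assumptions for
illustration, not the i965 data structures.

```c
#include <stddef.h>

enum {
   SKETCH_STAGE_VS,
   SKETCH_STAGE_FS,
   SKETCH_STAGE_CS,
   SKETCH_STAGE_COUNT
};

/* Hypothetical per-program data that the L3 pass wants to inspect. */
struct sketch_prog_data {
   unsigned weight;
};

struct sketch_l3_context {
   const struct sketch_prog_data *prog_data[SKETCH_STAGE_COUNT];
};

/* After the cache is thrown away the pointed-to data is gone, so drop
 * the pointers too instead of leaving them dangling. */
static void
sketch_clear_cache(struct sketch_l3_context *ctx)
{
   for (int i = 0; i < SKETCH_STAGE_COUNT; i++)
      ctx->prog_data[i] = NULL;
}

/* Looks at every stage, render and compute alike; stages whose program
 * has not been re-uploaded yet are simply treated as missing. */
static unsigned
sketch_l3_weight(const struct sketch_l3_context *ctx)
{
   unsigned total = 0;

   for (int i = 0; i < SKETCH_STAGE_COUNT; i++) {
      if (ctx->prog_data[i])
         total += ctx->prog_data[i]->weight;
   }
   return total;
}
```

A pass run right after the clear sees only the stages of whichever
pipeline ran first, which corresponds to the extra reconfiguration
mentioned above.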
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93790
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
If there is no pipe info log, we would unconditionally deref length,
which was only optionally there. _mesa_copy_string handles the source
being null, as well as the length, so may as well just always call it.
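The following is a sketch of a copy helper with that kind of tolerance. It
is modelled on the description above only; it is not the body of
_mesa_copy_string, and the name is made up.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst (at most max_length - 1
 * characters plus a terminator). Both src and length may be NULL. */
static void
sketch_copy_string(char *dst, int max_length, int *length, const char *src)
{
   int len = 0;

   if (dst && src) {
      while (len < max_length - 1 && src[len] != '\0') {
         dst[len] = src[len];
         len++;
      }
   }
   if (dst && max_length > 0)
      dst[len] = '\0';

   /* Only report the length when the caller asked for it. */
   if (length)
      *length = len;
}
```

Since the helper handles both optional pieces itself, the caller has no
reason to branch on whether an info log exists.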
Fixes a segfault in
dEQP-GLES31.functional.state_query.program_pipeline.info_log
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Similar to commit dd9d2963d6 (mesa: AtomicBufferBindings should be
initialized to zero.), we should reset these to zero when unbinding.
This fixes a number of dEQP failures due to cross-test pollution. The
tests properly unbound everything, but when querying the values again,
the expectation was that they would be 0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
As with AUX1-3, these enums aren't invalid (i.e. -1) but are also not
supported by Mesa. Returning BUFFER_COUNT causes the proper error to be
returned by ReadBuffer and other functions. This resolves some failures
in
dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.read_buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes
dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.clear_bufferfv
and brings the logic up to spec with GL 4.5
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes
dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.clear_bufferuiv
and brings the logic up to spec with GL 4.5
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
There's a hunk above which sets INVALID_ENUM for GL_DEPTH
unconditionally.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes
dEQP-GLES31.functional.state_query.texture.texture_2d_multisample.depth_stencil_mode_integer
and a few related tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We were implementing those the same way as
the default pool, which is sub-optimal.
The buffer is supposed to return a pointer to
a RAM copy when the user locks it, and automatically
update the VRAM copy when needed.
v2: Rename NineBuffer9_Validate to NineBuffer9_Upload
Rename validate_buffers to update_managed_buffers
Initialize NineBuffer9 managed fields after the resource
is allocated. In case of allocation failure, when the dtor
is executed, This->base.pool is then rightfully set.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
On 32-bit builds, the incoming stack is 4-byte aligned.
We need to realign the stack to 16 bytes at some point,
or there are issues later (crashes with SSE, LLVM, etc.).
This patch chooses to align the stack at API entry points.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
using MIN/MAX is fine instead of CLAMP.
NRM doesn't exist anymore.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
We had several crashes with it.
This should fix them.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Add new argument to d3d9_to_pipe_format_checked to
be able to bypass format support checks. This argument
is set to TRUE when the requested Pool is SCRATCH.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Returns INVALIDCALL when trying to create a surface
of unsupported format.
In practice, apps are supposed to check for format
support before trying to create a render target
of that format. However, some badly behaving apps
could just try to create the surface and deduce
from a failure that the format isn't supported.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Texture and CubeTexture use common code,
and thus ATI1/ATI2 is already implemented
for CubeTexture.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
We were having checks at both Create*Texture functions
and in ctors.
Move all Create*Texture checks to ctors.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
This->base.base.resource is NULL
for SYSTEMMEM textures.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
We do not support shared textures, thus no need to set
the shared flag.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
We do not create a resource for SYSTEMMEM textures,
thus we do not need to set resource usage.
The only exception is vertexbuffer SYSTEMMEM, since
we do use a pipe resource for them.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
We already use these for gallium in
src/gallium/auxiliary/os/os_memory_stdc.h and it's always better to
minimize divergences between MinGW and MSVC.
Reviewed-by: Brian Paul <brianp@vmware.com>
Auxiliary buffers are always created with sample number of zero
which effectively prevents intel_miptree_create_layout() from trying
to associate auxiliary buffers with auxiliary buffers.
Now that there is a more direct path available, let's start using it
instead and stop even checking for such an (im)possibility.
v2 (Ben): Do not signal msaa layout with explicit argument but
using layout_flags instead.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Currently the logic allocating and setting up miptrees is closely
combined with decision making when to re-allocate buffers in
X-tiled layout and when to associate colors with auxiliary buffers.
These auxiliary buffers are in turn also represented as miptrees
and are created by the same miptree creation logic calling itself
recursively. This means considering in vain if the auxiliary buffers
should be represented in X-tiled layout or if they should be
associated with auxiliary buffers again.
While this is somewhat unnecessary, it doesn't pose any problems
currently. Miptrees for auxiliary buffers are created as single-sampled,
which rules out consideration for multi-sampled compression auxiliary
buffers. The format in turn is one that is not applicable for
single-sampled fast clears (which would require an accompanying auxiliary
buffer).
But once the driver starts to support lossless compression of color
buffers the auxiliary buffer will have a format that would itself
be applicable for lossless compression. This would be rather
difficult and ugly to detect in the current miptree creation logic,
and therefore this patch seeks to separate the association logic
from the general allocation and setup steps.
v2 (Ben):
- Do not reconsider for X-tiling in intel_miptree_create()
as it was just forced to Y-tiling in miptree_create().
- Do not drop checks for allocation failures.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes the logic a little more explicit and helps to keep
subsequent patches easier to read.
Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Part of brw_try_draw_prims() is a check to validate textures
(brw_validate_textures()). In case of textures that currently have
only level zero but are marked for mipmap generation, i965 driver
will decide to replace the underlying buffer with a larger one
capable of holding also the additional levels. This results in a
blit from the original buffer to the newly allocated one (see
intel_miptree_copy_teximage()). This blit is currently handled with
the blitter engine and hence it won't affect the ongoing draw operation.
However, this blit in turn may trigger a color resolve on the source
buffer. In principle, this should be possible with fast cleared
buffers but I only started hitting it when I enabled lossless
compression (that requires a similar resolve to fast cleared buffers).
Now, the color resolve is a meta operation and uses the same drawing
path we are already in the middle of. After quite a bit of debugging I
realized that the resolve will modify the current vbo setup but it
won't restore it afterwards resulting in the original draw call
using wrong vertex data.
When brw_try_draw_prims() gets called, the vbo logic in the Mesa
core (see vbo_draw_arrays()) has just bound the vbo (see
vbo_bind_arrays() and recalculate_input_bindings()). Color resolve
operation will overwrite the vbo setup by calling vbo_bind_arrays()
against the resolve rectangle (see brw_draw_rectlist()). Once the
color resolve is done the vbo setup is left to the resolve rectangle
state and the original drawing call yields bogus results.
This patch aims to restore the original state after the color
resolve by calling vbo_bind_arrays() yet again after the vertex
array state in the core context has been restored.
Now having said all this, I'd also like to state that I'm quite
uncomfortable with the nested meta operations. The original draw
call in this case is in fact a meta operation itself. It is a blit
from level zero to level one when generating the additional mipmap
levels (see _mesa_meta_GenerateMipmap()). Imagine the complexity
if the blit in the middle from one buffer to another would also go
to the meta path instead of the blitter.
I would be very tempted to try to move all the resolves to happen
before a meta operation is started.
Additionally I still feel that the work I did earlier in the spring/
summer time moving meta operations to use direct state upload
bypassing the core context would make sense.
v2: Force input recalculation by setting the flag explicitly
v3: Do not attempt to restore vbo for opengles1 which doesn't
support vertex buffer objects.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Validation may kick off copies and subsequently color resolves.
Color resolves (and the copies themselves if ending up in meta path)
will overwrite the internal driver state but are not prepared to
restore it. Instead of adding that capability the validation can be
simply performed before the state is updated.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Setting brw->ctx.NewDriverState and brw->ctx.NewGLState affects
the dirty bits for the current pipeline. But, we need to flag
everything dirty on *both* pipelines, so that when we switch
back, we'll realize our programs are stale and re-upload them.
To accomplish this, flag the saved state for both pipelines.
Only one of them should matter, but this way we don't have to
check which we need to set. It's harmless to set the other.
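Roughly, and with made-up field names rather than the brw_context layout,
flagging everything on both pipelines amounts to:

```c
#include <stdint.h>

enum sketch_pipeline {
   SKETCH_RENDER,
   SKETCH_COMPUTE,
   SKETCH_NUM_PIPELINES
};

/* Hypothetical context: one live set of dirty bits for the current
 * pipeline plus a saved set per pipeline. */
struct sketch_dirty_context {
   uint64_t new_driver_state;
   uint64_t saved_dirty[SKETCH_NUM_PIPELINES];
};

/* Mark everything dirty for the current pipeline and for the saved
 * state of both pipelines, without checking which one is active. */
static void
sketch_flag_all_dirty(struct sketch_dirty_context *ctx)
{
   ctx->new_driver_state = ~UINT64_C(0);

   for (int p = 0; p < SKETCH_NUM_PIPELINES; p++)
      ctx->saved_dirty[p] = ~UINT64_C(0);
}
```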
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93790
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Feedback from Khronos is that 'invariant' should be allowed on block
members for desktop OpenGL. Fix piglit regression added by fe1e89a0:
invariant-qualifier-in-out-block-01.vert
v2:
- Allow it for in/out blocks in OpenGL ES too, so when OES_shader_io_blocks
is supported we don't need to do any change (Timothy)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89330
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
I think this was just missed; Curro and I were probably writing
code simultaneously and forgot to combine them at the end.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We really need to stop pulling information directly out of shaders for
state setup. For one thing, if we want any sort of an on-disk shader
cache, having all of this metadata in one place is going to be crucial.
Also, passing it all through prog_data cleans up the compiler <-> state
setup API substantially.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's extremely FS specific so the fact that we have a stage check in the
middle of it is rather bogus. While we're here, we rename
setup_payload_gen4 and setup_payload_gen6 to make it obvious that they are
both FS specific.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This 'words' parameter has been there since 2011 but it has never been used.
While we are at it, get rid of the extern declaration.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit c98deb18d5 in 2010 disallowed embedded struct definitions
in ES. Then in 2013 d9bb8b7b56 disallowed it for everything but
GLSL 1.10.
Commit c98deb18d5 seemed the cleanest way to do the check, so it's
been extended to cover GL and the other version has been removed.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We no longer need to build any part of Mesa with Windows SDK 7.0.7600 or
MSVC 2008. MSVC 2013 will be the oldest we support.
In practice this means people are now free to declare variables in the
middle of blocks, on the whole Mesa tree.
Care should still be taken with variable length arrays and void pointer
arithmetic.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Hella-acked-by: Ian Romanick <ian.d.romanick@intel.com>
The indirect dispatch registers were whitelisted in command parser
version 5. (Version 5 is available as of Linux 4.4)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broken by one of my cleanups. Spotted by luck.
Radeonsi doesn't care, because all shader create callbacks go to the same
function.
Reviewed-by: Brian Paul <brianp@vmware.com>
When NIR was originally drafted, there was no easy way to determine if
something was constant or not. The result was that we had lots of
special-casing for constant values such as this. Now that load_const
instructions are SSA-only, it's really easy to find constants and this
isn't really needed anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@gmail.com>
This fixes two issues. First, we had a use-after-free in the case where
the instruction got deleted and we tried to return mov->dest.write_mask.
Second, in the case where we are doing a self-mov of a register, we delete
those channels that are moved to themselves from the write-mask. This
means that those channels aren't reported as being handled even though they
are. We now stash off the write-mask before removing unneeded channels so
that they still get reported as handled.
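The pattern can be sketched as below. The instruction struct and the
function are simplified inventions for this example and do not reflect the
real NIR types or the actual pass.

```c
#include <stdlib.h>

/* Hypothetical, heap-allocated mov instruction. */
struct sketch_mov {
   unsigned dest_reg, src_reg;
   unsigned write_mask;       /* bit i set: channel i is written */
   unsigned char swizzle[4];  /* source channel read by each dest channel */
};

/* Returns the channels this mov handled. *mov_ptr is freed and set to
 * NULL when nothing is left to write. */
static unsigned
sketch_process_mov(struct sketch_mov **mov_ptr)
{
   struct sketch_mov *mov = *mov_ptr;

   /* Stash the mask first: it is the value to report, and the
    * instruction may be trimmed or freed below. */
   unsigned handled = mov->write_mask;

   if (mov->dest_reg == mov->src_reg) {
      /* Channels moved onto themselves need no write. */
      for (unsigned i = 0; i < 4; i++) {
         if (mov->swizzle[i] == i)
            mov->write_mask &= ~(1u << i);
      }
   }

   if (mov->write_mask == 0) {
      free(mov);
      *mov_ptr = NULL;
   }

   /* Never read mov->write_mask here: mov may be gone. */
   return handled;
}
```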
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94073
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
This fixes an assertion failure in [at least] one of the Unreal Engine Linux
demo/games that uses DXT1 compression. Specifically, the "Vehicle Game".
At some point, the game ends up trying to blit a mip level whose size is 2x2,
which is smaller than a DXT1 block. As a result, the assertion in the blit path
is triggered. It should be safe to simply make sure we align the width and
height, which is sadly an example of compression being less efficient.
NOTE: The demo seems to work fine without the assert, and therefore release
builds of mesa wouldn't stumble over this. Perhaps there is some unnoticeable
corruption, but I had trouble spotting it.
Thanks to Jason for looking at my backtrace and figuring out what was going on.
v2: Use NPOT alignment to make sure ASTC is handled properly (Ilia)
Remove comment about how this doesn't fix other bugs, because it does.
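For reference, rounding up to a block size that need not be a power of two
can be done as in this sketch (the function name is invented here; Mesa has
its own alignment macros):

```c
/* Round v up to the next multiple of a. Unlike a mask-based ALIGN, this
 * also works when a is not a power of two, which matters for ASTC block
 * sizes such as 5x4 or 6x5. a must be non-zero. */
static unsigned
sketch_align_npot(unsigned v, unsigned a)
{
   return (v + a - 1) / a * a;
}
```

A 2x2 mip level of a format with 4x4 blocks is thus treated as 4x4, i.e.
one whole block.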
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93358
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
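The scenario can be reproduced with a toy name table. Everything below is
a made-up model for illustration, not Mesa's hash table or the GL API.

```c
#include <stdbool.h>

#define SKETCH_MAX_NAMES 8

/* Toy object namespace: one slot of data per name. */
struct sketch_namespace {
   bool in_use[SKETCH_MAX_NAMES];
   int  data[SKETCH_MAX_NAMES];
};

/* Gen-style allocation: hand out the lowest name not yet in use.
 * Name 0 is reserved. Returns 0 when the table is full. */
static unsigned
sketch_gen_name(struct sketch_namespace *ns)
{
   for (unsigned name = 1; name < SKETCH_MAX_NAMES; name++) {
      if (!ns->in_use[name]) {
         ns->in_use[name] = true;
         return name;
      }
   }
   return 0;
}

/* Bind-style use: any name may be used without calling Gen first. */
static void
sketch_set_data(struct sketch_namespace *ns, unsigned name, int value)
{
   ns->in_use[name] = true;
   ns->data[name] = value;
}
```

If internal code calls sketch_gen_name() first and gets 1, and the
application then stores its own data under name 1 without calling Gen, the
next internal sketch_set_data() on name 1 overwrites the application's
data.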
Fixes piglit 'object-namespace-pollution glGetTexImage-compressed
renderbuffer' test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing left in meta does anything with the RBO binding, so we don't
need to save or restore it. The FBO binding is still modified.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This has the advantage that it does not pollute the global binding
state. It also enables later patches that will stop calling
_mesa_GenRenderbuffers / _mesa_CreateRenderbuffers which pollute the
renderbuffer namespace.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Pulls the parts of renderbuffer_storage that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This function previously was only used in fbobject.c and contained a
bunch of API validation. Split the function into
framebuffer_renderbuffer that is static and contains the validation, and
_mesa_framebuffer_renderbuffer that is suitable for calling from
elsewhere in Mesa (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The texture slot is expanded to 16 dwords containing 2 descriptors.
Those can be:
- Image and fmask, or
- Image and sampler state
By carefully choosing the locations, we can put all three into one slot,
with the fmask and sampler state being mutually exclusive.
This improves shaders in 2 ways:
- 2 user SGPRs are unused, shaders can use them as temporary registers now
- each pair of descriptors is always on the same cache line
v2: cosmetic changes: add back v8i32, don't load a sampler state & fmask
at the same time
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: Clarify the relation between num_tiles_pipes and GB_TILE_MODE and the fix
needed for Tahiti as suggested by Marek.
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This avoids a possible NULL dereference because ureg_create() might
return a NULL pointer.
Spotted by coverity.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Two things were broken here:
- The depth/stencil surface dimensions were broken for MSAA.
- Sample count was programmed incorrectly.
Result was the depth resolve didn't work correctly on MSAA surfaces, and
so sampling the surface later produced garbage.
Fixes the new piglit test arb_texture_multisample-sample-depth, and
various artifacts in 'tesseract' with msaa=4 glineardepth=0.
Fixes freedesktop bug #76396.
No piglit regressions observed on Haswell.
v2: Just set brw_hiz_op_params::dst.num_samples rather than adding a
helper function (Ken).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
v3: moved the alignment needed for hiz+msaa to brw_blorp.cpp, as
suggested by Chad Versace (Alejandro Piñeiro on behalf of Chris
Forbes)
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The assertion is inside a condition mandating num_samples > 1, so the
first half of the constraint is always met. The second half would only
apply to the single-sampled case, and moreover it incorrectly checks
against the surface type instead of the format.
Subsequent patches will introduce proper support for the lossless
compression and dropping this here makes the patches a little
simpler.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
This is no longer necessary...and it doesn't make much sense to
have inputs as destinations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
gl_PointSize is delivered in the .w component of the VUE header, while
the language expects it to be a float (and thus in the .x component).
Previously, we emitted MOVs to copy it over to the .x component.
But this is silly - we can just use a .wwww swizzle and access it
without copying anything or clobbering the value stored at .x
(which admittedly is useless).
Removes the last use of ATTR destinations.
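As a rough model of why the swizzle makes the copy unnecessary, the sketch below reads a 4-component value through a .wwww selection without writing to the source. The types and function names are made up for illustration and are not the i965 register representation.

```c
#include <assert.h>

/* Hypothetical 4-component register value. */
struct vec4 { float c[4]; };

/* A swizzle picks a source component (0 = x .. 3 = w) for each channel. */
struct swizzle { unsigned char sel[4]; };

/* Read src through a swizzle; src itself is never written. */
struct vec4 read_swizzled(const struct vec4 *src, struct swizzle swz)
{
    struct vec4 out;
    for (int i = 0; i < 4; i++) {
        assert(swz.sel[i] < 4);
        out.c[i] = src->c[swz.sel[i]];
    }
    return out;
}

/* Point size as the shader sees it: .x of a .wwww read of the header. */
float point_size_from_header(const struct vec4 *vue_header)
{
    const struct swizzle wwww = { { 3, 3, 3, 3 } };
    return read_swizzled(vue_header, wwww).c[0];
}
```

The header's .x component is left untouched, so nothing is clobbered and no MOV is needed.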
v2: Use BRW_SWIZZLE_WWWW, not SWIZZLE_WWWW (caught by GCC).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This patch re-implements the pre-Haswell VS attribute workarounds.
Instead of emitting shader code in the vec4 backend, we now simply
call a NIR pass to emit the necessary code.
This simplifies the vec4 backend. Beyond deleting code, it removes
the primary use of ATTR as a destination. It also eliminates the
requirement that the vec4 VS backend express the ATTR file in terms
of VERT_ATTRIB_* locations, giving us a bit more flexibility.
This approach is a little different: rather than munging the attributes
at the top, we emit code to fix them up when they're accessed. However,
we run the optimizer afterwards, so CSE should eliminate the redundant
math. It may even be able to fuse it with other calculations based on
the input value.
shader-db does not handle non-default NOS settings, so I have no
statistics about this patch.
Note that the scalar backend does not implement VS attribute
workarounds, as they are unnecessary on hardware which allows SIMD8 VS.
v2: Do one multiply for FIXED rescaling and select components from
either the original or scaled copy, rather than multiplying each
component separately (suggested by Matt Turner).
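The v2 approach can be sketched as follows, assuming the components have already been converted to float and that a 4-bit mask (a hypothetical layout) marks which components hold 16.16 fixed-point data. rescale_fixed is an illustrative name, not the NIR pass itself.

```c
#include <assert.h>

/* Scale all components once, then pick per component between the original
 * and the scaled copy. Bit i of fixed_mask set means component i holds
 * 16.16 fixed-point data (mask layout is an assumption of this sketch). */
void rescale_fixed(const float in[4], unsigned fixed_mask, float out[4])
{
    float scaled[4];
    assert(fixed_mask < 16u);

    /* The single "vector" multiply: every component is scaled once. */
    for (int i = 0; i < 4; i++)
        scaled[i] = in[i] * (1.0f / 65536.0f);

    /* Per-component select, no further math. */
    for (int i = 0; i < 4; i++)
        out[i] = (fixed_mask & (1u << i)) ? scaled[i] : in[i];
}
```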
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Use st->internal_target instead of PIPE_TEXTURE_2D when choosing the
texture format. Probably no real difference, but let's be consistent.
Simplify a test when determining whether we need normalized texcoords.
Add a new assertion.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bitmaps may be drawn with a PIPE_TEXTURE_2D or PIPE_TEXTURE_RECT resource
as determined at context creation by checking if PIPE_CAP_NPOT_TEXTURES is
supported. But many places in the bitmap code were hard-coded to use
PIPE_TEXTURE_2D. Use st->internal_target instead.
I think an older NV chip is the only case where a gallium driver does not
support NPOT textures. Bitmap drawing was probably broken for that GPU.
Also, we only need one sampler state with texcoord normalization set up
according to st->internal_target.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fast path for Intel's ReadPixels() unintentionally omits clipping
the specified area to a valid one. Rather than clip in various
corner-cases, perform this operation in the API validation stage.
The bug in intel_readpixels_tiled_memcpy() showed itself when the winsys
ReadBuffer's height was smaller than the one specified by ReadPixels().
yoffset became negative, which was an invalid input for tiled_to_linear().
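A minimal sketch of the kind of clipping meant here, done once up front so later paths never see an out-of-range area. struct rect and clip_to_buffer are hypothetical; the real validation code also has to adjust pixel-pack state, which is left out.

```c
#include <assert.h>
#include <stdbool.h>

struct rect { int x, y, width, height; };

/* Clip r to a buffer of buf_width x buf_height.
 * Returns false if nothing is left to read. */
bool clip_to_buffer(struct rect *r, int buf_width, int buf_height)
{
    assert(buf_width >= 0 && buf_height >= 0);

    if (r->x < 0) { r->width += r->x; r->x = 0; }
    if (r->y < 0) { r->height += r->y; r->y = 0; }
    if (r->x + r->width > buf_width)
        r->width = buf_width - r->x;
    if (r->y + r->height > buf_height)
        r->height = buf_height - r->y;

    return r->width > 0 && r->height > 0;
}
```

With the area clipped before any fast path runs, a requested height larger than the read buffer can no longer produce a negative offset downstream.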
v2: Move clipping to validation stage (Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Marta Löfstedt <marta.lofstedt@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These logical texture instructions can have a *lot* of sources. It's much
safer if we have symbolic names for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds the capability to NIR to support separate textures and
samplers. As it currently stands, glsl_to_nir only sets the texture deref
and leaves the sampler deref alone as it did before and nir_lower_samplers
assumes this. Backends can still assume that they are combined and only
look at the texture index.  Or, if they wish, they can assume that
they are separate because nir_lower_samplers, tgsi_to_nir, and prog_to_nir
all set both texture and sampler index whenever a sampler is required (the
two indices are the same in this case).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to separate the two concepts. When we do, the sampler will
become optional. Doing a rename first makes the separation a bit more
safe because drivers that depend on GLSL or TGSI behaviour will be fine to
just use the texture index all the time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bit 0 of the Patch Header is "TR DS Cache Disable". Setting that bit
disables the DS Cache for tessellator-output topologies resulting in
stitch-transition regions (but leaves it enabled for other cases).
We probably shouldn't leave this to chance - the URB could contain
garbage - which could result in the cache randomly being turned on
or off.
This patch makes the final EOT write 0 to the first DWord (which
only contains this one bit). This ensures the cache is always on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Direct access to intr->const_index[n], where different slots have
different meanings, is somewhat confusing.
Instead, let's put some extra info in nir_intrinsic_infos[] about which
slots map to what, and add some get/set helpers. The helpers validate
that the field being accessed (base/writemask/etc) is applicable for the
intrinsic opc, for some extra safety. And nir_print can use this to
dump out decoded const_index fields.
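The idea can be sketched with a small per-opcode table. Everything below (the enum names, the index_map layout, the two example opcodes) is invented for illustration and is not the actual NIR definition.

```c
#include <assert.h>

/* Hypothetical field kinds stored in const_index slots. */
enum idx_field { IDX_BASE, IDX_WRMASK, IDX_NUM_FIELDS };

/* Per-opcode info: index_map[f] is slot number + 1, or 0 if unused. */
struct intrinsic_info {
    unsigned char index_map[IDX_NUM_FIELDS];
};

enum opcode { OP_LOAD_INPUT, OP_STORE_OUTPUT, OP_COUNT };

const struct intrinsic_info infos[OP_COUNT] = {
    [OP_LOAD_INPUT]   = { .index_map = { [IDX_BASE] = 1 } },
    [OP_STORE_OUTPUT] = { .index_map = { [IDX_BASE] = 1, [IDX_WRMASK] = 2 } },
};

struct intrinsic {
    enum opcode op;
    int const_index[3];
};

/* Does this opcode carry the given field at all? */
int field_applies(const struct intrinsic *intr, enum idx_field f)
{
    return infos[intr->op].index_map[f] > 0;
}

/* Getter: asserts that the field is meaningful for this opcode. */
int get_field(const struct intrinsic *intr, enum idx_field f)
{
    assert(field_applies(intr, f));
    return intr->const_index[infos[intr->op].index_map[f] - 1];
}

/* Setter with the same validation. */
void set_field(struct intrinsic *intr, enum idx_field f, int value)
{
    assert(field_applies(intr, f));
    intr->const_index[infos[intr->op].index_map[f] - 1] = value;
}
```

Callers name the field they want instead of remembering which slot it lives in, and a printer could walk the same table to decode the slots.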
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the only stage is MESA_SHADER_COMPUTE, we should complain that
there's nothing coming out of the geometry shader stage just as
we would if the first stage were MESA_SHADER_FRAGMENT.
Also, it's valid for tessellation shaders to be the stage producing
transform feedback varyings, so mention those in the compiler error.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This fixes FP16 conversion instructions for VI, which has 16-bit floats,
but not SI & CI, which can't disable denorms for those instructions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It was partly a state and partly emulated by shader code, but since we want
to do this in a fragment shader prolog, we need to put it into the shader
key, which will be used to generate the prolog.
This also removes the spi_ps_input states and moves the registers
to the PS state.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
BCOLOR inputs were immediately after COLOR inputs. Thus, all following inputs
were offset by 1 if color_two_side was enabled, and not offset if it was not
enabled, which is a variation that's problematic if we want to have 1 variant
per shader and the variant doesn't care about color_two_side (that should be
handled by other bytecode attached at the beginning).
Instead, move BCOLOR inputs after all other inputs, so BCOLOR0 is at location
"num_inputs" if it's present. BCOLOR1 is next.
This also allows removing si_shader::nparam and
si_shader::ps_input_param_offset, which are useless now.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When glCallLists() is compiled into a display list, preserve the call
as a single glCallLists rather than 'n' glCallList calls. This will
matter for an upcoming display list optimization project.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Generate GL_INVALID_VALUE if n < 0. Return early if n==0 or lists==NULL.
v2: fix formatting, also check for lists==NULL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most apps don't use glBitmap so don't allocate the bitmap cache or
gallium state objects/shaders/etc until the first call to st_Bitmap().
v2: simplify a conditional, per Gustaw Smolarczyk.
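The pattern is plain lazy initialization. A minimal sketch, with a hypothetical bitmap_state struct standing in for the real cache and state objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-context bitmap state. */
struct bitmap_state {
    bool initialized;
    unsigned char *cache;   /* allocated on first use only */
    int init_count;         /* for illustration: how often init ran */
};

/* One-time setup; returns false if the allocation fails. */
bool init_bitmap_state(struct bitmap_state *bs)
{
    assert(!bs->initialized);
    bs->cache = calloc(32 * 32, 1);
    if (!bs->cache)
        return false;
    bs->initialized = true;
    bs->init_count++;
    return true;
}

/* Draw entry point: pays the setup cost only on the first call. */
bool draw_bitmap(struct bitmap_state *bs)
{
    if (!bs->initialized && !init_bitmap_state(bs))
        return false;
    /* ... actual drawing would go here ... */
    return true;
}

void destroy_bitmap_state(struct bitmap_state *bs)
{
    free(bs->cache);
    bs->cache = NULL;
    bs->initialized = false;
}
```

A context that never draws a bitmap never allocates anything.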
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Move setup/restoration of rendering state into helper functions.
This makes the draw_bitmap_quad() function much more concise.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Both st/mesa and i965 should return a true/false result now, and the
only other driver implementing queries (radeon) doesn't support
ARB_occlusion_query2 which added that pname.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This reduces code duplication. It also adds support for drivers where the
fragment position is a system value.
Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These are used in GLSL IR to remove unused varyings and match
transform feedback variables. There is no need to use these in NIR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code was very hard to follow and has been the source
of at least 3 bugs in the past year.
The existing code also has a bug for SSO: with a multi-stage SSO,
for example a tes -> gs program, using transform feedback with gs
makes the existing code look for the transform feedback varyings in
the tes stage and fail because it can't find them.
V2: Add more code comments, always try to remove unused inputs
to the first stage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We really just needed to skip the existing ES < 3.1 check if we have
a compute shader, all other scenarios are already covered.
* No shaders is a link error.
* Geom or Tess without Vertex is a link error which means we always
require a Vertex shader and hence a Fragment shader.
* Finally a Compute shader linked with any other stage is a link error.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously an empty program would go through the entire
link_shaders() function and we would have to be careful
not to cause a segfault.
In core profile, link_status is now also set to false by
generating an error; it was previously set to true.
From Section 7.3 (PROGRAM OBJECTS) of the OpenGL 4.5 spec:
"Linking can fail for a variety of reasons as specified in the
OpenGL Shading Language Specification, as well as any of the
following reasons:
- No shader objects are attached to program."
V2: Only generate an error in core profile and add spec quote (Ian)
V3: generate error in ES too, remove previous check which was only
applying the rule to GL 4.5/ES 3.1 and above. My understanding is that
this spec change is clarifying previously undefined behaviour and
therefore should be applied retrospectively. The ES CTS tests for
this are in ES 2; I suspect it was passing because it would have
generated an error for not having both a vertex and fragment shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Helps 11 shaders in UnrealEngine4 demos.
I seriously hope they would have given us bitfieldReverse() if we
exposed GL 4.0 (but we do expose ARB_gpu_shader5, so why not use that
anyway?).
instructions in affected programs: 4875 -> 4633 (-4.96%)
cycles in affected programs: 270516 -> 244516 (-9.61%)
I suspect there's a *lot* of room to improve nir_search/opt_algebraic's
handling of this. We'd actually like to match, e.g., step2 by matching
step1 once and then doing a pointer comparison for the second instance
of step1, but unfortunately we generate an enormous tuple instead.
The .text size increases by 6.5% and the .data by 17.5%.
   text    data     bss     dec     hex filename
  22957   45224       0   68181   10a55 nir_libnir_la-nir_opt_algebraic.o
  24461   53160       0   77621   12f35 nir_libnir_la-nir_opt_algebraic.o
I'd be happy to remove this if Unreal4 uses bitfieldReverse() if it is
in a GL 4.0 context once we expose GL 4.0.
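For reference, the usual open-coded 32-bit bit reversal looks like the sketch below: a chain of swap steps with masks such as 0xff00ff00. Whether the shaders in question use exactly this sequence, and which steps the new rule matches, is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded 32-bit bit reversal. */
uint32_t bitfield_reverse(uint32_t x)
{
    /* Swap halves, then bytes, nibbles, bit pairs and single bits. */
    x = (x << 16) | (x >> 16);
    x = ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
    x = ((x & 0x0f0f0f0fu) << 4) | ((x & 0xf0f0f0f0u) >> 4);
    x = ((x & 0x33333333u) << 2) | ((x & 0xccccccccu) >> 2);
    x = ((x & 0x55555555u) << 1) | ((x & 0xaaaaaaaau) >> 1);
    return x;
}

/* Reversing twice must give back the input. */
void check_round_trip(uint32_t x)
{
    assert(bitfield_reverse(bitfield_reverse(x)) == x);
}
```

Collapsing the whole chain into one bit-reverse operation is what removes instructions in the affected shaders.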
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The next patch adds an algebraic rule that uses the constant 0xff00ff00.
Without this change, the build fails with
return hex(struct.unpack('I', struct.pack('i', self.value))[0])
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
The hex() function handles integers of any size, and assigning a
negative value to an unsigned does what we want in C. The pack/unpack is
unnecessary (and as we see, buggy).
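The C side of that claim can be checked in isolation. In the sketch below, as_unsigned_const is a hypothetical helper, not something in the generated code: a constant written as a negative number and one written as a large positive number end up as the same 32-bit pattern, because conversion to an unsigned type is defined modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Store a constant that may have been written either as a signed or as an
 * unsigned 32-bit value. */
uint32_t as_unsigned_const(int64_t value)
{
    /* Accept anything representable as either int32_t or uint32_t. */
    assert(value >= INT32_MIN && value <= (int64_t)UINT32_MAX);
    return (uint32_t)value;
}
```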
Reviewed-by: Dylan Baker <baker.dylan.c@gmail.com>
Walking the SSA definitions in order means that we consider the smallest
algebraic optimizations before larger optimizations. So if a smaller
rule is part of a larger rule, the smaller one will happen first,
preventing the larger one from happening.
instructions in affected programs: 32721 -> 32611 (-0.34%)
helped: 106
In programs whose nir_optimize loop count changes (129 of them):
before: 1164 optimization loops
after: 1071 optimization loops
Of the 129 affected, 16 programs' optimization loop counts increased.
Prevents regressions and annoyances in the next commits.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I don't know why, but we never hooked up this pass Eric wrote.
Otherwise, you can end up with stupid scalarized code such as:
vec4 ssa_7 = load_const (0.0, 0.0, 0.0, 0.0)
vec4 ssa_8 = ...
vec1 ssa_9 = feq ssa_8, ssa_7
vec1 ssa_10 = feq ssa_8.y, ssa_7.y
vec1 ssa_11 = feq ssa_8, ssa_7.z
vec1 ssa_12 = feq ssa_8.y, ssa_7.w
ssa_8.xyxy == <0, 0, 0, 0> should only take two feq instructions.
shader-db on Skylake:
total instructions in shared programs: 9121153 -> 9120749 (-0.00%)
instructions in affected programs: 32421 -> 32017 (-1.25%)
helped: 277
HURT: 69
total cycles in shared programs: 69003364 -> 69000912 (-0.00%)
cycles in affected programs: 899186 -> 896734 (-0.27%)
helped: 313
HURT: 403
This also prevents regressions when disabling channel expressions.
v2: Don't call opt_cse afterwards (requested by Matt). It should
happen in the optimization loop below anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The aim of this is to work towards removing UniformHash from the program
struct so that we don't need to hold onto it in memory and pass it around
outside the linker.
Reviewed-by: Dave Airlie <airlied@redhat.com>
There are never render target reads, so there are no scheduling hazards.
Giving the extra flexibility to the scheduler makes it possible to do
FB writes as soon as their sources are available, reducing register
pressure. It also makes it possible to do the payload setup for more
than one FB write message at a time, which could better hide latency.
shader-db results on Skylake:
total instructions in shared programs: 9110254 -> 9110211 (-0.00%)
instructions in affected programs: 2898 -> 2855 (-1.48%)
helped: 3
HURT: 0
LOST: 0
GAINED: 1
A reduction in instruction counts is surprising, but legitimate:
the three shaders helped were spilling, and reducing register
pressure allowed us to issue fewer spills/fills.
total cycles in shared programs: 69035108 -> 68928820 (-0.15%)
cycles in affected programs: 4412402 -> 4306114 (-2.41%)
helped: 4457
HURT: 213
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reuse the sampler deref handling code to do the same
thing for atomics.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The state tracker never handled this properly, and it finally
annoyed me for the second time so I decided to fix it properly.
This is inspired by the NIR sampler lowering code. I only realised at
the last minute that NIR seems to order its derefs differently to GLSL;
once I got that, things got much easier.
It fixes a bunch of tests in
tests/spec/arb_gpu_shader5/execution/sampler_array_indexing/
v2: fix AoA tests when forced on.
I was right I didn't need all that code, fixing the AoA code
meant cleaning up a chunk of code I didn't like in the array
handling.
v3: start generalising the code a bit more for atomics.
v3.1: use UniformRemapTable
v4: handle uniforms differently using the param_index,
and go back to UniformStorage
fix issues identified by Timothy with deref handling.
v4.1: squash const fix and move handling 1D const out
of recursive function.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We have a requirement to store the index into the mesa parameterlist
for uniforms. Up until now we've overwritten var->data.location with
this info. However this then stops us accessing UniformStorage,
which is needed to do proper dereferencing.
Add a new variable to ir_variable to store this value in, and change
the two uses to use it correctly.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Its previous name was somewhat misleading: this really behaves like a
RW cache flush rather than an invalidation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The state cache is also L3-backed so it seems sensible to make sure
it's clean as we do for other RO caches before repartitioning the L3.
This wasn't part of my original L3 partitioning code because I was
able to reproduce hangs on Gen7 hardware when the state cache
invalidation happened asynchronously with previous 3D rendering, which
should no longer be possible after the previous change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to split the stalling flush from the RO cache invalidation
into a different PIPE_CONTROL command to make sure that the top of the
pipe invalidation happens after any previous rendering is complete.
Otherwise it's possible for previous rendering to pollute the L3 cache
in the short window of time between RO invalidation and the completion
of the stalling flush. Fixes rendering artifacts on Unigine Heaven,
Metro Last Light Redux and Metro 2033 Redux.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93540
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93599
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Tested-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SEL instruction with predication mode NONE emitted when the atomic
operation doesn't need to be predicated is a no-op and might rely on
undocumented hardware behaviour. Noticed by chance while looking at
the assembly output.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The errors.c file had grown quite large so split off this extension
code into its own file. This involved making a handful of functions
non-static.
Acked-by: Timothy Arceri <timothy.arceri@collabora.com>
This fixes a crash with bin/arb_clear_texture-base-formats and
probably some other tests which use clear_texture().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The builtin data can get released with a glReleaseShaderCompiler call.
We're careful everywhere to clone everything that comes out of builtins
except here, where we accidentally return the signature belonging to the
builtin version, rather than the locally-cloned one.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Cc: mesa-stable@lists.freedesktop.org
The builtin function shader is part of the builtin state, released
when glReleaseShaderCompiler is called. We must ensure that the
builtins have been (re)initialized before attempting to link with the
builtin shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Rob Herring <robh@kernel.org>
Cc: mesa-stable@lists.freedesktop.org
All interface blocks will have been lowered by this point so just
use an assert. Returning false would have caused all sorts of
problems if they were not lowered yet and there is an assert to
catch this later anyway.
We also update the tests to reflect this change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The vec4 backend, at the end, does this:
   if (inst->is_3src()) {
      for (int i = 0; i < 3; i++) {
         if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0)
            assert(brw_is_single_value_swizzle(inst->src[i].swizzle));
      }
   }
So make sure that we use the same conditions when trying to
copy-propagate. UNIFORMs will be converted to vstride 0 in
convert_to_hw_regs, but so will ATTRs when interleaved (as will happen
in a GS with multiple attributes). Since the vstride is not set at
copy-prop time, infer it by inspecting dispatch_mode and reject ATTRs if
they have non-scalar swizzles and are interleaved.
Fixes assertion errors in dolphin-generated geometry shaders (or
misrendering on opt builds) on Sandybridge or on IVB/HSW with
INTEL_DEBUG=nodualobj.
Co-authored-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93418
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
If you're worried about the duplication of some CAPs, we can remove them
later.
v2: add fields for memory eviction stats
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
GLsync objects had a race condition when used from multiple threads
(which is the main point of the extension, really); it could be
validated as a sync object at the beginning of the function, and then
deleted by another thread before use, causing crashes. Fix this by
changing all casts from GLsync to struct gl_sync_object to a new
function _mesa_get_and_ref_sync() that validates and increases
the refcount.
In a similar vein, validation itself uses _mesa_set_search(), which
requires synchronization -- it was called without a mutex held, causing
spurious error returns and other issues. Since _mesa_get_and_ref_sync()
now takes the shared context mutex, this problem is also resolved.
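The shape of the fix, validating and taking a reference inside one critical section, can be sketched as below. shared_state, get_and_ref_sync and unref_sync are simplified, hypothetical stand-ins; the real code uses a set lookup rather than a fixed array.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SYNCS 4

struct sync_object { int refcount; };

/* Hypothetical shared state: a mutex-protected set of live sync objects. */
struct shared_state {
    pthread_mutex_t mutex;
    struct sync_object *live[MAX_SYNCS];
};

/* Validate the handle and take a reference in one critical section, so
 * another thread cannot delete the object between the two steps. */
struct sync_object *get_and_ref_sync(struct shared_state *sh, void *handle)
{
    struct sync_object *found = NULL;

    pthread_mutex_lock(&sh->mutex);
    for (int i = 0; i < MAX_SYNCS; i++) {
        if (handle != NULL && sh->live[i] == handle) {
            found = sh->live[i];
            found->refcount++;
            break;
        }
    }
    pthread_mutex_unlock(&sh->mutex);
    return found;
}

/* Drop a reference; returns the remaining count. The object leaves the
 * live set when the count reaches zero. */
int unref_sync(struct shared_state *sh, struct sync_object *obj)
{
    int remaining;

    pthread_mutex_lock(&sh->mutex);
    assert(obj->refcount > 0);
    remaining = --obj->refcount;
    if (remaining == 0) {
        for (int i = 0; i < MAX_SYNCS; i++)
            if (sh->live[i] == obj)
                sh->live[i] = NULL;
    }
    pthread_mutex_unlock(&sh->mutex);
    return remaining;
}
```

A caller holds its own reference for as long as it uses the object, so a concurrent delete only drops the count instead of freeing memory that is still in use.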
Fixes bug #92757, found while developing Nageru, my live video mixer
(due for release at FOSDEM 2016).
v2: Marek: silence warnings, fix declaration after code
Signed-off-by: Steinar H. Gunderson <sesse@google.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Yet another change motivated by AMD GPUPerfStudio compatibility. These groups
are not directly accessible from userspace, and AMD GPUPerfStudio does not
actually query them - it just requires them to be there. Hence, adding
a placeholder for now.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
This is yet another change motivated by appeasing AMD GPUPerfStudio's
hardcoding of performance counter group numbers.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
As documented in the comment, AMD GPUPerfStudio unfortunately hardcodes the
order of performance counter groups. Let's do the pragmatic thing and present
the same order as Catalyst/Crimson.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
This group was used by older versions of AMD GPUPerfStudio (via
AMD_performance_monitor) to identify the GPU family, and GPUPerfStudio
still complains when it isn't available.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Set R600_DEBUG=preoptir to dump the LLVM IR before optimization passes,
to allow diagnosing problems caused by optimization passes.
Note that in order to compile the resulting IR with llc, you will first
have to run at least the mem2reg pass, e.g.
opt -mem2reg -S < shader.ll | llc -march=amdgcn -mcpu=bonaire
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> (original patch)
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (w/ debug flag)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can get rid of our reference immediately, since the driver will hold
onto it for us.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
While rather unlikely, uploads _can_ fail. Doing them earlier means
we'll have to restore less state when they do fail, and it's slightly
easier to check the restore code.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously the framebuffer default sample count was taken directly
from the value given by the application. On the i965 driver on HSW if
the value wasn't one that is supported by the hardware it would hit an
assert when it tried to program the state for it. This patch fixes it
by adding a derived sample count to the state for the default
framebuffer. The driver can then quantize this to one of the valid
values in its UpdateState handler when the _NEW_BUFFERS state changes.
_mesa_geometric_samples is changed to use the new derived value.
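What "quantize" could look like in a driver is sketched below. The rounding rule (round up to the next supported count, clamp to the largest) and the function name are assumptions of this sketch, not necessarily what i965 does.

```c
#include <assert.h>

/* Round a requested sample count up to the next supported one and fall
 * back to the largest supported value. `supported` must be sorted in
 * ascending order. */
unsigned quantize_num_samples(unsigned requested,
                              const unsigned *supported, int count)
{
    assert(count > 0);
    for (int i = 0; i < count; i++) {
        if (supported[i] >= requested)
            return supported[i];
    }
    return supported[count - 1];
}
```

The application-provided value stays queryable as given, while state programming only ever sees the derived, hardware-valid count.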
Fixes the piglit test arb_framebuffer_no_attachments-query
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93957
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixup to commit 03b3eb90d - the number of buffers could be larger than
the number of elements, in which case we'd pass a negative argument to
PUSH_SPACE, which would be bad. While we're at it, merge it with the
other PUSH_SPACE at the top of the function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The spill logic will insert convert ops when moving between files. It
seems like the emission logic wasn't quite ready for these converts.
Tested on fermi, and visually looked at nvdisasm output for maxwell.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Use align_free to free memory allocated
with align_malloc.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
The color inputs must automatically use centroid whether
multisampling is used or not.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
sem.reg.mod & NINED3DSPDM_CENTROID is worth 4 when
centroid is requested, whereas
TGSI_INTERPOLATE_LOC_CENTROID is worth 1.
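The mismatch is the classic "masked flag used as an enum value" problem. A sketch with local stand-in constants (EX_MOD_CENTROID for the flag worth 4, EX_LOC_CENTROID for the location worth 1; the value 0 for the non-centroid location is an assumption):

```c
#include <assert.h>

enum { EX_MOD_CENTROID = 4 };                     /* flag bit in the modifier */
enum { EX_LOC_CENTER = 0, EX_LOC_CENTROID = 1 };  /* interpolation location */

/* Map the modifier flag to a location value instead of passing the masked
 * bits (4) where a location (1) is expected. */
int interp_location(unsigned reg_mod)
{
    int loc = (reg_mod & EX_MOD_CENTROID) ? EX_LOC_CENTROID : EX_LOC_CENTER;
    assert(loc == EX_LOC_CENTER || loc == EX_LOC_CENTROID);
    return loc;
}
```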
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This enables fast clears in the following
case:
pixel shader renders to 1 RT
4 RT bound
clear
new pixel shader bound that renders to 4 RTs
Previously the fast clear path wouldn't be hit,
because when trying the fast clear path,
the framebuffer state would be configured for 1 RT,
instead of 4.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Some docs say linear filtering is always used when
app does shadow mapping.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Respect block alignment for ATI1/ATI2 format when trying to lock a
surface using LockRect().
Fixes failing WINE tests device.c test_surface_blocks() tests.
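A block-alignment check of this kind might look like the sketch below. The exact rule (edges on block boundaries, with right/bottom also allowed to end on the surface edge) and all names are assumptions for illustration; for ATI1/ATI2 the block size would be 4x4.

```c
#include <assert.h>
#include <stdbool.h>

struct lock_rect { int left, top, right, bottom; };

/* Check a lock rectangle against the block size of a compressed format. */
bool rect_is_block_aligned(const struct lock_rect *r,
                           int block_w, int block_h,
                           int surf_w, int surf_h)
{
    assert(block_w > 0 && block_h > 0);

    if (r->left % block_w != 0 || r->top % block_h != 0)
        return false;
    /* Right/bottom may end on a block boundary or on the surface edge. */
    if (r->right % block_w != 0 && r->right != surf_w)
        return false;
    if (r->bottom % block_h != 0 && r->bottom != surf_h)
        return false;
    return true;
}
```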
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Testing on Windows seems to show that invalid states
are accepted, and that specific 'good' behaviours then
happen depending on the states.
This adds some validation to catch invalid
states and reproduce these 'good' behaviours
when that happens.
Also reorders SetRenderState to match the expected
optimisation:
(Value == previous Value) => return immediately,
which affects D3D9 hacks too.
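The early return amounts to the following sketch. struct device, the state count and the dirty mask are hypothetical, not the nine state tracker's actual layout.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_STATES 16

struct device {
    unsigned rs[NUM_STATES];
    unsigned dirty_mask;
};

/* Returns true if the state actually changed. */
bool set_render_state(struct device *dev, unsigned state, unsigned value)
{
    assert(state < NUM_STATES);
    if (dev->rs[state] == value)
        return false;             /* same value: return immediately */
    dev->rs[state] = value;
    dev->dirty_mask |= 1u << state;
    return true;
}
```

Because the comparison comes first, anything hooked into the state change, including hacks keyed on special values, is skipped when the value is repeated.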
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Add config option override_vendorid to report a fake card in d3dadapter9 drm.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Implement a device private memory counter similar to Win 7.
Only textures and surfaces increment vidmem and may return
ERR_OUTOFVIDEOMEMORY. Vertex buffer and index buffer creation always
succeeds, even when out of video memory.
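The accounting rule reads roughly like this sketch. The enum, struct and function names are made up, and the boolean result stands in for returning an out-of-video-memory error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum resource_kind {
    RES_TEXTURE, RES_SURFACE, RES_VERTEX_BUFFER, RES_INDEX_BUFFER
};

/* Hypothetical per-device counter. */
struct vidmem { uint64_t used, limit; };

/* Only textures and surfaces are charged against video memory. */
bool counts_against_vidmem(enum resource_kind kind)
{
    return kind == RES_TEXTURE || kind == RES_SURFACE;
}

/* Returns false where the real call would report out of video memory. */
bool account_alloc(struct vidmem *vm, enum resource_kind kind, uint64_t size)
{
    assert(vm->used <= vm->limit);
    if (!counts_against_vidmem(kind))
        return true;                  /* buffers always succeed */
    if (size > vm->limit - vm->used)
        return false;
    vm->used += size;
    return true;
}

/* Give the memory back when a counted resource is destroyed. */
void account_free(struct vidmem *vm, enum resource_kind kind, uint64_t size)
{
    if (counts_against_vidmem(kind)) {
        assert(size <= vm->used);
        vm->used -= size;
    }
}
```

An application that keeps allocating textures until it sees a failure now gets one, instead of running until the process dies.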
Fixes "Vampire: The Masquerade - Bloodlines" allocating resources until crash.
Fixes "Age of Conan" allocating resources until crash.
Fixes failing WINE test device.c test_vidmem_accounting().
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Apps can know if the window is occluded by checking for
specific error messages. The behaviour is different
for Device9 and Device9Ex.
This allows games to release the mouse and stop rendering
until the focus is restored.
In case of multiple swapchains, we only care about the device one.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
To stay compatible with older ID3DPresent interfaces (used to talk
with Wine), store the minor version number where it is accessible to
all state tracker functions (in the NineDevice9 structure).
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
flush_resource needs to be called before flush (for
fast clear resolve, etc).
Removes useless computation of resource (it is
already set correctly).
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Return D3DERR_INVALIDCALL instead of E_POINTER.
On error set ppBackBuffer to NULL.
Multiple swapchains can only be created in windowed mode as
windowed swapchain.
Set backbuffer to NULL in NineDevice9_GetBackBuffer, but not
in NineSwapChain9_GetBackBuffer.
This fixes all WINE's device.c test_swapchain() tests.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
When no window is specified, we should revert to the focus window.
This deserves more tests however (what if the device swapchain is
already using the focus window?)
Fixes crash for FFXIV
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
If swapchain creation fails, This->swapchains[i] might be NULL, which
causes a crash.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Return errors in case of invalid presentation parameters.
Fixes failing WINE tests device.c test_swapchain_parameters().
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Store a copy of GUID in the header that is under our control and use it
as key for the hashtable instead of using the application provided pointer.
The application might change the memory after leaving the function.
Fixes a crash for issue https://github.com/iXit/Mesa-3D/issues/130
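A rough sketch of the idea, with hypothetical type and function names
(sketch_guid, sketch_pdata_header); the real header layout and hashtable
code are not shown in this message and may differ.

```c
#include <stdint.h>
#include <string.h>

/* 16-byte GUID layout without padding (hypothetical stand-in type). */
struct sketch_guid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
};

struct sketch_pdata_header {
    struct sketch_guid guid; /* private copy, used as the lookup key */
    unsigned size;
};

/* Copy the GUID by value so later writes by the app cannot affect it. */
void sketch_header_init(struct sketch_pdata_header *h,
                        const struct sketch_guid *app_guid, unsigned size)
{
    h->guid = *app_guid;
    h->size = size;
}

/* Compare against the stored copy, never against the app pointer. */
int sketch_header_matches(const struct sketch_pdata_header *h,
                          const struct sketch_guid *key)
{
    return memcmp(&h->guid, key, sizeof(*key)) == 0;
}
```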
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
To ease debugging print the GUID instead of the pointer to it.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
The values of box.z and box.depth weren't set, which led to a crash.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Tests show that in case of a multisample mismatch between the
depth-stencil buffer and the render target, it is not cleared.
Fixes failing WINE test visual.c test_multisample_mismatch().
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Fixes crash for non-square textures.
We were using the height instead of the
width for some calculations.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Add support for D3DFMT_R8G8B8. It allows format conversion for
surfaces of pool scratch.
Usually the gallium format equivalents of d3d9 formats
have their names reversed.
The gallium format PIPE_FORMAT_R8G8B8_UNORM is the right
equivalent here, and its name is likely wrong (reversed).
Fixes a crash in TmNationsForever.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Flat shade mode only works if pixel shaders have interpolate
set to TGSI_INTERPOLATE_COLOR on color inputs.
Fixes failing WINE tests visual.c test_shademode().
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Report success instead of failing as there's no resource for those surfaces.
Fixes a crash in Crysis: Warhead.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Previously, the vertex elements update
was protected by
'if ((group & (NINE_STATE_VDECL | NINE_STATE_VS)) ||
state->changed.stream_freq & ~1)'
itself protected by
'if (group & (NINE_STATE_COMMON | NINE_STATE_VS))'
If no state was changed except the stream frequency,
no update would happen.
This patch solves the problem by adding a new
NINE_STATE_STREAMFREQ state.
Another way would be to add a state->changed.stream_freq & ~1
check to the main test.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Some apps do redundant SetStreamSourceFreq calls.
Catch them to improve performance.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
The indexbuffer9 codebase was lagging behind the one of vertexbuffer9.
Add buffer9 as common code base for indexbuffer9 and vertexbuffer9.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
It seems cleaner to actually reference the resources for vtxbuf,
rather than relying on the fact that the bound d3d streams do.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When an application sets a vertex shader, we are supposed
to use it, and when no vertex shader is set, we are supposed
to revert to fixed function vertex shader.
It seems there is an exception: when the vertex declaration
has a position_t index, we should revert to fixed function
vertex shader.
Up to now we were checking if device->state.vs is set
to know whether to use programmable shader or not.
With this commit we determine whether we use programmable shader
or not when vertex shader/declaration are set, but
stateblocks do complicate things a bit.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
NineUnknown_ctor increments the refcount even in case of an error.
Restructure the code to prevent refcount increments.
Fixes a couple of wine tests.
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Textures in SYSTEMMEM don't have resources attached.
Instead of returning an error for them, StretchRect
was crashing.
This changes the check order to fix that case.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
The last weighted element is one minus the sum of all previous weights.
Fixes WINE test visual.c test_vertex_blending.
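As a small illustration (the helper name is hypothetical, not taken from
the fixed function shader code), the implicit last weight follows directly
from the explicit ones:

```c
/* weights[0 .. n-2] hold the explicit blend weights; weights[n-1]
 * receives one minus their sum. Does nothing for n == 0. */
void sketch_complete_blend_weights(float *weights, unsigned n)
{
    float sum = 0.0f;

    if (n == 0)
        return;
    for (unsigned i = 0; i + 1 < n; i++)
        sum += weights[i];
    weights[n - 1] = 1.0f - sum;
}
```

With a single matrix (n == 1) there are no explicit weights, so the only
weight becomes 1.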
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
In case of a non-local viewer, the value has to be subtracted.
Fixes failing WINE tests in test_specular_lighting() (visual.c)
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Implement fixed function D3DRS_SPECULARENABLE.
Fixes failing WINE tests in test_specular_lighting() (visual.c)
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
vs1.1 rounds a0 down to the lowest integer, while
other versions round to the closest integer.
To use the same path as the other versions (with ARR),
we were subtracting 0.5 for vs1.1 to get rounding to lowest.
This gives wrong result if a0 is set to 0:
round(0 - 0.5) = -1
Instead just use ARL for vs1.1
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
The documentation of the flag doesn't make sense.
To sum up the doc, if not set, specular alpha contains fog,
and if set specular alpha contains 0 (except for ff).
However in practice when the flag is there, apps do use specular alpha
as if it could be used normally, which makes much more sense than the doc.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Based on a gallivm patch by Ilia Mirkin.
+8 piglit regressions due to precision issues (I blame the tests)
The benefit is that we'll get v_cvt_f32_f16 and v_cvt_f16_f32 instead
of emulation with integer instructions. They are GLSL 4.00 intrinsics.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
No instruction counts changed, but:
total cycles in shared programs: 64834502 -> 64781530 (-0.08%)
cycles in affected programs: 16331544 -> 16278572 (-0.32%)
helped: 4757
HURT: 4288
GAINED: 66
LOST: 20
I remember trying this when I first wrote the pass, but it wasn't
helpful at the time.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We must fetch all sources into the instruction stream before generating
the instruction that uses them. Otherwise we'll define values after
using them, which won't work so well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The EXT spec has been updated to:
- logically combine the es2_profile and es_profile exts
- allow any legal version to be requested
dEQP tests request a specific ES version when using GLX, so this allows
dEQP upstream to run against GLX with the appropriate X server patch
(which had similar disabling logic).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Adam Jackson <ajax@redhat.com> (v3)
v1 -> v2:
- distinguish between DRI_API_GLES{,2,3}
- add GLX_EXT_create_context_es_profile client-side support
v2 -> v3:
- fix error in computing mask
While this is the default, private .emacs files might have it set to
something else. No harm in forcing it to 0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
To prevent build failures when a large patch series is committed, as
happened in https://ci.appveyor.com/project/jrfonseca-fdo/mesa/build/322
because the 10 commits between dac2964f3e and
6f428328d3 were submitted before the
build slave started the git clone.
100 commits should be bigger than any patch series seen in practice, and
it takes practically the same time to download as 5 commits.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We seem to end up w/ duplication between compiler/Makefile.sources and
compiler/glsl/Makefile.sources. The latter appears unused. Delete it.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This parameter is equivalent to the corresponding OpenGL implementation
limit which is in texels, not bytes.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is already used internally in si_resource_copy_region for compressed
textures, so the only real change here is the adjusted surface size
computation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
We will write our own version of texsubimage for PBO uploads, and we will
want to call that here as well.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Use instancing to generate two triangles for each destination layer and use
a geometry shader to route the layer index.
v2:
- directly write layer in VS if supported by the driver (Marek Olšák)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Create a PIPE_BUFFER sampler view on the pixel-unpack buffer, and draw
the image on the texture with a fragment shader that maps fragment
coordinates to buffer coordinates.
Modifications by Nicolai Hähnle:
- various cleanups and fixes (e.g. error handling, corner cases)
- split try_pbo_upload into two functions, which will allow code to be
shared with compressed texture uploads
- modify the source format selection to only test for support against
the PIPE_BUFFER target
v2:
- update handling of TGSI_SEMANTIC_POSITION for recent changes in master
- MaxTextureBufferSize is number of texels, not bytes (Ilia Mirkin)
- only enable when integers are supported (Marek Olšák)
- try harder to hit the TextureBufferOffsetAlignment
- remove unnecessary MOV from the fragment shader
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
We need to tell the address generation functions about the dimensionality of
the texture to correctly implement the part of Section 3.8.1 (Texture Image
Specification) of the OpenGL 2.1 specification which says:
"For the purposes of decoding the texture image, TexImage2D is
equivalent to calling TexImage3D with corresponding arguments
and depth of 1, except that
...
* UNPACK_SKIP_IMAGES is ignored."
Fixes a low impact bug that was found by chance while browsing the spec and
extending piglit tests.
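A simplified sketch of the rule, assuming strides are already known in
bytes. The struct and function below are hypothetical and are not the
actual address generation functions touched by this change.

```c
#include <stddef.h>

struct sketch_unpack {
    size_t skip_pixels;
    size_t skip_rows;
    size_t skip_images;
};

/* Byte offset of the first texel that is read from the client data.
 * dims is the dimensionality of the TexImage call (1, 2 or 3). */
size_t sketch_first_texel_offset(const struct sketch_unpack *u,
                                 unsigned dims, size_t bytes_per_pixel,
                                 size_t row_stride, size_t image_stride)
{
    /* UNPACK_SKIP_IMAGES only applies to 3D uploads. */
    size_t skip_images = (dims == 3) ? u->skip_images : 0;

    return skip_images * image_stride +
           u->skip_rows * row_stride +
           u->skip_pixels * bytes_per_pixel;
}
```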
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This cap indicates whether pipe->create_surface can reinterpret a texture
as a surface with a format of different block width/height (but equal
block size).
v2: fix whitespace
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This cap indicates that the driver only supports R, RG, RGB and RGBA
formats for PIPE_BUFFER sampler views.
v2: move into "unsupported features" section for nouveau (Ilia Mirkin)
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
When set to a truish value, this globally disables the minmax cache for all
buffer objects.
No #ifdef DEBUG guards because this option can be interesting for
benchmarking.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When applications stream their index buffers, the caches for those BOs become
useless and add overhead, so we want to disable them. The tricky part is
coming up with the right heuristic for *when* to disable them.
The first question is which hit rate to aim for. Since I'm not aware of any
interesting borderline applications that do something like "draw two or three
times for each upload", I just kept it simple.
The second question is how soon we should give up on the caching. Applications
might have a warm-up phase where they fill a buffer gradually but then keep
reusing it. For this reason, I count the number of indices that hit and miss
(instead of the number of calls that hit or miss), since comparing that to
the size of the buffer makes sense.
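One possible shape of such a heuristic is sketched below. The field names
and the exact give-up condition are assumptions made for illustration and
should not be read as the thresholds used by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_buffer {
    size_t size;          /* buffer size, compared against index counts */
    size_t hit_indices;   /* indices covered by cache hits */
    size_t miss_indices;  /* indices covered by cache misses */
    bool   cache_enabled;
};

/* Record one min/max lookup covering 'count' indices. */
void sketch_record_lookup(struct sketch_buffer *buf, size_t count, bool hit)
{
    if (!buf->cache_enabled)
        return;

    if (hit) {
        buf->hit_indices += count;
        return;
    }
    buf->miss_indices += count;

    /* Allow roughly one buffer's worth of misses as warm-up; beyond
     * that, give up if hits have not kept pace with misses. */
    if (buf->miss_indices > buf->size &&
        buf->hit_indices < buf->miss_indices - buf->size)
        buf->cache_enabled = false;
}
```

Counting indices rather than calls means a buffer that is filled once and
then reused many times keeps its cache, while a streamed buffer loses it
after about one buffer's worth of misses.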
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some game developers are unaware that an index buffer in a VBO still needs
to be read by the CPU if some varying data comes from a user pointer (unless
glDrawRangeElements and friends are used). This is particularly bad when
they tell us that the index buffer should live in VRAM.
This cache helps, e.g. lifting This War Of Mine (a particularly bad
offender) from under 10fps to slightly over 20fps on a Carrizo.
Note that there is nothing prohibiting a user from rendering from multiple
threads simultaneously with the same index buffer, hence the locking. (The
internal buffer map taken for the buffer still leads to a race, but at least
the locks are a move in the right direction.)
v2: disable the cache on USAGE_TEXTURE_BUFFER as well (Chris Forbes)
v3:
- use bool instead of GLboolean for MinMaxCacheDirty (Ian Romanick)
- replace the sticky USAGE_PERSISTENT_WRITE_MAP bit by a direct
AccessFlags check
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We will add more code for caching/memoization. Moving the existing code
into its own file helps keep things modular.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Note that the conversion of the clear data (when data != NULL) can fail due
to an out of memory condition, but it does not check any error conditions
mandated by the spec. Therefore, it is safe to skip when size == 0.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We will want to disable minmax index caching for buffers that are used in this
way.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We will want to disable minmax index caching for buffers that are used in this
way.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The scaling list should be filled out with zig zag scan
v2: integrate zig zag scan for list 4x4 to vl (Christian)
v3: move list determination out from the loop (Ilia)
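For illustration, a sketch of filling a 4x4 list through the standard
zig-zag scan, assuming the values arrive in scan order and are wanted in
raster order. The table and function names are hypothetical; the table
values are the usual 4x4 frame zig-zag pattern.

```c
#include <stdint.h>

/* Entry i is the raster position (row * 4 + column) of the i-th
 * value in zig-zag scan order. */
static const uint8_t sketch_zigzag_4x4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Scatter values given in scan order into a raster-ordered 4x4 list. */
void sketch_fill_scaling_list_4x4(const uint8_t scan_order[16],
                                  uint8_t raster[16])
{
    for (unsigned i = 0; i < 16; i++)
        raster[sketch_zigzag_4x4[i]] = scan_order[i];
}
```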
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
GEN8_SURFACE_AUX_MODE_NONE is 0, so this is a no-op.
Yet, this also makes it clear that we can compare aux_mode to the
other GEN8_SURFACE_AUX_MODE_ values. We will want to compare to
GEN8_SURFACE_AUX_MODE_HIZ.
v2: Some very minor cherry-pick conflicts due to moving it around in the series.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Whether multisampling is turned on depends, in part, on whether
attachments are themselves multisample surfaces. However when there are
no attachments, we should rely on the default geometry for this.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes dEQP-GLES31.functional.fbo.completeness.no_attachments
When the width or height are 0, the framebuffer is incomplete. We may
also not have been passing the new state down to the driver when the
widths/heights/etc changed. Make sure to dirty the state so that the
framebuffer state is revalidated at draw time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes it.
States which also need to be taken into account:
- SPI color formats - each down-conversion format supports only a limited set
of SPI formats
- whether MSAA resolving and logic op are enabled
These need special handling:
- blending
- disabled channels
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The motivation is to simplify the Stoney RB+ code.
Intensity is already treated as red except here.
No piglit regressions.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The equivalent of the last patch for the hash table. I'm not aware of
any issues this fixes.
v2:
- use entry_is_deleted (Timothy)
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
When we delete entries in the hash set, we mark them "deleted" by
setting their key to the deleted_key, which points to a dummy
deleted_key_value. When searching for an entry, we normally skip over
those, but set_add() had some code for searching for duplicate entries
which forgot to skip over deleted entries. This led to a segfault inside
the NIR vectorization pass, since its key comparison function
interpreted the memory where deleted_key_value resides as a pointer and
tried to dereference it.
v2:
- add better commit message (Timothy)
- use entry_is_deleted (Timothy)
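The pattern can be sketched with a tiny open-addressing set. This is an
illustrative stand-in rather than the util set code: the fixed capacity,
the int-pointer keys and the sketch_* names are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SET_SIZE 8 /* hypothetical fixed capacity */

/* Dummy object; only its address is used, as the tombstone marker. */
static const int sketch_deleted_key_value = 0;

struct sketch_set {
    const int *keys[SKETCH_SET_SIZE]; /* NULL = free slot */
};

static bool entry_is_deleted(const int *k)
{
    return k == &sketch_deleted_key_value;
}

/* Stand-in for a user callback that dereferences both keys. */
static bool keys_equal(const int *a, const int *b)
{
    return *a == *b;
}

/* Returns true if key was inserted, false if an equal key already
 * exists or the table is full. */
bool sketch_set_add(struct sketch_set *s, const int *key)
{
    unsigned start = (unsigned)*key % SKETCH_SET_SIZE;
    int slot = -1;

    for (unsigned n = 0; n < SKETCH_SET_SIZE; n++) {
        unsigned i = (start + n) % SKETCH_SET_SIZE;
        const int *k = s->keys[i];

        if (k == NULL) {           /* free: end of the probe chain */
            if (slot < 0)
                slot = (int)i;
            break;
        }
        if (entry_is_deleted(k)) { /* tombstone: reusable, never compared */
            if (slot < 0)
                slot = (int)i;
            continue;
        }
        if (keys_equal(k, key))
            return false;
    }
    if (slot < 0)
        return false;
    s->keys[slot] = key;
    return true;
}

/* Marks the entry equal to key (if any) as deleted. */
void sketch_set_remove(struct sketch_set *s, const int *key)
{
    unsigned start = (unsigned)*key % SKETCH_SET_SIZE;

    for (unsigned n = 0; n < SKETCH_SET_SIZE; n++) {
        unsigned i = (start + n) % SKETCH_SET_SIZE;
        const int *k = s->keys[i];

        if (k == NULL)
            return;
        if (!entry_is_deleted(k) && keys_equal(k, key)) {
            s->keys[i] = &sketch_deleted_key_value;
            return;
        }
    }
}
```

The important part is that the duplicate search checks entry_is_deleted()
before calling the comparison callback, so the callback never sees the
tombstone.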
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This reverts commit ab30426e33.
Apparently the memory isn't quite as aligned when this gets called
as it should be, causing crashes. (Although this looks independent
of this code; it should crash just as well if ssse3 is enabled when
compiling without this patch.)
https://bugs.freedesktop.org/show_bug.cgi?id=93962
Add support for these opcodes, the conversion functions were already
there albeit need some new packing stuff.
Just like the tgsi version, piglit won't like it for all the same
reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing
tests, albeit since PK2H won't due to those rounding differences I don't
know if that one works or not as the piglit test is rather difficult to
deal with).
Reviewed-by: Brian Paul <brianp@vmware.com>
Add support for these opcodes, the conversion functions were already
there albeit need some new packing stuff.
Just like the tgsi version, piglit won't like it for all the same
reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing
tests, albeit since PK2H won't due to those rounding differences I don't
know if that one works or not as the piglit test is rather difficult to
deal with).
The util functions handle the half-float conversion.
Note that piglit won't like it much due to:
a) The util functions use magic float mul conversion but when run inside
softpipe/llvmpipe, denorms are flushed to zero, therefore when the conversion
is from/to f16 denorm the result will be zero. This is a bug which should be
fixed in these functions (should not rely on denorms being available), but
will happen elsewhere just the same (e.g. conversion to f16 render targets).
b) The util functions use trunc round mode rather than round-to-nearest. This
is NOT a bug (as it is a d3d10 requirement). This will result in rounding
non-representable finite values to MAX_F16 rather than INFINITY. My belief is the
piglit tests are wrong here but it's difficult to tell (generally glsl
rounding mode is undefined, however I'm not sure if rounding mode might need
to be consistent for different operations). Nevertheless, for gl it would be
better to use round-to-nearest, but using different rounding for GL and d3d10
is an unsolved problem (as it affects things like conversion to f16 render
targets, clear colors, this shader opcode).
Hence for now don't enable the cap bit (so the code is unused).
(Code is from imirkin, comment from sroland)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmvware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the tri is fully inside a scissor edge (or rather, we just use the
bounding box of the tri for the comparison), then we can drop these
additional scissor "planes" early. We do not even need to allocate
space for them in the tri.
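The per-edge test can be sketched as follows, assuming inclusive integer
rectangles. The types and names are hypothetical and the real setup code
works on its own bounding box and scissor representations.

```c
struct sketch_rect {
    int x0, y0, x1, y1; /* inclusive bounds */
};

enum {
    SKETCH_EDGE_LEFT   = 1 << 0,
    SKETCH_EDGE_RIGHT  = 1 << 1,
    SKETCH_EDGE_TOP    = 1 << 2,
    SKETCH_EDGE_BOTTOM = 1 << 3
};

/* Returns the scissor edges that still need a plane: an edge is only
 * kept when the triangle's bounding box crosses it. */
unsigned sketch_scissor_planes_needed(const struct sketch_rect *bbox,
                                      const struct sketch_rect *scissor)
{
    unsigned mask = 0;

    if (bbox->x0 < scissor->x0)
        mask |= SKETCH_EDGE_LEFT;
    if (bbox->x1 > scissor->x1)
        mask |= SKETCH_EDGE_RIGHT;
    if (bbox->y0 < scissor->y0)
        mask |= SKETCH_EDGE_TOP;
    if (bbox->y1 > scissor->y1)
        mask |= SKETCH_EDGE_BOTTOM;
    return mask;
}
```

A result of 0 means no scissor planes are needed and no space has to be
allocated for them.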
The math actually appears to be slightly iffy due to bounding boxes
being rounded, but it doesn't matter in the end.
Those scissor rects are costly - the 4 planes from the scissor are
already more expensive to calculate than the 3 planes from the tri itself,
and it also prevents us from using the specialized raster code for small
tris.
This helps openarena performance by about 8% or so. Of course, it helps
there that while openarena often enables scissoring (and even moves the
scissor rect around) I have not seen a single tri actually hit the
scissor rect, ever.
v2: drop individual scissor edges, and do it earlier, not even allocating
space for them.
v3: help the compiler a bit with simpler code, suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
When we switched to 64bit rasterization, we could no longer use straight
aligned loads for loading the plane data. However, what the code actually
does for loading 3 planes, is 12 scalar loads + 9 unpacks, and then there's
another 8 unpacks for the transpose we need (!).
It would be possible to do the (scalar) loads of course already transposed
(at least saving the additional unpacks), however instead just use
(un)aligned vector loads, and recalculate the eo values, which takes far
fewer instructions (note in case of the triangle_32_3_4 case, the eo values
are not even used, making the scalar loads + unpacks for them all the more
pointless).
This drops execution time of the triangle_32_3_4 function considerably,
albeit it doesn't really make a measurable difference (for small tris we're
essentially limited by vertex throughput in any case), for triangle_32_3_16
it's essentially noise (the loop is more costly than the initial code there).
(I'm thinking about just ditching storing the eo values in the plane data,
so could switch back to using aligned planes, however right now they are
still used in the other raster functions dealing with planes with scalar
code. Also not touching the ppc code, might not be that bad there in any
case.)
Reviewed-by: Brian Paul <brianp@vmware.com>
The existing code used ssse3, and because it isn't compiled in a separate
file compiled with that, it is usually not used (that, of course, could
be fixed...), whereas sse2 is always present at least with 64bit builds.
This should be pretty much as fast as the pshufb version, albeit those
code paths aren't really used on chips without llc in any case.
v2: fix andnot argument order, add comments
v3: use pshuflw/hw instead of shifts (suggested by Matt Turner), cut comments
Reviewed-by: Matt Turner <mattst88@gmail.com>
Enabling swrast on Android causes a link error because vtest is missing.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The virgl reference counting of buffers is broken for prime fd buffers.
Each prime fd passed into virgl_drm_winsys_resource_create_handle creates
a new resource. The solution requires creating a separate hash table to
track flink names separately from prime handles.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is necessary to share the screen between mesa and gralloc to
properly ref count resources. This implements a hash lookup on
the file description to re-use an already created screen. This is
a similar implementation as freedreno and radeon.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change is necessary to avoid the following build error on Android:
external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c: In function 'nouveau_vp3_bsp_next':
external/mesa/src/gallium/drivers/nouveau/nouveau_vp3_video_bsp.c:269:14: error: 'bsp_bo' undeclared (first use in this function)
assert(bsp_bo->size >= str_bsp->w0[0] + num_bytes[i]);
^
This matches the declaration of the variables in question.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We use this logic to detect live ranges and then do plain renaming
across the whole codebase. As such, to prevent WaW hazards, we have to
treat a write as if it were also a read.
For example, the following sequence was observed before this patch:
13: UIF TEMP[6].xxxx :0
14: ADD TEMP[6].x, CONST[6].xxxx, -IN[3].yyyy
15: RCP TEMP[7].x, TEMP[3].xxxx
16: MUL TEMP[3].x, TEMP[6].xxxx, TEMP[7].xxxx
17: ADD TEMP[6].x, CONST[7].xxxx, -IN[3].yyyy
18: RCP TEMP[7].x, TEMP[3].xxxx
19: MUL TEMP[4].x, TEMP[6].xxxx, TEMP[7].xxxx
While after this patch it becomes:
13: UIF TEMP[7].xxxx :0
14: ADD TEMP[7].x, CONST[6].xxxx, -IN[3].yyyy
15: RCP TEMP[8].x, TEMP[3].xxxx
16: MUL TEMP[4].x, TEMP[7].xxxx, TEMP[8].xxxx
17: ADD TEMP[7].x, CONST[7].xxxx, -IN[3].yyyy
18: RCP TEMP[8].x, TEMP[3].xxxx
19: MUL TEMP[5].x, TEMP[7].xxxx, TEMP[8].xxxx
Most importantly note that in the first example, the second RCP is done
on the result of the MUL while in the second, the second RCP should have
the same value as the first. Looking at the GLSL source, it is apparent
that both of the RCP's should have had the same source.
Looking at what's going on, the GLSL looks something like
float tmin_8;
float tmin_10;
tmin_10 = tmin_8;
... lots of code ...
tmin_8 = tmpvar_17;
... more code that never looks at tmin_8 ...
And so we end up with a last_read somewhere at the beginning, and a
first_write somewhere at the bottom. For some reason DCE doesn't remove
it, but even if that were fixed, DCE doesn't handle 100% of cases,
especially those including loops.
With the last_read somewhere high up, we overwrite the previously
correct (and large) last_read with a low one, and then proceed to decide
to merge all kinds of junk onto this temp. Even if that weren't the
case, and there were just some writes after the last read, then we might
still overwrite a merged value with one of those.
As a result, we should treat a write as a last_read for the purpose of
determining the live range.
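A toy version of the last_read computation, with a flag to compare the old
and new behaviour. The instruction struct and function names are
hypothetical and much simpler than the real conversion code.

```c
#define SKETCH_MAX_TEMPS 16 /* hypothetical limit */

struct sketch_inst {
    int dst;  /* temp written, or -1 */
    int src0; /* temps read, or -1 */
    int src1;
};

static void sketch_touch(int last_read[SKETCH_MAX_TEMPS], int temp, int ip)
{
    if (temp >= 0 && temp < SKETCH_MAX_TEMPS)
        last_read[temp] = ip;
}

/* Fills last_read[t] with the index of the last instruction that keeps
 * temp t alive, or -1 if it is never touched. */
void sketch_last_reads(const struct sketch_inst *insts, int count,
                       int write_counts_as_read,
                       int last_read[SKETCH_MAX_TEMPS])
{
    for (int t = 0; t < SKETCH_MAX_TEMPS; t++)
        last_read[t] = -1;

    for (int i = 0; i < count; i++) {
        sketch_touch(last_read, insts[i].src0, i);
        sketch_touch(last_read, insts[i].src1, i);
        /* Treat a write as a read so that a late write extends the
         * live range instead of landing on a merged temp. */
        if (write_counts_as_read)
            sketch_touch(last_read, insts[i].dst, i);
    }
}
```

With the flag off, a temp that is read early and written late gets a live
range that ends before the write, which is the situation described above.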
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
And mark nir_op_pack_uvec4_to_uint unreachable, since it's only produced
by lowering pack[SU]norm4x8 which the vec4 backend does not need.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
A future patch will want to use designated initializers, which aren't
available in C++, but this is C.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Enable GL_OES_geometry_shader enums for OpenGL ES 3.1.
V4: EXTRA tokens updated according to comments from Ilia Mirkin.
V5: Account for check_extra not evaluating "or" lazily. Fix issues
with EXTRA_EXT_FB_NO_ATTACH_CS.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This happens especially with exports and varying packing, where the last
bits aren't always filled in. We end up trying to do quad-wide stores,
which ends up being a lot of register moves that carefully preserve the
nop value. Instead don't do the stores.
total instructions in shared programs : 6131375 -> 6125267 (-0.10%)
total gprs used in shared programs : 910139 -> 895501 (-1.61%)
total local used in shared programs : 15328 -> 15328 (0.00%)
         local    gpr   inst
helped       0   7442   4693
hurt         0     90   2687
Most of the helped/hurt instruction changes are by one or two ops
because we can no longer do quad-wide stores in all cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If an instruction has multiple defs, we have to do a lot more checks to
make sure that we can move it forward. Among other things, various code
likes to do
a, b = tex()
if () c = a
else c = b
which means that a single phi node will have results pointing at the
same instruction. We obviously can't propagate the tex in this case, but
properly accounting for this situation is tricky. Just don't try for
instructions with multiple defs.
This fixes about 20 shaders in shader-db, including the dolphin efb2ram
shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
It appears that the nvidia render engine is quite picky when it comes to
linear surfaces. It doesn't like non-256-byte aligned offsets, and
apparently doesn't even do non-256-byte strides.
This makes arb_clear_buffer_object-unaligned pass on both nv50 and nvc0.
As a side-effect this also allows RGB32 clears to work via GPU data
upload instead of synchronizing the buffer to the CPU (nvc0 only).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> # tested on GF108, GT215
Tested-by: Nick Sarnie <commendsarnex@gmail.com> # GK208
Cc: mesa-stable@lists.freedesktop.org
Since we emulate clip-planes, the clip-vertex is used within the VS
itself (thanks to nir_lower_clip). So just ignore it as a VS output.
Fixes a boatload of piglit tests that were asserting on unknown
varying slot.
(Also unrelated spelling/typo fix.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With glsl_to_nir we end up with local variables, instead of global, for
arrays.
Note that we'll eventually have to do something more clever, I think,
when we support multiple functions, but that will probably take some
work in a few places.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With tgsi_to_nir we get this as a normal input with VARYING_SLOT_FACE.
But with glsl_to_nir plus nir_lower_system_values this becomes an intrinsic.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When using the "shared" vertex array configuration strategy, we bind
each of the buffers as a separate array. However there can be holes in
such vertex buffer lists, so just emit a disable for those.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Teach the emitter that the two registers are sequential, and drop the
second arg entirely, in favor of a double-wide first argument.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This largely leaves the existing image logic alone. When image support
is added this will have to be harmonized somehow.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Issue a MEM_BARRIER. No idea if this is sufficient. As there are no
tests for this, it'll have to do for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
(address, length) pairs are uploaded to the driver constbuf as well to
make these values available to the shaders.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This makes PROGRAM_IMMEDIATE a first-class gl_register_file type, and
adds PROGRAM_BUFFER to the list. These are used purely inside
glsl_to_tgsi conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently any access params (coherent/volatile/restrict) are being lost
when lowering to the ssbo load/store intrinsics. Keep track of the
variable being used, and bake its access params in as the last arg of
the load/store intrinsics.
If the variable is accessed via an instance block, then 'variable'
points to the instance block variable and not the field inside the
instance block that we are accessing. In order to check access
parameters for the field itself we need to detect this case and keep
track of the corresponding field struct so we can extract the specific
field access information from there instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
v1 -> v2: add tracking of struct field
v2 -> v3: minor adjustments based on Iago's feedback
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Interfaces can have image properties set in case they are buffer
interfaces. Make sure not to lose this information.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In particular, AMDGPU_GEM_CREATE_CPU_GTT_USWC can affect even BOs created
in VRAM if they get evicted to GTT. In general there's no need to
restrict any of the flags to any particular domains.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Very modest effect, but it's clearly the right thing to do.
total instructions in shared programs : 6131491 -> 6131398 (-0.00%)
total gprs used in shared programs : 910157 -> 910131 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
        local   gpr   inst   bytes
helped      0    55     85      85
hurt        0    26     20      20
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Following shader-db results on GK110:
total instructions in shared programs : 6141510 -> 6131491 (-0.16%)
total gprs used in shared programs : 910187 -> 910157 (-0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
        local   gpr   inst   bytes
helped      0    18    821     821
hurt        0     0      0       0
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
ARB_gpu_shader_fp64 spec says:
"This extension does not support interpolation of double-precision
values; doubles used as fragment shader inputs must be qualified as
"flat"."
Fixes the regressions added by commit 781d278:
arb_gpu_shader_fp64-double-gettransformfeedbackvarying
arb_gpu_shader_fp64-tf-interleaved
arb_gpu_shader_fp64-tf-interleaved-aligned
arb_gpu_shader_fp64-tf-separate
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93878
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Just make sure that after we've submitted, we get to at least 5
(global) submits ago before we go on to do more. Prevents up to
seconds of lag with window movement in X with xcompmgr -c. There may
be useful tuning to do in the future, but for now this gets us
usability.
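A minimal sketch of the throttling rule above, assuming submits are numbered by a monotonically increasing seqno. The helper name and the constant are made up for illustration; the real code would then block on the returned seqno.

```c
#include <stdint.h>

/* Assumed throttle depth, taken from the "5 submits ago" wording above. */
#define MAX_SUBMITS_IN_FLIGHT 5

/* Returns the seqno that must have completed before more work is queued,
 * or 0 if too few submits have happened for any wait to be needed. */
static uint64_t throttle_wait_seqno(uint64_t last_submitted_seqno)
{
    if (last_submitted_seqno <= MAX_SUBMITS_IN_FLIGHT)
        return 0;
    return last_submitted_seqno - MAX_SUBMITS_IN_FLIGHT;
}
```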
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
On an error return, the returned seqno will probably be unset, so we'd
lose track of what we've submitted so far for waiting on in the
future.
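The idea can be sketched as follows, with a hypothetical tracker struct and helper: the last known-good seqno is only overwritten when the submit actually succeeded.

```c
#include <stdint.h>

/* Hypothetical per-context bookkeeping. */
struct job_tracker {
    uint64_t last_emit_seqno;
};

/* 'ret' is the submit call's return value (0 on success) and
 * 'returned_seqno' is whatever it wrote back, which is assumed to be
 * meaningless on failure. */
static int record_submit(struct job_tracker *t, int ret,
                         uint64_t returned_seqno)
{
    if (ret != 0)
        return ret; /* keep the previous seqno so later waits still work */

    t->last_emit_seqno = returned_seqno;
    return 0;
}
```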
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
When RA fails, and we spill, we have to clean everything up before doing
RA again. We were forgetting to reset the hi/lo linked lists - at
least the hi list is guaranteed to still have pointers to now-deleted
RIG nodes.
Signed-off-by: Karol Herbst <nouveau@karolherbst.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
v2: drop inline keyword
drop radeon_llvm_dispose_kernel_module wrapper
v3: move definitions to .c file
use in radeonsi
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The addition of spi_shader_col_format killed all color outputs
in precompiled shaders.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
v2: also set the alpha func (trivial)
For now this will be enabled in tandem with GL_OES_geometry_shader.
Should a driver come along that wants to separate them out, another
enable can be added.
Also adds the missed GL_OES_geometry_shader define in glcpp.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
At a later stage we might want to split out the NIR specific [XXX:
which one was it], as to make things more obvious and rename the files
appropriately. This patch aims to split it out of nir.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Currently it's an empty library, although it'll be used to store common
code between GLSL and NIR that is compiler specific (rather than generic
as the one in src/util).
XXX: strictly speaking we could add a python/mako parser to generate the
relevant files instead of including builtin_type_macros.h in such a manner.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
This currently just writes out the name of dump files, which can be useful
to easily correlate those files with other log outputs (driver debug output,
apitrace calls, etc.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This changes the default behavior of 'always' mode to be consistent with
hang detection mode.
I have used this to more easily compare dumped command streams using diff.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new name for the intrinsic was introduced in LLVM r258558.
v2: use ternary operator instead of preprocessor
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When setting the conservative thread counts, I halved everything. That isn't
correct for the wm, which has nothing to do with actual thread counts. I suck.
BXT only has 1 slice, and there is some ambiguity about subslices, so just
reserve the max possible for now. It looks like this might fix:
piglit.spec.glsl-1_50.execution.variable-indexing.gs-output-array-vec4-index-wr.bxtm64.
I kind of question why that is, but it is what Jenkins says.
Mark is currently running some of the other blacklisted tests on this patch (it
affects anything requiring scratch space).
Cc: mesa-stable <mesa-stable@lists.freedesktop.org>
Cc: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
_mesa_texture_parameteriv is used because (the more obvious)
_mesa_texture_parameteri just stuffs the parameter in an array and calls
_mesa_texture_parameteriv. This just cuts out the middleman.
As a side bonus we no longer need to check that ARB_stencil_texturing is
supported. The test doesn't allow non-supporting implementations to
avoid any work, and it's redundant with the value-changed test.
Fixes bug #93717 because the state restore commands at the bottom of
_mesa_meta_GenerateMipmap no longer depend on the bound state.
Fixes piglit arb_direct_state_access-generatetexturemipmap with the
changes recently sent to the piglit mailing list. See the bugzilla
entry for more info.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93717
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit c246828c added the code to save and restore the stencil
texturing mode. The restore, however, was erroneously inside the
'target != GL_TEXTURE_RECTANGLE' block.
Fixes piglit test 'arb_stencil_texturing-blit_corrupts_state
GL_TEXTURE_RECTANGLE'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 055093e removed the call to _mesa_meta_in_progress, and meta.h
has not been necessary in src/mesa/main/enable.c since.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The key for a geometry shader would be interpreted as the key for a vertex
shader further down the line, which really doesn't make sense.
This does not affect the contents of shader->key because geometry shaders
don't have any key entries anyway.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It is only used during shader creation now, so no need to keep it around
afterwards.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We now have an explicit parameter that contains the same information, and
this will allow us to get rid of is_gs_copy_shader in the si_shader struct.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Specifically, when the API switches from using a GS to not using a GS and then
back to using the same GS again, we do not have to re-send all the GS state,
but we do have to send VGT_GS_MODE. So make VGT_GS_MODE consistently be a part
of the VS state.
This fixes a rendering bug in Dolphin, but surely other applications are
affected as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93648
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit 67e3098703.
It breaks a bunch of geometry shader tests, such as "spec@!opengl 3.2@minmax"
and others depending on the glGet queries.
In the old hand-written implementation of atan2, the calculation of
atan(y/x) was performed conditionally in the "then" block of the
outermost if statement. I believe I accidentally lifted this out
into unconditional code when converting to IR builder.
For reference, the original hand-written IR is visible in commit
722eff674b.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: Erik Faye-Lund <kusmabite@gmail.com>
OpenGL's dual color blending feature was specified so that an
implementation could support both multiple render targets (MRT) and
dual source blending. Fragment shader outputs specify both "location"
(the render target number) and "index" (either color 0 or 1).
I believe DirectX only has the notion of "location" - if using dual
color blending, location 0 or 1 will specify the operands. If not,
then location means the render target index. The two features can't
be used together.
As such, some applications mistakenly try to use <loc = 0, index = 0>
and <loc = 1, index = 0> in a shader used for dual color blending with
a single render target, rather than the correct <loc = 0, index = 0>
and <loc = 0, index = 1>.
In particular, Unigine Heaven 4.0 and Valley 1.0 suffer from this bug.
Unigine is aware of the problem, and quickly developed a fix, but has
not bothered to change the download link on their website to a working
copy in over a year. People were still using the broken version and
complaining. We tried working around this by disabling dual color
blending, but that apparently hurts performance, and people were once
again unhappy.
On i965, dual source blending is achieved by using different framebuffer
write messages than normal rendering. So, we have to compile different
code for the two cases. We're not being pedantic: we actually have to
know in order to function.
Normally, dual source blending is detectable in the shader: if a shader
has an output with index = 1, then it's meant for blending, not MRT.
With the broken inputs, they're indistinguishable, so we can only tell
by looking at the current GL state.
This patch implements a new drirc workaround:
export dual_color_blend_by_location=true
which makes the i965 driver detect when OpenGL state is configured for
dual source blending, and recompile the fragment shader to use the right
messages. In that case, we allow either location = 1 or index = 1 to
specify the second source for the blending equations.
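The decision described above can be sketched as a small predicate. This is an illustration only; the function and parameter names are hypothetical and do not correspond to the i965 code.

```c
#include <stdbool.h>

/* Decide whether a fragment shader output feeds the second blend source.
 * 'dual_src_blend_enabled' stands for "current GL state uses dual source
 * blending" and 'by_location_workaround' for the drirc option above. */
static bool is_second_blend_source(unsigned location, unsigned index,
                                   bool dual_src_blend_enabled,
                                   bool by_location_workaround)
{
    if (index == 1)
        return true; /* the correct, spec-conforming way to say it */

    /* Workaround path: accept location = 1, but only when the GL state
     * says we are really blending with two sources, not doing MRT. */
    return by_location_workaround && dual_src_blend_enabled && location == 1;
}
```

Without the workaround, <loc = 1, index = 0> keeps meaning "render target 1", which is why the state has to be consulted at all.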
It also re-enables GL_ARB_blend_func_extended for Unigine.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: After more discussion with hw teams, the kernel already contains the
optimal settings allowing us to use all CUs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This was originally removed here:
commit 031d350132
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Tue Aug 25 16:59:12 2015 -0700
i965/vs: Unify URB entry size/read length calculations between backends.
Then added back:
commit bd198b9f0a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Fri Aug 14 16:01:33 2015 -0700
i965/vs: Simplify fs_visitor's ATTR file.
Note that the authorship dates are out of order, but the above reflects the
order of the commit dates.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a fragment shader is used that has no outputs but does conditional
discard (KILL_IF), all fragments are killed without this patch.
By comparing various register settings, my conclusion is that the exec mask
is either not properly forwarded to the DB by NULL exports or ends up being
unused, at least when there is _only_ a NULL export (the ISA documentation
claims that NULL exports can be used to override a previously exported exec
mask).
Of the various approaches I have tried to work around the problem, this one
seems to be the least invasive one.
v2: take discard by alpha test into account as well
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93761
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Updates the _mesa_has_geometry_shaders function to also look
for OpenGL ES 3.1 contexts that have OES_geometry_shader enabled.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
because not using SPI_SHADER_32_ABGR doubles fill rate.
We should also get optimal performance if alpha isn't needed or blending
isn't enabled.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This does change the behavior slightly:
If a shader writes COLOR[i] and that color buffer isn't bound,
the shader will export MRT_NULL instead and discard the IR tree that
calculates the output. The only exception is alpha-to-coverage, which
requires an alpha export.
v2: - update a comment about 16BPC
- account for MRTZ when fixing alpha-test/kill
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Normally there's a producer and consumer, and the producer var gets
picked. In both the vertex->gs and tes->gs cases, that's the un-arrayed
version.
In the SSO case, however, there is no producer. So we picked the arrayed
GS variable, and as a result, used more slots than we should. More
critically, these slots would also no longer line up with the producer's
calculation. To fix this, we need to fix up the type of the variable
based on stage no matter what.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93650
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
By using whole static libraries, the android buildsystem provides a
whole-archive-like solution. This means that we don't need to worry
about the order of the static libraries and any reverse, recursive or
circular dependencies that they have between one another.
Without this the linker will discard any unused hunks of one library
and we'll end up with unresolved symbols as those are required by
another static library. This issue has become more prominent with the
introduction of pipe-loader.
Whole static libraries have been used in i915/i965 for a very long
time, so we might do the same.
v2:
- Better commit message (Ilia)
- Keep external dependencies as [normal] static libs (Mauro)
Cc: mesa-stable@lists.freedesktop.org
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reported-by: Mauro Rossi <issor.oruam@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The commit b4e198f47f changed the offset and bits parameters of the
bitfield insert operation from scalars to vectors. However, the lowering
of ldexp on doubles operates on each vector component and emits scalar
code (since it has to deal with the lower and upper 32-bit chunks of
each double component), so it needs its bits and offset parameters to
be scalars.
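To show why the per-component code wants scalar offset/bits, here is a simplified sketch of ldexp on one double, done by rewriting the exponent field in the upper 32-bit word with a scalar bitfield insert. It assumes IEEE 754 binary64 and only handles inputs and results that stay in the normal range; the helper names are made up and this is not the GLSL lowering itself.

```c
#include <stdint.h>
#include <string.h>

/* Scalar bitfield insert: replace 'bits' bits of 'base' starting at
 * 'offset' with the low bits of 'insert'. Assumes bits < 32. */
static uint32_t bitfield_insert(uint32_t base, uint32_t insert,
                                unsigned offset, unsigned bits)
{
    uint32_t mask = ((1u << bits) - 1u) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* ldexp for normal doubles only: no zero, denormal, inf or NaN handling,
 * and no overflow/underflow checks. */
static double ldexp_normal(double x, int exp)
{
    uint64_t raw;
    memcpy(&raw, &x, sizeof(raw));

    uint32_t lo = (uint32_t)raw;         /* low 32 bits of the mantissa */
    uint32_t hi = (uint32_t)(raw >> 32); /* sign, exponent, high mantissa */

    /* The biased exponent occupies bits 20..30 of the upper word. */
    uint32_t biased = (hi >> 20) & 0x7ffu;
    hi = bitfield_insert(hi, biased + (uint32_t)exp, 20, 11);

    raw = ((uint64_t)hi << 32) | lo;
    memcpy(&x, &raw, sizeof(x));
    return x;
}
```

For a dvec4 this would run once per component, each time with the same scalar offset (20) and width (11).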
Fixes fp64 regression (crash) in:
spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-ldexp-dvec4.shader_test
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since the GREMEDY extensions are normally only exposed by the gremedy
debugger (and could possibly trigger debug paths in the app), we don't
expose the extension by default, but instead only with
ST_DEBUG=gremedy.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The texture mipmap completeness checking code was checking whether all
of the faces have the same size. However this is pointless because the
code just above it checks whether the face has the expected size
calculated for the mipmap level anyway so the error condition could
never be reached. This patch just removes it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
According to the GL 1.4 spec section 3.8.10, a cubemap texture is only
complete if:
• The level base arrays of each of the six texture images making up
the cube map have identical, positive, and square dimensions.
• The level base arrays were each specified with the same internal
format.
• The level base arrays each have the same border width.
Previously the texture completeness code was only checking the first
point. This patch makes it additionally check the other two.
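The three conditions quoted above can be checked roughly like this. The struct and the format enum are hypothetical placeholders, not Mesa's gl_texture_image.

```c
#include <stdbool.h>

/* Hypothetical placeholder formats. */
enum { FMT_RGB = 1, FMT_RGBA = 2 };

/* Hypothetical description of one face's base-level image. */
struct face_image {
    int width, height, border;
    unsigned internal_format;
};

static bool cube_base_level_complete(const struct face_image faces[6])
{
    const struct face_image *ref = &faces[0];

    /* Point 1: positive and square dimensions... */
    if (ref->width <= 0 || ref->width != ref->height)
        return false;

    for (int f = 1; f < 6; f++) {
        /* ...identical across all six faces. */
        if (faces[f].width != ref->width || faces[f].height != ref->height)
            return false;
        /* Point 2: same internal format. */
        if (faces[f].internal_format != ref->internal_format)
            return false;
        /* Point 3: same border width. */
        if (faces[f].border != ref->border)
            return false;
    }
    return true;
}
```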
This fixes the following two dEQP tests:
deqp-gles2.functional.texture.completeness.cube.format_mismatch_rgba_rgb_level_0_neg_z
deqp-gles2.functional.texture.completeness.cube.format_mismatch_rgb_rgba_level_0_pos_z
And also this Piglit test:
spec/!opengl 2.0/incomplete-cubemap-format
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93792
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The buffers are referenced from r600_update_driver_const_buffers()
-> r600_set_constant_buffer() -> u_upload_data(), but nothing
ever releases the reference. Similar case with driver_consts.
Found using valgrind.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
main/shaderapi.c:1318:51: warning: format specifies type 'unsigned int' but the argument has type 'GLhandleARB' (aka 'unsigned long') [-Wformat]
_mesa_debug(ctx, "glDeleteObjectARB(%u)\n", obj);
~~ ^~~
%lu
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Removes the public symbol _glapi_create_table_from_handle from
libGL.so.1.2.0 on all platforms except Darwin.
Since the symbol is not used on other platforms it makes sense to
build glapi_gentable.c only on Darwin.
As a side effect it accelerates the build a bit and reduces the size
of libGL.so.1.2.0 as follows:
size lib/libGL.so.1.2.0 on my system shows
   text    data     bss     dec     hex filename
 469211   21848    2720  493779   788d3 lib/libGL.so.1.2.0 before
 420988   11240    2720  434948   6a304 lib/libGL.so.1.2.0 after
A little bit of history:
_glapi_create_table_from_handle was introduced in
commit 85937f4c0d
Author: Jeremy Huddleston <jeremyhu@apple.com>
Date: Thu Jun 9 16:59:49 2011 -0700
glapi: Add API that can create a _glapi_table from a dlfcn handle
Example usage:
void *handle = dlopen(opengl_library_path, RTLD_LOCAL);
struct _glapi_table *disp = _glapi_create_table_from_handle(handle,
"gl");
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
and the only user in mesa was added in
commit f35913b96e
Author: Jeremy Huddleston <jeremyhu@apple.com>
Date: Thu Jun 9 17:29:51 2011 -0700
apple: Use _glapi_create_table_from_handle to initialize our
dispatch table
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
gl_gentable.py was also used for XQuartz in xserver 1.11 - 1.14.
v2: Fix typos in commit message
Add missing XORG_GLAPI_OUTPUTS += \ into src/mapi/glapi/gen/Makefile.am
Add glapi_gentable.c to EXTRA_DIST for inclusion in the release
tarball
v3: Fix commit message: s/gl_gentable.c/glapi_gentable.c/
Reported-by: Arlie Davis <arlied@google.com>
Cc: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch significantly reduces the size of the libGL.so binary. It does
not change the (externally visible) behavior of libGL.so at all.
gl_gentable.py generates a function, _glapi_create_table_from_handle.
This function allocates a large dispatch table, consisting of 1300 or so
function pointers, and fills this dispatch table by doing symbol lookups
on a given shared library. Previously, gl_gentable.py would generate a
single, very large _glapi_create_table_from_handle function, with a short
cluster of lines for each entry point (function). The idiom it generates
was a NULL check, a call to snprintf, a call to dlsym / GetProcAddress,
and then a store into the dispatch table. Since this function processes
a large number of entry points, this code is duplicated many times over.
We can encode the same information much more compactly, by using a lookup
table. The previous total size of _glapi_create_table_from_handle on x64
was 125848 bytes. By using a lookup table, the size of
_glapi_create_table_from_handle (and the related lookup tables) is reduced
to 10840 bytes. In other words, this enormous function is reduced by 91%.
The size of the entire libGL.so binary (measured when stripped) itself drops
by 15%.
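The table-driven shape can be sketched as below. This is an illustration, not the generated code: the entry list is a three-item stand-in for the real 1300 or so entries, and a resolver callback takes the place of dlsym / GetProcAddress so the example stays self-contained. All names here are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef void (*proc_t)(void);
/* Stand-in for dlsym()/GetProcAddress(). */
typedef proc_t (*resolver_t)(void *handle, const char *symbol);

/* One small record per entry point instead of one block of code each. */
struct entry {
    const char *name; /* entry point name without the prefix */
    unsigned slot;    /* index into the dispatch table */
};

static const struct entry entries[] = {
    { "NewList", 0 }, { "EndList", 1 }, { "CallList", 2 },
};
#define NUM_ENTRIES (sizeof(entries) / sizeof(entries[0]))

/* Same idiom as before (NULL check, snprintf, lookup, store), but written
 * once and driven by the table. Returns the number of non-NULL slots. */
static unsigned fill_table(proc_t *table, void *handle, const char *prefix,
                           resolver_t resolve)
{
    char symbol[256];
    unsigned filled = 0;

    for (unsigned i = 0; i < NUM_ENTRIES; i++) {
        proc_t *slot = &table[entries[i].slot];
        if (!*slot) {
            snprintf(symbol, sizeof(symbol), "%s%s", prefix, entries[i].name);
            *slot = resolve(handle, symbol);
        }
        if (*slot)
            filled++;
    }
    return filled;
}

/* Demo resolver: pretends every "gl"-prefixed symbol exists. */
static void demo_proc(void) {}
static proc_t demo_resolve(void *handle, const char *symbol)
{
    (void)handle;
    return strncmp(symbol, "gl", 2) == 0 ? demo_proc : (proc_t)0;
}
```

The per-entry cost becomes a few bytes of table data rather than a repeated instruction sequence, which is where the size reduction quoted above comes from.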
So the purpose of this change is to reduce the binary size, which frees up
disk space, memory, etc.
size lib/libGL.so.1.2.0 on my system shows (Andreas)
   text    data     bss     dec     hex filename
 565947   11256    2720  579923   8d953 lib/libGL.so.1.2.0 before
 469211   21848    2720  493779   788d3 lib/libGL.so.1.2.0 after
v2: Incorporate Matt's feedback.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Tested-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Take reading shader outputs into account, and use setFlagsDef for the
carry since we rely on having i->flagsDef being set.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Doing that is clearly a bug. We can't quite assert as st/mesa may hit this,
but increase at least visibility of it a bit.
(For the non-refcounted objects it would be illegal too, but we can't detect
that unless we'd store the context ourselves. Plus, those don't tend to cause
random crashes at context or object destruction time... So just sampler views,
surfaces and stream-output targets for now.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I removed this mistakenly in 2dbc20e456. I
actually thought it should not be necessary and a piglit run didn't show
any differences, but this shouldn't have been in there.
draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER.
The new polygon-mode-facing test indeed shows why this is necessary: there are
lots of invalid reads and writes with valgrind (also crashes without
valgrind), because the pre-pipeline vertex size doesn't match the
post-pipeline vertex size (note this won't help much with stages which don't
have the prepare hook which can grow the vertex size, in particular the wide
point stage, but this isn't used by llvmpipe). The test still won't pass, of
course, but it is only usage of uninitialized values now, which is much
less dangerous...
(Albeit I'm pretty sure for i915 it really is not needed anymore as it
doesn't care about the extra outputs and doesn't call
draw_prepare_shader_outputs().)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch moves the uniform calculation into link_uniforms. That is
possible with the help of UniformRemapTable, which holds all the reserved
locations.
Location assignment for implicit locations is changed so that it also
uses the 'holes' that explicit uniform location assignment might have
left in UniformRemapTable. This makes it possible to fit more uniforms;
previously we were lazy here and wasted space.
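A simplified sketch of the hole search, assuming the remap table can be viewed as an array of used/free flags. The name find_empty_block comes from the v2 note below, but this body is an illustration and not the linker's implementation.

```c
#include <stdbool.h>

/* Returns the first index of a run of 'count' free slots in 'used', or -1
 * if no hole of that size exists. A uniform array needs 'count'
 * consecutive locations, hence the run search. */
static int find_empty_block(const bool *used, unsigned table_size,
                            unsigned count)
{
    unsigned run = 0;

    if (count == 0)
        return -1;

    for (unsigned i = 0; i < table_size; i++) {
        run = used[i] ? 0 : run + 1;
        if (run == count)
            return (int)(i + 1 - count);
    }
    return -1;
}
```

If no hole is large enough, the caller would fall back to appending at the end of the table, as before.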
Fixes following CTS tests:
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array
v2: code cleanups, increment NumUniformRemapTable correctly, fix
find_empty_block to work properly and add some more comments.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Fixes piglit regression after fixes to duplicate layout rules.
Previously catching multiple layouts was relying on the code
meant to catch duplicates within a single layout(...), this
change triggers the rules for multiple layouts.
Cc: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
If we have a d24x8 format, there is no stencil. Therefore, we can always
clear these bits too, which means this will be some kind of memset rather
than read-modify-write.
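The difference between the two clear paths looks roughly like this. The bit layout (depth in the low 24 bits, stencil or padding in the top 8) is an assumption made for the sketch, and the function is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: depth in bits 0..23, stencil (or unused X) in 24..31. */
#define DEPTH_MASK 0x00ffffffu

static void clear_depth(uint32_t *pixels, size_t count, uint32_t depth24,
                        int has_stencil)
{
    depth24 &= DEPTH_MASK;

    if (!has_stencil) {
        /* d24x8: the top byte carries nothing, so overwrite whole words.
         * This is a plain fill with no loads. */
        for (size_t i = 0; i < count; i++)
            pixels[i] = depth24;
        return;
    }

    /* d24s8: stencil must survive, so read, mask and write back. */
    for (size_t i = 0; i < count; i++)
        pixels[i] = (pixels[i] & ~DEPTH_MASK) | depth24;
}
```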
This is good for some 7% increase or so in gears with huge window size -
seems to have a bigger effect if things aren't in caches. Of course, any
real app won't spend nearly as much time comparatively in clearing
depth buffer in the first place, so the speedup will be much lower.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes a number of GLES31 CTS failures and hangs on various hardware:
ES31-CTS.texture_gather.plain-gather-depth-2d
ES31-CTS.texture_gather.plain-gather-depth-2darray
ES31-CTS.texture_gather.plain-gather-depth-cube
ES31-CTS.texture_gather.offset-gather-depth-2d
ES31-CTS.texture_gather.offset-gather-depth-2darray
ES31-CTS.layout_binding.sampler2D_layout_binding_texture_ComputeShader
ES31-CTS.layout_binding.sampler2DArray_layout_binding_texture_ComputeShader
ES31-CTS.explicit_uniform_location.uniform-loc-types-samplers
ES31-CTS.compute_shader.resources-texture
Some of them were actually passing by luck on some generations even
though we weren't uploading sampler state tables explicitly for the
compute stage, most likely because they relied on the cached sampler
state left from previous rendering to be close enough.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92589
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93312
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93325
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93407
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93725
Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This reuses the NEW_SAMPLER_STATE_TABLE state bit (currently only used
on pre-Gen7 hardware) to signal that the sampler state tables have
changed in order to make sure that the GPGPU interface descriptor is
updated.
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I have a patch that writes shaders as .shader_test files, and it uses
this function to create the headers (i.e. [vertex shader]).
[tess ctrl shader] isn't a valid shader_runner header - it's spelled
out as [tessellation control shader].
There's no real reason to abbreviate it, so spell it out.
v2: Rebase on Rob's patches to move the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Even if re-linking fails rendering shouldn't fail as the previous
successfully linked program will still be available. It also shouldn't
be possible to have an unlinked program as part of the current rendering
state.
This fixes a subtest in:
ES31-CTS.sepshaderobjs.StateInteraction
This change should improve performance on CPU limited benchmarks as noted
in commit d6c6b186cf.
From Section 7.3 (Program Objects) of the OpenGL 4.5 spec:
"If a program object that is active for any shader stage is re-linked
unsuccessfully, the link status will be set to FALSE, but any existing
executables and associated state will remain part of the current rendering
state until a subsequent call to UseProgram, UseProgramStages, or
BindProgramPipeline removes them from use. If such a program is attached to
any program pipeline object, the existing executables and associated state
will remain part of the program pipeline object until a subsequent call to
UseProgramStages removes them from use. An unsuccessfully linked program may
not be made part of the current rendering state by UseProgram or added to
program pipeline objects by UseProgramStages until it is successfully
re-linked."
"void UseProgram(uint program);
...
An INVALID_OPERATION error is generated if program has not been linked, or
was last linked unsuccessfully. The current rendering state is not modified."
V2: apply the rule to both core and compat.
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the ARB_shading_language_420pack spec:
"More than one layout qualifier may appear in a single
declaration. If the same layout-qualifier-name occurs in
multiple layout qualifiers for the same declaration, the
last one overrides the former ones."
The parser was already failing correctly when the extension is
not available but testing for duplicates within a single layout
qualifier was still causing this to fail when available as both
cases share the same function for merging.
Here we add a parameter to differentiate between the two uses
and apply it to the duplicate test.
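A toy version of such a merge function, with the extra parameter deciding whether a repeated name is an error or an override. The struct, the two qualifier kinds and the parameter name are hypothetical and much simpler than the real ast_type_qualifier.

```c
#include <stdbool.h>

enum { QUAL_LOCATION, QUAL_BINDING, QUAL_COUNT };

/* Hypothetical reduced qualifier set: a bitmask of which names are set,
 * plus their values. */
struct layout_quals {
    unsigned set_mask;
    int value[QUAL_COUNT];
};

/* Merge 'from' into 'into'. When merging within one layout(...), a name
 * that is already set is rejected. When merging across layout qualifiers,
 * the later (rightmost) one simply overrides. */
static bool merge_layout(struct layout_quals *into,
                         const struct layout_quals *from,
                         bool is_single_layout_merge)
{
    if (is_single_layout_merge && (into->set_mask & from->set_mask))
        return false; /* duplicate inside a single layout qualifier */

    for (int q = 0; q < QUAL_COUNT; q++) {
        if (from->set_mask & (1u << q)) {
            into->value[q] = from->value[q]; /* last one wins */
            into->set_mask |= 1u << q;
        }
    }
    return true;
}
```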
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In order to only create a single node for each default declaration
we add a new boolean parameter to the in/out merge function to
only create one once we reach the rightmost layout qualifier.
From the ARB_shading_language_420pack spec:
"More than one layout qualifier may appear in a single
declaration. If the same layout-qualifier-name occurs in
multiple layout qualifiers for the same declaration, the
last one overrides the former ones."
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will allow merging of duplicate layout qualifiers as allowed
by ARB_shading_language_420pack
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is added by ARB_enhanced_layouts although it doesn't fit
into any of the six main changes so we enable this independently.
From the ARB_enhanced_layouts spec:
"More than one layout qualifier may appear in a single
declaration. Additionally, the same layout-qualifier-name
can occur multiple times within a layout qualifier or across
multiple layout qualifiers in the same declaration"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Apparently, nobody has combined stippling with a fragment shader
containing immediates in almost five years...
Fixes a bug in Kodi with radeonsi reported by Christian König.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The whole point of inlining sources is to reduce loads. We can end up in
a situation where one value is used a lot of times, and one value is
used only once per instruction. The once-per-instruction one is the one
that should get inlined, but with the previous algorithm, it was given
no preference.
This flips things around to preferring putting less-referenced values
into src1 which increases the likelihood of them being inlined.
While we're at it, adjust the heuristic to not treat 0 as an immediate,
as well as (effectively) check for situations where LIMMs can't be
loaded. All this yields improvements on nvc0:
total instructions in shared programs : 6261157 -> 6255985 (-0.08%)
total gprs used in shared programs : 945082 -> 943417 (-0.18%)
total local used in shared programs : 30372 -> 30288 (-0.28%)
total bytes used in shared programs : 50089256 -> 50047880 (-0.08%)
local gpr inst bytes
helped 21 822 3332 3332
hurt 0 278 565 565
And more importantly avoids generating really bad code with SSBOs, where
we end up checking a lot of different values (usually immediates) against
the length.
On nv50 we get comparable results, and even improve packing (bytes went
down more than instructions):
total instructions in shared programs : 6346564 -> 6341277 (-0.08%)
total gprs used in shared programs : 728719 -> 725131 (-0.49%)
total local used in shared programs : 3552 -> 3552 (0.00%)
total bytes used in shared programs : 43995688 -> 43932928 (-0.14%)
local gpr inst bytes
helped 0 1380 3252 3774
hurt 0 287 1710 1365
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
An issue could still occur if the base level is set, but fixing that
would require a lot more logic.
This fixes the recently-failing texelFetch 3D tests because the mipmaps
were no longer being generated, which in turn caused the copying logic
to be hit, which in turn didn't work because of the broken
width/height/depth.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Once we go past half of the "GPR" register file, it seems like we need
to run frag shader with smaller threadsize. (The vertex shader already
runs at TWO_QUADS, which is the minimum.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled)
version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per
generation. Switch a3xx and a4xx over to using prefetch-enabled version
(which is also what blob does.. it seems only on a2xx we cannot use
PFE).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
And only call it from r600_invalidate_resource for buffer resources.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch fixes a bug when building a pack instruction.
For POWER (altivec), in case the destination is signed and the
src width is 32, we need to use vpkswss. The original code used vpkuwus,
which emits an unsigned result.
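As a rough illustration of why the choice of pack instruction matters, here is a scalar sketch of one lane of each operation. The function names are hypothetical and the saturation behavior is modeled from the instruction mnemonics (signed word to signed halfword with saturation vs. unsigned word to unsigned halfword with saturation); it is not the code from this patch.

```c
#include <stdint.h>

/* Scalar model of one vpkswss lane: signed 32-bit input,
 * saturated to the signed 16-bit range. */
int16_t pack_signed_sat(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one vpkuwus lane: the same bits are treated as
 * unsigned and saturated to the unsigned 16-bit range. */
uint16_t pack_unsigned_sat(uint32_t v)
{
    return v > UINT16_MAX ? UINT16_MAX : (uint16_t)v;
}
```

With a negative input such as -40000, the signed model clamps to -32768, while the unsigned model sees a very large value and clamps to 65535.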
This fixes the following piglit tests on ppc64le:
- spec@arb_color_buffer_float@gl_rgba8-drawpixels
- shaders@glsl-fs-fogscale
I've also corrected some coding style issues in the function.
v2: Returned else statements to vmware style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
One of the oglconform tests was crashing here, and it was
due to not cloning the actual parameters before creating the
new call. This makes a call clone function that does the right
things to make sure we clone all the needed info, and points
the callee at it. (It differs from ->clone due to this).
This may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722. I had this
patch in my cts fixes tree, but hadn't had time to make sure I liked it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Any duplicates in a single declaration will already fail the
generic duplicates test due to the explicit_stream flag being set.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This will allow the ARB_shading_language_420pack rules in
glsl_parser.yy for catching duplicate layout qualifiers to be
triggered for the stream identifier rather than relying on the
code meant to catch duplicates within a single layout(...)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL
4.5 Core spec:
"The command
void UniformSubroutinesuiv(enum shadertype, sizei count,
const uint *indices);
will load all active subroutine uniforms for shader stage
shadertype with subroutine indices from indices, storing
indices[i] into the uniform at location i. The indices for
any locations between zero and the value of
ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not
used will be ignored."
V2: simplify NULL check suggested by Jason.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: "11.0 11.1" mesa-stable@lists.freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=93731
Apparently the IPA op decided to stop working with offsets. Need to
figure out if we need to do an AL2P situation or something similar. For
now just turn it back off.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch fixes a classic "confuse the enemy" bug.
_mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the
arguments are opposite.
_mm_andnot_si128 performs "r = (~a) & b" while
vec_andc performs "r = a & (~b)"
To make sure this error won't return in another place, I added a wrapper
function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside.
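The operand swap can be illustrated with a scalar sketch. The functions below are hypothetical 32-bit stand-ins for the vector intrinsics, not the actual u_pwr8.h code; they only model the two argument orders described above.

```c
#include <stdint.h>

/* Scalar stand-in for _mm_andnot_si128: r = (~a) & b */
uint32_t sse_andnot(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Scalar stand-in for vec_andc: r = a & (~b) */
uint32_t vmx_andc(uint32_t a, uint32_t b)
{
    return a & ~b;
}

/* Wrapper in the spirit of vec_andnot_si128: keep the SSE argument
 * order at the call site and swap the operands internally. */
uint32_t andnot_via_andc(uint32_t a, uint32_t b)
{
    return vmx_andc(b, a);
}
```

Calling vmx_andc() directly with SSE-ordered arguments gives a different result, which is the bug the wrapper is meant to prevent.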
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by
one, but missed updating delay-slot calc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It at least happens with some piglit tests, like
$piglit/bin/vp-address-01
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], COLOR
DCL CONST[0..7]
DCL ADDR[0]
0: ARL ADDR[0].x, IN[1].xxxx
1: MOV_SAT OUT[1], CONST[ADDR[0].x-1]
2: DP4 OUT[0].x, CONST[4], IN[0]
3: DP4 OUT[0].y, CONST[5], IN[0]
4: DP4 OUT[0].z, CONST[6], IN[0]
5: DP4 OUT[0].w, CONST[7], IN[0]
6: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems like in certain cases, we cannot use c<a0.x+0> as the third src to
cat3 instructions. This may be slightly conservative, we may only have
this restriction when the first src is also const.
This fixes, for example, +24/-0 of the variable-indexing piglit tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If we handle separately the special case of eliminating output mov
(which includes keeps and various other cases where we don't have a
consuming instruction's src register to collapse things into), we
can simplify the logic.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Shuffle things slightly, passing instr-data to ra_name() to reduce the
number of places where we need to add support for array names.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs
'typedef struct nir_foo {} nir_foo'. But missing struct name tags is
inconvenient when you need a fwd declaration without pulling in all
of nir.
So add missing struct name tag for nir_variable, and a couple other
spots where it would likely be useful.
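A minimal sketch of why the tag matters, using a hypothetical foo_variable type rather than the real nir_variable: a header that only needs a pointer can forward-declare the tagged struct, which is not possible for an anonymous 'typedef struct {} foo'.

```c
/* Consumer side: only a forward declaration is needed, so the full
 * definition (and its header) does not have to be pulled in. */
struct foo_variable;
int foo_variable_location(const struct foo_variable *var);

/* Provider side: the struct carries a name tag in addition to the
 * typedef, so the forward declaration above refers to the same type. */
typedef struct foo_variable {
    int location;
} foo_variable;

int foo_variable_location(const struct foo_variable *var)
{
    return var->location;
}
```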
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This adds code that is basically the same as the code in umod, udiv and idiv.
However, unlike idiv we return -1.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The ARB has decided that implicit conversions should be performed for
bitwise operators in future language revisions. Implementations of
current language revisions may or may not perform them.
This patch makes Mesa apply implicit conversions even on current
language versions. Applications appear to expect this behavior,
and there's really no downside to doing so.
Fixes shader compilation in Shadow of Mordor.
Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
In the vertex and fragment stages, the hardware is nice to us and leaves
g0.2 zeroed out for us so we can use it for headers. However, in compute,
geometry, and tessellation stages, the hardware is not so nice. In
particular, for compute shaders on BDW, the hardware places some debug bits
in 23:15. As it happens, bit 15 is interpreted by the sampler as the alpha
channel mask. This means that if you use a texturing instruction with a
header in a compute shader, you may randomly get the alpha channel
disabled. Since channel masks affect the return length of the sampler
message, this can lead the GPU to expect a different mlen to the one you
specified in the shader and this, in turn, hangs your GPU.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The cleaning up was quite a performance hog (making pipe_resource_reference
the number two in profilers on the vertex path, and 3rd overall, with its
cousin pipe_reference_described not far behind) if there were lots
of tiny draw calls (ipers). Now the reason was really that it was blindly
calling this for all potential shader views (so 32 each for vs and gs) even
though the app never touched a single one which could have been fixed,
however I can't come up with a good reason why we refcount these. We've got
references, of course, in the sampler views, which should be quite sufficient
as we do all vertex and geometry shader execution fully synchronous.
(Calling prepare_shader_sampling for all draw calls even if there were no
changes looks quite suboptimal too, but generally we don't really expect vs/gs
shader sampling to be used much with llvmpipe, and there's even an early exit
if there aren't any views to avoid the "null loop" albeit it's now no longer
always trying to loop through all 32 slots. Maybe improve another time...).
Of course, if we manage to make vertex loads run asynchronously some day,
we need references again, but adding that back would be the least of the
problems...
Also only set LP_NEW_SAMPLER_VIEW for fragment sampler views. Nothing on the
vertex side depends on it (I suppose we'd really wanted a separate flag in
any case).
(Good for a 3% improvement or so in ipers under the right conditions.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was not really a leak per se, but we were referencing the textures for
longer than intended. If textures were set via llvmpipe_set_sampler_views()
(for fs) and then picked up by lp_setup_set_fragment_sampler_views(), they
were referenced in the setup state. However, the only way to unreference them
was by replacing them with another texture, and not when the texture slot
was replaced with a NULL sampler view. (They were then further also referenced
by the scene too which might have additional minor side effects as we limit
the memory size which is allowed to be referenced by a scene in a rather crude
way.) Only setup destruction (at context destruction time) then finally would
get rid of the references.
Fix this by noting the number of textures the last time, and unreference
things if the new view is NULL (avoiding having to unreference things
always up to PIPE_MAX_SHADER_SAMPLER_VIEWS which would also have worked).
Found by code inspection, no test...
v2: rename var
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Only modify interpolation type for integer-based varyings or when the
consumer is known and different than fragment shader.
If we are linking separate shader programs and the consumer is unknown,
the consumer could be added later and be a fragment shader. If we
modify the interpolation type in this case, we could read wrong
values in the fragment shader inputs, as shown in bug 93320.
Fixes the following CTS test:
ES31-CTS.vertex_attrib_binding.advanced-bindingUpdate
Fixes the following dEQP tests:
dEQP-GLES31.functional.separate_shader.random.102
dEQP-GLES31.functional.separate_shader.random.111
dEQP-GLES31.functional.separate_shader.random.115
dEQP-GLES31.functional.separate_shader.random.17
dEQP-GLES31.functional.separate_shader.random.22
dEQP-GLES31.functional.separate_shader.random.23
dEQP-GLES31.functional.separate_shader.random.3
dEQP-GLES31.functional.separate_shader.random.32
dEQP-GLES31.functional.separate_shader.random.39
dEQP-GLES31.functional.separate_shader.random.64
dEQP-GLES31.functional.separate_shader.random.73
dEQP-GLES31.functional.separate_shader.random.91
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93320
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This shouldn't hurt anything, and I'm about to introduce a pass that
will want it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it a pass, hiding the parameter structs and block callbacks
so it's simpler to work with.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Shorter than compiler->scalar_stage[MESA_SHADER_VERTEX], which can
help with line-wrapping.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
nir_build_ivec4 is more readable and succinct than using nir_build_imm
directly, even if you have C99.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If shader declares uniform explicit location in one stage but
implicit in another, explicit location should be used. Patch marks
implicit uniforms as explicit if they were explicit in previous stage.
This makes sure that we don't treat them implicit later when assigning
locations.
Fixes following CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-implicit-in-some-stages3
v2: move check to cross_validate_globals (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The RS and hardware binding tables are only supported on the 3D
pipeline and can lead to corruption if left enabled during a GPGPU
workload. Disable it when switching to the GPGPU (or media) pipeline
and re-enable it when switching back to the 3D pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This hardware bug can supposedly lead to a hang on IVB and VLV.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AFAIK brw_emit_select_pipeline() is only called once during context
init on Gen4-5, at which point the pipeline is likely to be already
idle so it may just happen to work by luck regardless of the MI_FLUSH.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Switching the current pipeline while it's not completely idle or the
read and write caches aren't flushed can lead to corruption. Fixes
misrendering of at least the following Khronos CTS test:
ES31-CTS.shader_image_load_store.basic-allTargets-store-fs
The stall and flushes are no longer required on Gen8+.
v2: Emit PIPE_CONTROL with non-zero post-sync op before the write
cache flush on SNB due to hardware bug. (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93323
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This hardware bug can cause a hang on context restore while the
current pipeline is set to GPGPU (BDWGFX HSD 1909593). In addition to
clearing the valid bit, mark the CC state as dirty to make sure that
the CC indirect state pointer is re-emitted when we switch back to the
3D pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used on Gen8+ to make sure that the color calculator
state pointers are re-emitted when switching back to the 3D pipeline
after some GPGPU workload due to a hardware workaround. There are
other state bits already defined that could be used to achieve the
same effect but they all cause a ton of unrelated state to be
re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new
one, state bits are cheap.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:
total local used in shared programs : 54116 -> 30372 (-43.88%)
Probably a modest advantage for execution, but it's an important
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Previously we were treating any indirect temp array usage to mean that
everything should end up in lmem. The MemoryOpt pass would clean a lot
of that up later, but in the meanwhile we would lose a lot of
opportunity for optimization.
This helps a lot of Metro 2033 Redux and a handful of KSP shaders:
total instructions in shared programs : 6288373 -> 6261517 (-0.43%)
total gprs used in shared programs : 944051 -> 945131 (0.11%)
total local used in shared programs : 54116 -> 54116 (0.00%)
A typical case is for register usage to double and for instructions to
halve. A future commit can also reduce local memory usage with better
packing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Indirect constbuf indexing works by using very large offsets. However if
an indirect constbuf index load is const-propagated, it becomes a very
large const offset. Take that into account when legalizing the SSA by
moving the high parts of that offset into the file index. Also disallow
very large (or small) indices on most other instructions.
This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests.
Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
For those formats that support hw mipmap generation, use the
DXGenMips command. Otherwise fallback to the mipmap generation utility.
Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench)
v2: make sure the texture surface was created with the render target bind flag
set relocation flag to SVGA_RELOC_WRITE for the texture surface
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The actual increment of the num-generate-mipmap counter will be done
in a subsequent patch when hw generate mipmap is supported.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch adds a new interface to support hardware mipmap generation.
PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify
if this new interface is supported; if not supported, the state tracker will
fallback to mipmap generation by rendering/texturing.
v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers
v3: add format to the generate_mipmap interface to allow mipmap generation
using a format other than the resource format
v4: fix return type of trace_context_generate_mipmap()
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The OpenGL specification for bitfieldExtract() says:
The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.
Therefore passing bits=32, offset=0 is legal and defined in GLSL.
But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits
of the width operand, making them not able to implement the GLSL-specified
behavior directly.
This commit adds ubfe/ibfe operations from SM5 and a lowering pass for
bitfield_extract to handle the trivial case of <bits> = 32 as
bitfieldExtract:
bits > 31 ? value : bfe(value, offset, bits)
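The lowering above can be sketched in C. hw_ubfe() is a simplified, hypothetical model of an instruction that reads only the low 5 bits of its offset and width operands; it is an assumption for illustration, not the NIR or driver implementation.

```c
#include <stdint.h>

/* Simplified model of an SM5-style ubfe: only the low 5 bits of
 * offset and bits are read, so bits = 32 is seen as width 0. */
uint32_t hw_ubfe(uint32_t value, uint32_t offset, uint32_t bits)
{
    uint32_t width = bits & 31u;
    uint32_t off = offset & 31u;
    uint32_t mask = (1u << width) - 1u; /* width 0 gives an empty mask */

    return (value >> off) & mask;
}

/* Lowered form: handle the full-width case before using the
 * hardware-style instruction. */
uint32_t lowered_bitfield_extract(uint32_t value, uint32_t offset,
                                  uint32_t bits)
{
    return bits > 31u ? value : hw_ubfe(value, offset, bits);
}
```

In this model, hw_ubfe(0xdeadbeef, 0, 32) yields 0, while the lowered form returns the whole value as GLSL requires.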
Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
The OpenGL specification for bitfieldInsert() says:
The result will be undefined if <offset> or <bits> is negative, or if
the sum of <offset> and <bits> is greater than the number of bits
used to store the operand.
Therefore passing bits=32, offset=0 is legal and defined in GLSL.
But the earlier SM5 bfi opcode is specified to accept a bitfield width
ranging from 0-31. As such, Intel and AMD instructions read only the low
5 bits of the width operand, making them not able to implement the
GLSL-specified behavior directly.
This commit fixes the lowering of bitfield_insert to handle the trivial
case of <bits> = 32 as
bitfieldInsert:
bits > 31 ? insert : bfi(bfm(bits, offset), insert, base)
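A C sketch of the same lowering follows. hw_bfm() and hw_bfi() are simplified, hypothetical models (bfm reads only the low 5 bits of its operands; bfi shifts the inserted value to the lowest set bit of the mask) and are assumptions for illustration, not the actual NIR opcodes.

```c
#include <stdint.h>

/* Simplified bfm model: build a mask of 'bits' ones starting at
 * 'offset', reading only the low 5 bits of each operand. */
uint32_t hw_bfm(uint32_t bits, uint32_t offset)
{
    return ((1u << (bits & 31u)) - 1u) << (offset & 31u);
}

/* Simplified bfi model: place 'insert' at the lowest set bit of the
 * mask and keep 'base' everywhere else. */
uint32_t hw_bfi(uint32_t mask, uint32_t insert, uint32_t base)
{
    uint32_t shift = 0;

    if (mask == 0)
        return base;
    while (!((mask >> shift) & 1u))
        shift++;
    return (base & ~mask) | ((insert << shift) & mask);
}

/* Lowered form: the full-width insert replaces the whole value. */
uint32_t lowered_bitfield_insert(uint32_t base, uint32_t insert,
                                 uint32_t offset, uint32_t bits)
{
    return bits > 31u ? insert : hw_bfi(hw_bfm(bits, offset), insert, base);
}
```

Without the bits > 31 check, bits = 32 produces an empty mask in this model and the base value comes back unchanged.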
Fixes:
ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
We check that a bunch of raster operations are disabled in
blit_copy_pixels(). We also need to check that color logicop is
disabled.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The whole point of AMD_pinned_memory is that applications don't have to map
buffers via OpenGL - but they're still allowed to, so make sure we don't break
the link between buffer object and user memory unless explicitly instructed
to.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This accommodates a streaming pattern where the discard flag is set when the
application wraps back to the beginning of the buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It makes sense to re-use pipe->invalidate_resource for the purpose of
glInvalidateBufferData, but this function is already implemented in vc4
where it doesn't have the expected behavior. So add a capability flag
to indicate that the driver supports the expected behavior.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Change the check to be in line with what the quoted spec fragment says.
I have sent out a piglit test for this as well.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The internal Mesa format used for a texture might not match the one
requested in the internalFormat when the texture was created, for
example if the driver is internally remapping RGB textures to RGBA.
Otherwise it can cause false positives for completeness if one mipmap
image is created as RGBA and the other as RGB because they would both
have an RGBA Mesa format. If we check the InternalFormat instead then
we are directly checking the API usage which I think better matches
the intention of the check.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93700
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I spotted this while looking for what needs updating in future platforms.
I'm too lazy to go through the git logs, but it was probably missed by Jason
when all the brw refactoring happened.
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Intel/AMD's hardware instructions do not handle arguments of 32.
Constant evaluation should not produce a result different from the
hardware instruction.
The s/1ull/1u/ change is intentional: previously we wanted defined
behavior for the "1 << 32" case, but we're making this case undefined so
we can make it 1u and save ourselves a 64-bit operation.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If a Python codegen script failed, it would write a zero-byte file,
which on subsequent invocations of make would trick it into thinking the
file was appropriately generated.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We would like to be able to combine
result.x = bitfieldExtract(src0.x, src1.x, src2.x);
result.y = bitfieldExtract(src0.y, src1.y, src2.y);
result.z = bitfieldExtract(src0.z, src1.z, src2.z);
result.w = bitfieldExtract(src0.w, src1.w, src2.w);
into a single ivec4 bitfieldExtract operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all three operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We would like to be able to combine
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ivec4 bitfieldInsert operation. This should be possible
with most drivers.
This patch changes the offset and bits parameters from scalar ints
to ivecN or uvecN. The type of all four operands will be the same,
for simplicity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
TGSI doesn't use these - it just translates ir_quadop_bitfield_insert
directly. NIR can handle ir_quadop_bitfield_insert as well.
These opcodes were only used for i965, and with Jason's recent patches,
we can do this lowering in NIR (which also gains us SPIR-V handling).
So there's not much point to retaining this GLSL IR lowering code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
NIR's bfm, like Intel/AMD's hardware instructions and GLSL IR's
ir_binop_bfm takes <bits> as src0 and <offset> as src1.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I added this code right at the end, and got it wrong.
Only used by the WGL_ARB_render_texture code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Commit 8926dc8 added a check where we add packed varyings of output
stage only when we have multiple stages, however duplicates are already
handled by changes in commit 0508d950 and we want to add outputs also in
case where we have only one stage.
Fixes regression caused by 8926dc8 for following test:
ES31-CTS.program_interface_query.separate-programs-vertex
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The trick here is to recognize that in the c + n * dcdx calculations,
not only can the lower FIXED_ORDER bits not change (as the dcdx values
have those all zero) but that this means the sign bit of the calculations
cannot be different as well, that is
sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)).
That shaves off more than enough bits to never require 64bit masks.
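The identity can be checked with a small C sketch. The FIXED_ORDER value and the function names are assumptions for illustration only, not the llvmpipe rasterizer code; the helper computes a floor division explicitly to avoid relying on right shifts of negative values.

```c
#include <stdint.h>

#define FIXED_ORDER 8 /* assumed number of subpixel bits, illustration only */

/* floor(x / 2^FIXED_ORDER), written so that no negative value is shifted. */
int64_t drop_subpixel_bits(int64_t x)
{
    const int64_t one = (int64_t)1 << FIXED_ORDER;

    return x >= 0 ? x >> FIXED_ORDER : -((-x + one - 1) >> FIXED_ORDER);
}

/* Sign test on the full-precision plane equation. */
int edge_negative_full(int64_t c, int64_t dcdx, int64_t n)
{
    return c + n * dcdx < 0;
}

/* Same test on the shifted values; valid as long as the low
 * FIXED_ORDER bits of dcdx are all zero. */
int edge_negative_reduced(int64_t c, int64_t dcdx, int64_t n)
{
    return drop_subpixel_bits(c) + n * drop_subpixel_bits(dcdx) < 0;
}
```

Both tests agree because adding a multiple of 2^FIXED_ORDER to c shifts floor(c / 2^FIXED_ORDER) by an integer, and a value is negative exactly when its floor-divided counterpart is.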
A shifted plane c value could still easily exceed 32 bits, however since we
throw out planes which are trivial accept even before binning (and similarly
don't even get to see tris for which there was a trivial reject plane) this
is never a problem.
The idea isn't all that revolutionary, in fact something similar was tried
ages ago (9773722c2b) back when the values were
only 32 bit anyway. I believe now it didn't quite work then because the
adjustment needed for testing trivial reject / partial masks wasn't handled
correctly.
This still keeps the separate 32/64 bit paths for now, as the 32 bit one still
looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled
from setup which would be a good reason to ditch the 32 bit path, we'd need to
change the special-purpose rasterization functions for small tris).
This passes piglit triangle-rasterization (-fbo -auto -max_size
-subpixelbits 8) and triangle-rasterization-overdraw (with some hacks
to make it work correctly with large sizes) easily (full piglit as
well of course, but most tests wouldn't use triangles large enough to
be affected, that is tris with a bounding box over 128x128).
The profiler says indeed time spent in rast_tri functions is reduced
substantially, BUT of course only if the tris are large. I measured a 3%
improvement in mesa gloss demo when supersized to twice the screen size...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise some planes we get in rasterization have subpixel precision, others
not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports
with subpixel accuracy, so could even do bounding box calcs with that).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is quite a few less instructions, albeit it still does the 2 64bit muls
with scalar C code (they'd need way more shuffles, plus fixup for the signed
mul, so it totally doesn't seem worth it; x86 can do 32x32->64bit signed
scalar muls natively just fine after all, even on 32bit).
(This still doesn't have a very measurable performance impact in reality,
although profiler seems to say time spent in setup indeed has gone down by
10% or so overall. Maybe good for a 3% or so improvement in openarena.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Discovered by accident, valgrind was complaining (could have possibly caused
us to create redundant geometry shader variants).
v2: convinced by Brian and Jose, just use memset for both gs and vs keys,
just as easy and less error prone.
.length() on an unsized SSBO variable doesn't actually read any data
from the SSBO, and is allowed on variables marked 'writeonly'.
Fixes compute shader compilation in Shadow of Mordor.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This adds barrier dependencies around TCS_OPCODE_URB_WRITE, preventing
reads and writes from being incorrectly scheduled.
Fixes rendering in GFXBench 4.0's tessellation demo.
For some reason, we haven't ever listed URB writes as having
side-effects. This hasn't been a problem because in most stages, we
never read from the URB, and only write to each location once.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93526
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If the constructor fails before the LIST_INIT calls, the pointers
will be null and the destructor will segfault.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Patch changes linker to allocate gl_shader_variable instead of using
ir_variable. This makes it possible to get rid of ir_variables and ir
in memory after linking.
v2: check that we do not create duplicate entries with
packed varyings
v3: document 'patch' bit (Ilia Mirkin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The linker missed a check for the situation where explicit plus implicit
locations exceed the maximum number of uniform locations. This patch adds
the check to the existing iteration over uniforms in the linker.
Fixes the following CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-negative-link-max-num-of-locations
v2: use var->type->uniform_locations() (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We already check if the driver changed the completeness, we don't
need to duplicate that check. Let's just early out there instead.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This hasn't been in use since c476305 ("gallium/util: pregenerate
half float tables"), where the last bit of run-time init using this
was killed. So let's just get rid of the pointless header.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Re-binding compute constant buffers after launching a grid has no effect
because they are not currently validated and because dirty_cp is not updated
accordingly. This might also prevent weird behaviour in the future, when UBOs
are bound for compute.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The path that depends on this will be avoided (by fallback_required) if
the extension is not supported. _mesa_set_sampler_srgb_decode does not
generate GL errors (by design), so there are no problems there.
I kept this change separate and last because it is one of the few in the
series that is not a candidate for the stable branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
All of the calls after the first _mesa_bind_sampler call are DSA style
calls that don't depend on the current binding.
I kept this change separate and last because it is one of the few in the
series that is not a candidate for the stable branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
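The scenario above can be modelled with a toy name table. This is only a sketch: `name_table`, `gen_name` and `bind_and_store` are hypothetical stand-ins, not Mesa or GL functions, and the table is far simpler than real object namespaces.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 16

/* Hypothetical, simplified namespace: slot i holds the data for name i. */
struct name_table {
    bool in_use[MAX_NAMES];
    const char *data[MAX_NAMES];
};

/* Models a Gen call: returns the lowest unused non-zero name, 0 if full. */
static unsigned gen_name(struct name_table *t)
{
    for (unsigned n = 1; n < MAX_NAMES; n++) {
        if (!t->in_use[n]) {
            t->in_use[n] = true;
            return n;
        }
    }
    return 0;
}

/* Models binding a name (generated or not) and storing data in it. */
static void bind_and_store(struct name_table *t, unsigned name,
                           const char *data)
{
    t->in_use[name] = true;
    t->data[name] = data;
}

/* Replays the problem scenario; returns true if the app's data survives. */
static bool app_data_survives(void)
{
    struct name_table t = {0};

    unsigned meta_name = gen_name(&t);          /* meta: first Gen returns 1 */
    bind_and_store(&t, meta_name, "meta data");
    bind_and_store(&t, 1, "app data");          /* app picks name 1, no Gen */
    bind_and_store(&t, meta_name, "meta data"); /* meta runs again */

    return strcmp(t.data[1], "app data") == 0;
}
```

In this model `app_data_survives()` returns false, because the meta path and the application share name 1.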
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Some meta operations can be called recursively. Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used. If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Some meta operations can be called recursively. Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used. If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Some meta operations can be called recursively. Future changes (the
"Don't pollute the ... namespace" changes) will cause objects with
invalid names to be used. If a nested meta operation tries to restore
an object named 0xDEADBEEF, it will fail.
v2: Add a comment explaining why samp_obj_save is set to NULL in
_mesa_meta_fb_tex_blit_begin. This came out of review feedback from
Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This requires tracking the sampler object using the gl_sampler_object*
instead of the object name.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Pulls the parts of _mesa_BindSampler that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I was going to send this as review for dce1e1a8, but I missed that
window. This saves 64 bytes of unshared data and replaces it with 96
bytes of shared text. My guess is that some of the calls to memcpy get
optimized to something else.
   text    data     bss     dec     hex filename
7847613  220208   27432 8095253  7b8615 i965_dri.so before
7847709  220144   27432 8095285  7b8635 i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Brian Paul <brianp@vmware.com>
The ISO C99 standard (7.18.4) specifies that C++
implementations should define UINT64_C only when
__STDC_CONSTANT_MACROS is defined.
Because we now use UINT64_C in our cpp files (since commit
208bfc493d), we need to add this define.
This also solves compilation errors with GCC 4.8.x on ppc64le machines.
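The commit adds the define through the build system. An equivalent in-source form, shown only as a sketch, defines the macro before the first inclusion of <stdint.h>. `high_bit_mask` is a hypothetical helper used for illustration.

```c
/* Must appear before the first inclusion of <stdint.h> to have any effect
 * on C++ implementations that follow the C99 footnote; harmless in C. */
#ifndef __STDC_CONSTANT_MACROS
#define __STDC_CONSTANT_MACROS
#endif
#include <stdint.h>

/* Hypothetical helper: builds a 64-bit constant with UINT64_C. */
static uint64_t high_bit_mask(void)
{
    return UINT64_C(1) << 63;
}
```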
v2: add this define to SCons build system
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Gen9+ requires us to emit 3DSTATE_BINDING_TABLE_POINTERS_HS for the
hull shader push constants to take effect. The passthrough TCS uses
push constants for the default tessellation levels. So, when those
change, we need to re-upload the binding table as well.
Fixes five Piglit tests on Skylake:
- spec/arb_tessellation_shader/vs-tes-vertex
- spec/arb_tessellation_shader/vs-tes-tessinner-tessouter-inputs-quads
- spec/arb_tessellation_shader/vs-tes-tessinner-tessouter-inputs-tris
- spec/arb_tessellation_shader/tes-read-texture
- spec/arb_tessellation_shader/tess_with_geometry
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In testing KBL, I found:
- urb size was not set for slices gt1.5, gt2, and gt3. The value I
used for these slices (384) was taken from an earlier patch authored
by Ben Widawsky.
- slice count was missing. This field was added by
a403ad4f5a
With this commit, KBL passes piglit at parity with SKL.
Note: As requested by Kristian, Sarah modified this patch to drop
setting urb size for gt1.5, gt2, and gt3, since the correct default is
set in the GEN9 macro by commit c1e38ad370
"i965/skl: Use larger URB size where available."
Signed-off-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <kristian.h.kristensen@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
For the case where we convert a double to an int, we should
round the same as we do for floats.
This fixes GL41-CTS.gpu_shader_fp64.state_query
v2: add IROUNDD (Ilia)
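A minimal sketch of what "round the same as we do for floats" could look like, assuming the float path rounds half away from zero by adding or subtracting 0.5 before truncating. `iround_double` is a hypothetical name and not necessarily how IROUNDD is written.

```c
/* Hypothetical helper: rounds half away from zero, then truncates.
 * The caller must ensure the result fits in an int. */
static int iround_double(double d)
{
    return (int)(d >= 0.0 ? d + 0.5 : d - 0.5);
}
```

With this rule 2.5 becomes 3 and -2.5 becomes -3, whereas a plain cast would give 2 and -2.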
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The lower_named_interface_blocks() pass is called before we try
assign locations to varyings so this shouldn't be reachable.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
The lower_named_interface_blocks() pass is called before we try
assign locations to varyings so this shouldn't be reachable.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
OpenGL 2.0 function StencilOp() is in part internally implemented via
StencilOpSeparate(). This change happened some time ago, however the
accompanying doxygen todo comment was not accordingly updated.
Replace the outdated portion of this doxygen todo comment, leaving the
remainder unchanged.
Also better respect the 80 character suggested line length in this file.
v2: Fully remove comment, following code review by t_arceri@yahoo.com.au
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Currently, opt_vectorize() tries to combine:
result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x);
result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y);
result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z);
result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w);
into a single ir_quadop_bitfield_insert opcode, which operates on
ivec4s. However, GLSL IR's opcodes currently require the bits and
offset parameters to be scalar integers. So, this breaks.
We want to be able to vectorize this eventually, but for now, just
chicken out and make opt_vectorize() bail by marking all the bitfield
insert/extract related opcodes as horizontal. This is a relatively
uncommon case today, so we'll do the simple fix for stable branches,
and fix it properly on master.
Fixes assertion failures when compiling Shadow of Mordor vertex shaders
on i965 in vec4 mode (where OptimizeForAOS enables opt_vectorize()).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
SCons doesn't understand nir yet and doesn't want to compile the glsl to
nir pass. Move the files to their own variable so we can add it only for
automake.
Tested-by: Brian Paul <brianp@vmware.com>
These are used by code that doesn't necessarily link to libglsl.la. Move
them to shader_enums.[ch] where we keep similar helpers.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The scope of libi965_compiler.la is to be able to take nir shaders and
generate i965 EU code. As such, we don't want the GLSL IR lowering
passes in the library. With this change, libi965_compiler.la no longer
needs to link to libglsl.la.
Reviewed-by: Matt Turner <mattst88@gmail.com>
libglsl_la_SOURCES includes both NIR_FILES and LIBGLSL_FILES, so for
libglsl.la consumers, this is a no-op. libnir.la however no longer uses
any GLSL IR infrastructure and can be used without also linking to
libglsl.la.
Acked-by: Matt Turner <mattst88@gmail.com>
The freedesktop.org blog feeds aren't mentioned on either mesa3d.org or
any of the graphics project wikis (including the DRI wiki) on
freedesktop.org. Fix that by linking to it from the sidebar.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Specify that the operation only applies to the x component, not
per-component as previously specified. This is unnecessary for GL and
creates additional complications for images which need to support these
operations as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Each load/store on most hardware can specify what caching to do. Since
SSBO allows individual variables to also have separate caching modes,
allow loads/stores to have the qualifiers instead of attempting to
encode them in declarations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I believe that `1u << x`, where x >= 32 yields undefined results
according to the C standard.
Particularly MSVC says `warning C4334: '<<' : result of 32-bit shift
implicitly converted to 64 bits (was 64-bit shift intended?)`.
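One common way to avoid the problem, sketched here under the assumption that a 64-bit mask is what the caller wants, is to widen the constant before shifting. `bit64` is a hypothetical helper, not the function touched by the patch.

```c
#include <stdint.h>

/* Hypothetical helper: returns a 64-bit value with only bit x set.
 * The shift is done in 64 bits, so x may be anywhere in 0..63;
 * x >= 64 would still be undefined. */
static uint64_t bit64(unsigned x)
{
    return UINT64_C(1) << x;
}
```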
Reviewed-by: Brian Paul <brianp@vmware.com>
For profiling mesa's code, especially llvmpipe, PROFILE should be
defined. Currently, this define can only be generated if mesa is
built using scons.
This patch makes it possible to generate this define also when building
mesa through automake tools.
v2:
- Change --enable-llvmpipe-profile to --enable-profile
- Add -fno-omit-frame-pointer to CFLAGS and CXXFLAGS when enabling profile
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It can be trivially derived from the number of already declared system
values. This allows ureg users not to worry about which index to choose.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
The piglit copyteximage check has recently been augmented to test this, but
apparently it hasn't been fixed in Mesa so far.
This language also already appears in the OpenGL 2.1 spec (Ian).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't need these for GLSL or ARB, but we need them for SPIR-V
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both were defined as returning bool but the gpu_shader5 functions are
defined to return int. Also, we had the parameters for usub borrow
backwards in the folding expression.
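For reference, a scalar sketch of the two operations, assuming they are the carry and borrow helpers behind GLSL's uaddCarry() and usubBorrow(). The function names are hypothetical; both return an integer 0 or 1, and the operand order matters for the borrow.

```c
#include <stdint.h>

/* Carry out of x + y: 1 if the 32-bit sum wraps around, else 0. */
static uint32_t uadd_carry(uint32_t x, uint32_t y)
{
    return (uint32_t)((uint32_t)(x + y) < x);
}

/* Borrow of x - y: 1 if x < y, else 0. Swapping x and y here is the
 * kind of mistake the message describes. */
static uint32_t usub_borrow(uint32_t x, uint32_t y)
{
    return (uint32_t)(x < y);
}
```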
Reviewed-by: Matt Turner <mattst88@gmail.com>
The draw groups are now split up into groups of 32 if there's a
non-packed stride, or in groups of 400-500 if the draw data is packed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This shifts all indirect draws to go through the new function. If the
driver doesn't have support for multi draws, we break those up and
perform N draws. Otherwise, we pass everything through for just a single
draw call.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes it possible to support indirect multidraws as well as having
the number of such draws to come from a separate GPU resource.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All indirect draws are passed to the new draw function. By default
there's a fallback implementation which pipes it right back to
draw_prims, but eventually both the fallback and draw_prim's support for
indirect drawing should be removed.
This should allow a backend to properly support ARB_multi_draw_indirect
and ARB_indirect_parameters.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The sse path was pretty much disabled for practical purposes because the
largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations.
This is actually not that difficult, though a problem is that we can't do
a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall,
the code still looks reasonable, though it's not like changes there in
setup really make much of a difference in the end...
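The fixup can be illustrated in scalar C. This is only a sketch of the arithmetic identity, not the SSE code from the patch: reinterpreting a negative operand as unsigned adds 2^32 to it, so the other operand shifted left by 32 has to be subtracted from the unsigned product. `mul_s32_via_u32` is a hypothetical name.

```c
#include <stdint.h>

/* Signed 32x32 -> 64 multiply built from an unsigned multiply plus fixup.
 * The final cast assumes two's complement wrap-around. */
static int64_t mul_s32_via_u32(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint64_t p = (uint64_t)ua * ub;   /* unsigned 32x32 -> 64 */

    if (a < 0)
        p -= (uint64_t)ub << 32;      /* undo the extra 2^32 * b */
    if (b < 0)
        p -= (uint64_t)ua << 32;      /* undo the extra 2^32 * a */

    return (int64_t)p;
}
```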
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
eo, just like dcdx and dcdy, cannot overflow 32bit.
Store it as unsigned though, just in case (it cannot be negative, but
in theory it can be twice as big as dcdx or dcdy, so this gives it one more bit).
This doesn't really change anything, albeit it might help minimally on
32bit archs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Back in the day (before 24678700ed) the values
were not actually in a struct but even then I can't see why we didn't simply
align the values. Especially since it's trivial to do so.
(Not that it actually matters since the code is pretty much unused for now.)
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Otherwise, clipped lines would have undefined stippling reset bit if line
stippling is enabled.
(Untested, and I just assume copying over the bits from the original line
is actually the right thing to do.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The unfilled stage was not filling in the prim header, and the line stage
then decided to reset the stipple counter or not based on the uninitialized
data. This causes some failures in conform linestipple test (albeit quite
randomly happening depending on environment).
So fill in the prim header in the unfilled stage - I am not entirely sure
if anybody really needs determinant after that stage, but there's at least
later stages (wide line for instance) which copy over the determinant as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was added in 54f583a20; since then, error handling has improved.
The test this was added to fix now fails earlier, since 01822706ec.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In lp_build_conv() and lp_build_conv_auto(), there is a special case of
conversion when sse2 is present. That code path is suitable without any
changes to altivec, because all the functions that are called in that
code path already support altivec.
This patch increases the FPS on POWER arch across the board
by 10%-25%.
I checked ipers, glxgears, glxspheres64, openarena, xonotic and glmark2.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The function will be extended to dump all the binaries that a shader
consists of, so si_shader* makes sense here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Eventually, I'd like to dump stats for several combined binaries, which is
why you don't see a binary parameter in si_shader_dump_stats
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We won't compile shaders in draw calls, but we will concatenate shader
binaries according to states in draw calls, so keep the binaries.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There will be 1 config per variant, which will be a union of configs
from {prolog, main, epilog}. For now, just add the structure.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This doesn't fix a known bug, but better safe than sorry.
Also, simplify the expression in si_shader.c.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
because the API pixel shader binary will not emulate alpha test one day,
so the KILL_ENABLE bit must be determined elsewhere.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
vector_insert takes a vector, a scalar location, and a scalar value,
and produces a new vector with that component updated. As such, it
can't be vectorized properly.
vector_extract takes a vector and a scalar location, and returns
that scalar component of the vector. Vectorization doesn't really
make any sense.
Treating both as horizontal operations makes sure the vectorizer
won't try to touch these.
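A scalar model of the two operations shows why they do not map onto per-component vector code: the index is a single scalar that selects one component of the whole vector. `struct vec4` and the C functions below are hypothetical illustrations, not GLSL IR.

```c
/* Hypothetical 4-component vector used only for this illustration. */
struct vec4 {
    float c[4];
};

/* Returns a copy of v with component idx replaced by s. */
static struct vec4 vector_insert(struct vec4 v, unsigned idx, float s)
{
    v.c[idx] = s;
    return v;
}

/* Returns the single component of v selected by idx. */
static float vector_extract(struct vec4 v, unsigned idx)
{
    return v.c[idx];
}
```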
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it more similar to llvmpipe. It also allows us to let the draw emit
code handle things like getting zeros for non-existing vs outputs
automatically. There probably isn't any overhead either way: there is no
"simply copy everything" code in the emit path, so it would copy each
attrib individually just the same. Likewise, we still do another mapping step
in softpipe as the layout may still not match exactly (same as in llvmpipe;
we should probably nuke the pointless mapping in both drivers).
This fixes the piglit arb_fragment_layer_viewport no_gs/no_write tests.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
They can't actually be 0 (as position is there) but should avoid confusion.
This was supposed to have been done by af7ba989fb
but I accidentally pushed an older version of the patch in the end...
Also prettify slightly. And make some notes about the confusing and useless
fs input "map".
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
draw emit couldn't care less what the interpolation mode is...
This somehow looked like it would matter, all drivers more or less
dutifully filled that in correctly. But this is only used for emit,
if draw needs to know about interpolation mode (for clipping for instance)
it will get that information from the vs anyway.
softpipe actually used to depend on that interpolation parameter, as it
abused that structure quite a bit but no longer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
softpipe would calculate two "vertex layouts". The second one was however
just used for internal purposes, draw would know nothing about it even though
it looked exactly the same as the other one we tell draw about.
So, store that information separately as this was just confusing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Unlike llvmpipe, softpipe always tells draw to emit the vertices as-is.
The two vertex layouts it calculates are a bit confusing, one which is just
used to tell draw to emit vertices as-is, and the other which has draw written
all over it but draw is completely unaware of and is used only to look up the
correct interpolation info later in setup.
Thus, the slots used are different to what llvmpipe does (I'm going to clean
up the confusing two layout stuff).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
It was actually slightly buggy (missing initialization / setup not dependent
on new vs albeit I didn't see issues), but the case of non-existing attributes
is now handled by draw emit code so don't need that anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Previously the code would just redirect requests for attributes which
don't exist to use output 0. Rework this to output all zeros instead which
seems more useful - in particular some extensions like
ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by
previous stages. That way, drivers don't have to special case this depending
if the vs/gs outputs some attribute or not.
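The new behaviour can be sketched as follows. `SLOT_MISSING` and `emit_attrib` are hypothetical names chosen for illustration; the real draw emit code is organised differently.

```c
#include <string.h>

/* Hypothetical marker for an attribute that no previous stage wrote. */
#define SLOT_MISSING (-1)

/* Copies one 4-float attribute into dst. outputs is a flat array of
 * 4-float slots. A missing attribute yields all zeros instead of being
 * redirected to output 0. */
static void emit_attrib(float dst[4], const float *outputs, int slot)
{
    if (slot == SLOT_MISSING)
        memset(dst, 0, 4 * sizeof(float));
    else
        memcpy(dst, outputs + 4 * slot, 4 * sizeof(float));
}
```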
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Rename SVGA_HINT_FLAG_DRAW_EMITTED to SVGA_HINT_FLAG_CAN_PRE_FLUSH
because preemptive flush can be unblocked by more commands than
draw.
Reviewed-by: Brian Paul <brianp@vmware.com>
The existing code effectively turns off preemptive flushing for all
but the regions used for draws. This turns out to be overly
restrictive as some memory regions, e.g. GMR, may never get a draw
when used as a DMA upload staging area, causing problems for apps
that upload a large amount of textures, e.g. Unigine Heaven.
This patch fixes the Unigine Heaven memory allocation error and
has been verified to not cause a regression in the previous extended
retina display issue.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In emit_input_declarations(), we are skipping declarations for those
registers that are not being used. But in emit_vertex_attrib_instructions(),
we are still emitting instructions to tweak the vertex attributes even if
they are not being used. This causes an assert in the backend because an
input register is not declared in the shader. This patch fixes the problem
by skipping the instruction if the vertex attribute is not being used.
The changes in this patch originate from a code snippet by Jose, as
suggested in bug 1530161.
Tested with piglit, Heaven, Turbine, glretrace.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If the only dirty state is mesa's _NEW_PROGRAM_CONSTANTS flag, we can
skip state validation before drawing a bitmap since that state doesn't
affect bitmap rendering.
This further increases the performance of the ipers demo on llvmpipe
to about what it was before commit 36c93a6fae.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were checking the dirty->st flags but not the dirty->mesa flags.
When we took the early return, we didn't clear the dirty->mesa flags
so the next time we called st_validate_state() we'd often flush the
glBitmap cache. And since st_validate_state() is called from
st_Bitmap(), it meant we flushed the bitmap cache for every glBitmap()
call.
This change seems to recover most of the performance loss observed
with the ipers demo on llvmpipe since commit 36c93a6fae.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously each member was being counted as using a single slot,
count_attribute_slots() fixes the count for array and struct members.
Also don't assign a negative value to the unsigned expl_location variable.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Previously we were only reserving a single location for arrays and
structs.
We also didn't take into account implicit locations clashing with
explicit locations when assigning locations for their arrays or
structs.
This patch fixes both issues.
V5: fix regression for patch inputs/outputs in tessellation shaders
V4: just use count_attribute_slots() to get the number of slots,
also calculate the correct number of slots to reserve for gs and
tess stages by making use of the new get_varying_type() helper.
V3: handle arrays of structs
V2: also fix for arrays of arrays and structs.
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This will be used in the following patch for calculating array sizes correctly
when reserving explicit varying locations.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Previously we would pack varyings before trying to remove them, this
relied on the packing pass not packing varyings with a location of -1
to avoid packing varyings that should be removed.
However this meant unused varyings with an explicit location would be
packed before they could be removed when we enable packing of them in a
later patch.
V2: fix regression in V1 removing unused varyings in multi-stage SSO,
fix regression with single stage programs.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
ALIGN_DIVUP is a driver-specific (r600g) macro that duplicates the functionality of DIV_ROUND_UP.
Replacing it with DIV_ROUND_UP eliminates this duplication.
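For reference, a sketch of the usual definition of such a macro. The exact spelling in the tree is an assumption here, and `blocks_needed` is a hypothetical caller.

```c
/* Integer division rounding up; assumes B > 0 and that A + B - 1 does
 * not overflow. Assumed definition, shown for illustration only. */
#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B))

/* Hypothetical caller: number of fixed-size blocks covering `bytes`. */
static unsigned blocks_needed(unsigned bytes, unsigned block_size)
{
    return DIV_ROUND_UP(bytes, block_size);
}
```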
Signed-off-by: Krzysztof A. Sobiecki <sobkas@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I had the driver all tested for the last series, and in my last build I
noticed that get_swizzled_channel was unused now, and removed
it... apparently without testing to find that I removed the wrong channel
swizzle function.
We routinely have code like:
vec1 ssa_220 = fge ssa_104, ssa_61
vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105
and we would compare fge's args and choose between ~0 and 0 to generate
ssa_220, then compare ssa_220 to 0 and choose between bcsel's args.
Instead, try to notice the pattern and compare between fge's args to
select between bcsel's args.
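In scalar C terms the transformation looks roughly like this. It is only a sketch of the idea with hypothetical function names; the real change operates on QIR instructions.

```c
#include <stdint.h>

/* Before: materialize the comparison as ~0 or 0, then test it again. */
static float select_two_step(float a, float b, float x, float y)
{
    uint32_t cond = (a >= b) ? ~0u : 0u;   /* fge */
    return (cond != 0) ? x : y;            /* bcsel */
}

/* After: compare fge's arguments directly to pick bcsel's result. */
static float select_fused(float a, float b, float x, float y)
{
    return (a >= b) ? x : y;
}
```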
total instructions in shared programs: 88019 -> 87574 (-0.51%)
instructions in affected programs: 9985 -> 9540 (-4.46%)
total estimated cycles in shared programs: 245752 -> 245237 (-0.21%)
estimated cycles in affected programs: 17232 -> 16717 (-2.99%)
We can't use its other features currently (mostly because we don't want
Newton-Raphson on rcps for texture coordinates), but it gets us started.
This eliminates some comparisons with constants in GLB2.7 and ETQW traces
at the QIR level by moving the comparisons into NIR, where they get
constant-folded out.
instructions in affected programs: 165 -> 156 (-5.45%)
total uniforms in shared programs: 32087 -> 32085 (-0.01%)
total estimated cycles in shared programs: 245762 -> 245752 (-0.00%)
estimated cycles in affected programs: 461 -> 451 (-2.17%)
I'm moving away from QIR being SSA (since NIR is doing lots of SSA
optimization for us now) and instead having QIR just be QPU operations
with virtual registers. By making our SELs be composed of two MOVs, we
could potentially coalesce the registers for the MOV's src and dst and
eliminate the MOV.
total instructions in shared programs: 88448 -> 88028 (-0.47%)
instructions in affected programs: 39845 -> 39425 (-1.05%)
total estimated cycles in shared programs: 246306 -> 245762 (-0.22%)
estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
If you want the SF of the value of a register produced from a series of
packing MOVs or conditional MOVs, we can't just SF on the last MOV into
the register.
Fix a 's/unsigned int/unsigned/' consistency case while here.
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fix a silly issue where MSVC case fall-through support needs
an extra 'break;'
Found-by: Coccinelle
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch converts the SSE-optimized lp_rast_triangle_32_3_16()
to VMX/VSX.
I measured the results on POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
                  FPS/Score
Name              Before    After     Delta
------------------------------------------------
openarena         16.35     16.7      2.14%
xonotic           4.707     4.97      5.57%
glmark2 didn't show a significant (more than 1%) difference.
v2: Make sure the code is built only on POWER8 LE machines
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch converts the SSE-optimized build_mask_32() and
build_mask_linear_32() to VMX/VSX.
I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
                 FPS/Score
Name             Before    After     Delta
------------------------------------------------
glmark2 (score)  139.8     142.7     2.07%
openarena and xonotic didn't show a significant (more than 1%)
difference.
v2: Make sure code is built only on POWER8 LE machines
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch converts the SSE optimization done in do_triangle_ccw to
VMX/VSX.
I measured the results on a POWER8 machine with 32 cores at 3.4GHz and
16GB of RAM.
                 FPS/Score
Name             Before    After     Delta
------------------------------------------------
glmark2 (score)  136.6     139.8     2.34%
openarena        16.14     16.35     1.30%
xonotic          4.655     4.707     1.11%
v2:
- Convert loads to use aligned loads
- Make sure code is built only on POWER8 LE machines
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This file provides a portability layer that will make it easier to convert
SSE-based functions to VMX/VSX-based functions.
All the functions implemented in this file are prefixed using "vec_".
Therefore, when converting from an SSE-based function, one simply needs to
replace the "_mm_" prefix of the SSE function being called with "vec_".
Having said that, not all functions could be converted as such, due to the
differences between the architectures. So, where such a conversion would
hurt performance, I preferred to implement a more ad-hoc solution. For
example, converting the _mm_shuffle_epi32 needed to be done using ad-hoc
masks instead of a generic function.
All the functions in this file support both little-endian and big-endian
but currently the file is built only on POWER8 LE machines.
All of the functions are implemented using the Altivec/VMX intrinsics,
except one where I needed to use inline assembly (due to missing
intrinsic).
v2:
- Use vec_vgbbd instead of __builtin_vec_vgbbd
- Add an aligned load function
- Don't use typeof()
- Make file build only on POWER8 LE machine
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To determine if we could use special POWER8 assembly directives, we first
need to detect whether we are running on POWER8 architecture. This patch
adds this detection to configure.ac and adds the necessary compilation
flags accordingly.
v2:
- Add option to disable POWER8 instructions generation
- Detect whether building on BE or LE machine and build with
-mpower8-vector only on LE machine
- Make the printed messages more standard
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The nir_opt_algebraic rule
(('fadd', ('flog2', a), ('fneg', ('flog2', b))), ('flog2', ('fdiv', a, b))),
can produce new fdiv operations, which need to be lowered on i965,
as we don't actually implement fdiv. (Normally, we handle this in
GLSL IR's lower_instructions pass, but in the above case we introduce
an fdiv after that point. So, make NIR do it for us.)
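As a rough illustration of what lowering the new fdiv amounts to, here is a
small sketch. It assumes the lowering rewrites a / b as a multiplied by the
reciprocal of b; the function name lowered_fdiv is made up for this example
and is not part of NIR.

```c
/* Sketch only: models an fdiv lowered to a multiply by a reciprocal.
 * "lowered_fdiv" is a hypothetical name, not a NIR or i965 function. */
static float lowered_fdiv(float a, float b)
{
    float rcp_b = 1.0f / b;   /* stands in for a hardware reciprocal op */
    return a * rcp_b;         /* the multiply that replaces the divide */
}
```

With exactly representable inputs such as 8 and 2, this gives the same value
as the plain division.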
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Compute shaders require reconfiguring the L3 for shared local memory
support. We have to be able to write the L3 registers to do that.
This effectively turns off compute shaders prior to Kernel 4.2.
(Previously, the extension enable was in an API_OPENGL_CORE conditional.
However, that isn't necessary - core Mesa extension handling already
restricts it properly. I've moved it out in this patch.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This fixes some piglit subtests for ARB_program_interface_query.
V3: remove some of the unnecessary parentheses
V2: fix alignment
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is a function dedicated to demoting unused varyings, so let's
trust it to do its job.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
After lowering the matching flag is_unmatched_generic_inout is lost so
we need to move this validation before lowering.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
An SSO program can have multiple stages and we only want to add the externally
facing varyings. The current code was adding both the packed inputs and outputs
for the first and last stage of each program.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
The modified conditions allow skl+ to use the blitter:
- for all tiling formats
- to write data to YF/YS tiled surfaces
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Overlapping blits are undefined in OpenGL anyway, so there is no need
for an overlap check here.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's very rare that a GL app calls glVertex3dv(), but one in particular
calls it a lot, always with Z = 0. Check for that condition and convert
the call into glVertex2f. This reduces VBO memory used and reduces
the number of times we have to switch between float[2] and float[3]
vertex formats in the svga driver. This results in a small but
measurable performance improvement.
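The check described above might look roughly like the sketch below. It does
not call into GL; instead it records which path would be taken. The struct
and function names are hypothetical and only serve the illustration.

```c
/* Hypothetical record of what the wrapper would emit. */
struct emitted_vertex {
    int   num_components;   /* 2 for the float[2] path, 3 for float[3] */
    float v[3];
};

/* Sketch of a glVertex3dv-style wrapper: drop Z when it is exactly zero. */
static struct emitted_vertex vertex3dv_sketch(const double *v)
{
    struct emitted_vertex out = { 0, { 0.0f, 0.0f, 0.0f } };

    out.v[0] = (float) v[0];
    out.v[1] = (float) v[1];
    if (v[2] == 0.0) {
        out.num_components = 2;      /* would take the glVertex2f path */
    } else {
        out.v[2] = (float) v[2];
        out.num_components = 3;      /* would take a 3-component path */
    }
    return out;
}
```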
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon
stipple state changes. Before, we only set the flag when we were
enabling stipple, but not disabling.
We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state
set since it's a subset of SVGA_NEW_RAST, but let's be explicit.
This doesn't fix any known bugs.
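A minimal sketch of the intended behaviour, flagging on any change rather
than only on enable. The context struct, the setter and the bit value are
all made up for illustration and do not match the svga driver's real types.

```c
#include <stdbool.h>

#define SKETCH_NEW_STIPPLE (1u << 0)   /* hypothetical bit value */

struct sketch_stipple_ctx {
    bool     stipple_enabled;
    unsigned dirty;
};

static void sketch_set_stipple_enable(struct sketch_stipple_ctx *ctx,
                                      bool enable)
{
    /* Flag the state on any transition, enabling or disabling. */
    if (ctx->stipple_enabled != enable) {
        ctx->stipple_enabled = enable;
        ctx->dirty |= SKETCH_NEW_STIPPLE;
    }
}
```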
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
and svga_set_sampler_views(). If there's no change, return early
and don't set a SVGA_NEW_x dirty state flag.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
gcc 4.9.3 shows the following warning:
brw_vue_map.c:260:20: warning: array subscript is above array bounds
[-Warray-bounds]
return brw_names[slot - VARYING_SLOT_MAX];
This is because BRW_VARYING_SLOT_COUNT is a valid value for the enum
type. Adding an assert will generate no additional code but will teach
the compiler to not complain.
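The pattern can be sketched as follows. The enum, the table and the function
are simplified stand-ins (SKETCH_VARYING_SLOT_MAX plays the role of
VARYING_SLOT_MAX, SKETCH_SLOT_COUNT that of BRW_VARYING_SLOT_COUNT); they are
not the real brw_vue_map.c definitions.

```c
#include <assert.h>

enum { SKETCH_VARYING_SLOT_MAX = 4 };   /* stand-in for VARYING_SLOT_MAX */

enum sketch_varying_slot {
    SKETCH_SLOT_A = SKETCH_VARYING_SLOT_MAX,
    SKETCH_SLOT_B,
    SKETCH_SLOT_COUNT                   /* legal enum value, invalid index */
};

static const char *const sketch_names[] = { "slot_a", "slot_b" };

static const char *sketch_slot_name(enum sketch_varying_slot slot)
{
    /* Tell the compiler (and the reader) that COUNT itself never arrives. */
    assert(slot >= SKETCH_VARYING_SLOT_MAX && slot < SKETCH_SLOT_COUNT);
    return sketch_names[slot - SKETCH_VARYING_SLOT_MAX];
}
```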
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The counter was not set, but it is used by the nouveau driver.
It is required; otherwise the visual output is garbage.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.com>
This fixes the same tests that commit 8cf2e892f was attempting to fix:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
as confirmed by Samuel.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This reverts commit 8cf2e892fc. It's
entirely bogus to attempt to store anything about the binding in the
buffer object itself, which might be bound any number of times.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Instead, keep track of GL_DEBUG_OUTPUT and (un)install the pipe_debug_callback
accordingly. Hardware drivers can still use the absence of the callback to
skip more expensive operations in the normal case, and users can no longer be
surprised by the need to set the debug flag at context creation time.
v2:
- re-add the proper initialization of debug contexts (Ilia Mirkin)
- silence a potential warning (Ilia Mirkin)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit 5bb5eeea fixes a bug indicating that the surfaces should have the
API buffer size. However, it picked the wrong value.
This patch adds a new variable, which takes into account
glBindBufferRange() values. This patch fixes the following CTS
regressions:
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeOffset
ES31-CTS.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-bindrangeSize
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reported by Tom^ on IRC. The original intent was to mark the pointer
constant as well as the data being pointed to, so move the *.
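For reference, a small sketch of the difference the position of the * makes.
The variable and function names are invented for the example and are not the
declaration touched by the patch.

```c
/* Pointer to constant data; the pointer itself may be reassigned. */
static const char *mutable_ptr = "first";

/* Constant pointer to constant data: the second const follows the '*'. */
static const char *const fixed_ptr = "fixed";

static const char *retarget(void)
{
    mutable_ptr = "second";      /* allowed: only the data is const */
    /* fixed_ptr = "other"; */   /* would not compile: pointer is const */
    return mutable_ptr;
}
```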
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Note these are a bit uglier, due to avoidance of GNU C extensions. But
drivers which do not need to be built with compilers that don't support
the extension can wrap these macros with their own.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Found during NIR_TEST_CLONE=1 piglit run. We were using block->index
but forgetting to require it, causing things to not work with a cloned
shader which didn't preserve block_index.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Immediately convert into NIR and do an initial key-agnostic lowering/
optimization pass. This should let us share most of the per-variant
transformations between each variant, and hopefully minimize the draw-
time variant creation part of the compilation process.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It will still hit a compile_assert() in emit_tex, which has the
advantage of dumping out the offending shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This will save some
iterations (and CPU cycles) in 99.99% of cases, because coherent buffers
are rarely used.
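The idea can be sketched as below. The struct and function names are
hypothetical and the real driver tracks more state than this; the point is
only that the check becomes a counter comparison instead of a walk over every
buffer.

```c
#include <stdbool.h>

struct sketch_buf_ctx {
    unsigned num_coherent_buffers;   /* context-wide count */
};

static void sketch_bind_buffer(struct sketch_buf_ctx *ctx, bool coherent)
{
    if (coherent)
        ctx->num_coherent_buffers++;
}

static void sketch_unbind_buffer(struct sketch_buf_ctx *ctx, bool coherent)
{
    if (coherent)
        ctx->num_coherent_buffers--;
}

/* Constant-time check that replaces iterating over all bound buffers. */
static bool sketch_has_coherent_buffers(const struct sketch_buf_ctx *ctx)
{
    return ctx->num_coherent_buffers > 0;
}
```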
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If there's a linked TES program, we should just use the actual
primitive mode. If not, just guess triangles (as we did before).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Using the push model for inputs is much more efficient than pulling
inputs - the hardware can simply copy a large chunk into URB registers
at thread creation time, rather than having the thread send messages to
request data from the L3 cache. Unfortunately, it's possible to have
more TES inputs than fit in registers, so we have to fall back to the
pull model in some cases.
However, it turns out that most tessellation evaluation shaders are
fairly simple, and don't use many inputs. An arbitrary cut-off of
32 vec4 slots (16 registers) is more than sufficient to ensure that
100% of TES inputs are pushed for Shadow of Mordor, Unigine Heaven,
GPUTest/TessMark, and SynMark.
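The cut-off can be pictured with the sketch below. It assumes the limit is
inclusive and that two packed vec4 slots share one register, which is what
"32 vec4 slots (16 registers)" implies; the names are invented for the
example.

```c
#include <stdbool.h>

#define SKETCH_MAX_PUSH_SLOTS 32u   /* cut-off from the text, assumed inclusive */

/* Push the inputs if they fit under the cut-off, else fall back to pull. */
static bool sketch_use_push_model(unsigned num_input_slots)
{
    return num_input_slots <= SKETCH_MAX_PUSH_SLOTS;
}

/* Two packed vec4 slots per register, so 32 slots occupy 16 registers. */
static unsigned sketch_push_registers(unsigned num_input_slots)
{
    return (num_input_slots + 1) / 2;
}
```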
Note that unlike most SIMD8 stages, this actually reads packed vec4
data, since that is what our vec4 TCS programs write.
Improves performance in GPUTest's tessmark_x64 microbenchmark
by 93.4426% +/- 5.35541% (n = 25) on my Lenovo X250 at 1024x768.
Improves performance in Synmark's Gl40TerrainFlyTess microbenchmark
by 22.74% +/- 0.309394% (n = 5).
Improves performance in Shadow of Mordor at low settings with
tessellation enabled at 1280x720 by 2.12197% +/- 0.478553% (n = 4).
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 184358 -> 181181 (-1.72%)
instructions in affected programs: 27971 -> 24794 (-11.36%)
helped: 226
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need a MOV to replicate g0.0<0,1,0> to all 8 channels. Since the
message payload is a single register, MOV seemed more sensible than
LOAD_PAYLOAD. However, MOV cannot be CSE'd, while LOAD_PAYLOAD can.
All input loads can use the same header - we don't need to re-expand
g0 every time. CSE accomplishes this, saving instructions.
shader-db statistics for files containing tessellation shaders:
total instructions in shared programs: 186923 -> 184358 (-1.37%)
instructions in affected programs: 30536 -> 27971 (-8.40%)
helped: 226
HURT: 0
total cycles in shared programs: 1009850 -> 1005356 (-0.45%)
cycles in affected programs: 168206 -> 163712 (-2.67%)
helped: 226
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While most align16 instructions only support a SubRegNum of 0 or 4
(using swizzling to control the other channels), 3-src instructions
actually support arbitrary SubRegNums. When the RepCtrl bit is set,
we believe it ignores the swizzle and uses the equivalent of a <0,1,0>
region from the subnr.
In the past, we adopted a vec4-centric approach of specifying subnr of
0 or 4 and a swizzle, then having brw_eu_emit.c convert that to a proper
SubRegNum. This isn't a great fit for the scalar backend, where we
don't set swizzles at all, and happily set subnrs in the range [0, 7].
This patch changes brw_eu_emit.c to use subnr and swizzle directly,
relying on the higher levels to set them sensibly.
This should fix problems where scalar sources get copy propagated into
3-src instructions in the FS backend. I've only observed this with
TES push model inputs, but I suppose it could happen in other cases.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Diagnostics sent during code generation and the error message reported
by LLVMTargetMachineEmitToMemoryBuffer are disjoint reporting mechanisms. We
take care of both and also send an explicit message indicating failure at the
end, so that log parsers can more easily tell the boundary between shader
compiles.
Removed an fprintf that could never be triggered.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will allow us to send shader debug info via the context's debug callback.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The output via stderr is very helpful for ad-hoc debugging tasks, so that remains
unchanged, but having the information available via debug messages as well
will allow the use of parallel shader-db runs.
Shader stats are always provided (if the context is a debug context, that is),
but you still have to enable the appropriate R600_DEBUG flags to get
disassembly (since it is rather spammy and is only generated by LLVM when we
explicitly ask for it).
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The fixed alignment of u_upload_mgr will go away.
This is the first step.
The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The function only aligned the size, but not the offset.
The offset was aligned only when the previous suballocation was aligned.
That yielded the correct offset alignment if the alignment was constant
for all suballocations.
Instead, directly align the offset, but allow an unaligned size.
There is no change in behavior, because the alignment is constant
at the moment.
This is a prerequisite for allowing a variable alignment for suballocations.
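A minimal sketch of aligning the offset instead of the size, assuming
power-of-two alignments. The struct and function names are hypothetical and
do not match the u_upload_mgr API.

```c
/* Round value up to the next multiple of a power-of-two alignment. */
static unsigned sketch_align(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

struct sketch_uploader {
    unsigned offset;   /* first free byte in the upload buffer */
};

/* Align the start of the suballocation; the size is left unaligned. */
static unsigned sketch_alloc(struct sketch_uploader *up,
                             unsigned size, unsigned alignment)
{
    unsigned start = sketch_align(up->offset, alignment);

    up->offset = start + size;
    return start;
}
```

Because each call aligns its own start offset, consecutive suballocations can
ask for different alignments without depending on how the previous one ended.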
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Spotted by luck. The GLSL uniform storage is only associated once
in LinkShader and can't be reallocated afterwards, because that would
break the association.
v2: don't remove st_upload_constants calls, clarify why they're needed
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
First off, we can't flush in the middle of a command. Secondly
requesting the extra push space might cause a flush to happen. If that
flush happens, we'd have to do the PUSH_REFN again. So instead do
PUSH_REFN after the push space request. This helps avoid rare crashes
with supertuxkart in libdrm due to assertion failures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
While playing with fp64, I disabled varying packing to debug
something else, and noticed we never emitted half the output
movs for double matrix arrays.
We should be moving the left index two slots for dual
source doubles, and the right index two slots for non-vs
input doubles.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vertex inputs are counted differently in some cases; with
vertex inputs we need to make sure we don't double count them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise we end up emitting the wrong index for the second
double.
This fixes dmat-vs-gs-tcs-tes.shader_test and dvec3-vs-gs-tcs-tes.shader_test
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's important for the double instruction emission code that
the writemasks are correct going in for double so it knows
which channels to replicate.
This fixes it for the array and matrix cases.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This code takes into account double inputs in the array
shrinking code. This fixes some issues with doubles
and geom/tess inputs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles the case where a double output is stored
in an array, and tracks it for use in the double
instruction emit code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a precursor patch to a fix for doubles with
tessellation that I've written.
We need to descend into output arrays in that case and
mark dst's as double.
Signed-off-by: Dave Airlie <airlied@redhat.com>
varying_matches::record tries to compute the number of components in
each varying, which varying_matches::assign_locations uses to assign
locations. With varying packing, it uses glsl_type::component_slots()
to come up with a reasonable value.
Without varying packing, it fell back to an open-coded computation
that didn't bother to handle structs at all. I believe we can simply
use 4 * glsl_type::count_attribute_slots(false), which already handles
these cases correctly.
Partially fixes rendering in GFXBench 4.0's tessellation benchmark.
(NVE0 is almost right after this, but i965 is still mostly garbage.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Unigine Heaven 4.0 and Valley 1.0 use dual color blending but don't
specify which fragment shader output is which, so there's at best a
50/50 chance of us guessing it correctly. This is invalid.
Unigine fixed this in 4.1 and 1.1 versions over a year and a half ago,
but hasn't actually released them for whatever reason. So, add the
workaround back so that it works for most people.
Fixes Heaven 4.0/Valley 1.0 rendering on Ivybridge. For whatever
reason, Broadwell worked. 4.1 and 1.1 have always worked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92233
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Hooks up the new system values, passes the drawid in.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This will allow the state tracker to inform the driver where in a
broken-up multidraw we currently are. This can then be passed into the
vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This allows the state tracker to know that the various draw parameters
are available in vertex shaders.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The add might actually have a 0 as an argument, which would convert it
into a mov. Make sure to detect that. Also avoid the hack of putting the
immediate directly into the instruction, instead use a mov to put it
into place and let the later LoadPropagation pass place it if possible.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The imulExtended tests of the shader bitfield tests of the
OpenGL ES 3.1 CTS fail on gen8+ when BRW_REGISTER_TYPE_W
is used for SHADER_OPCODE_MULH.
Also, remove unused helper function:
static inline bool type_is_signed(unsigned type)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There used to be more members but they now share other fields
in order to keep memory use low.
Also making the naming more generic will allow us to reuse the
field for explicit byte offsets within blocks for
ARB_enhanced_layouts.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
A hugely common case when using nir_builder is to have a shader with a
single function called main. This adds a helper that gives you just that.
This commit also makes us use it in the NIR control-flow unit tests as well
as tgsi_to_nir and prog_to_nir.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
GL_ARB_shader_draw_parameters added two new system values. This gets us
back to mapping mesa system values to the right TGSI semantics.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If we're doing an indirect draw, prims[i].basevertex is always 0 and the
real base vertex value is in the indirect parameter buffer. We try to
avoid flagging BRW_NEW_VERTICES if prims[i].basevertex doesn't change,
which then breaks down for indirect draws. Thus, if a program uses base
vertex or base instance, and the draw call is indirect, always flag
BRW_NEW_VERTICES. A new piglit test,
spec/ARB_shader_draw_parameters/drawid-indirect-vertexid tests this.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This optimizes a + b - b to just a. Modest shader-db results (BDW):
total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
instructions in affected programs: 61938 -> 61348 (-0.95%)
total loops in shared programs: 2131 -> 2131 (0.00%)
helped: 263
HURT: 0
GAINED: 0
LOST: 0
but the optimization turns
gl_VertexID - gl_BaseVertexARB
into just a reference to SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, which the
i965 hardware supports natively. That means we can avoid using the
internal vertex buffer for gl_BaseVertexARB in this case.
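To make the rewrite concrete, here is a toy version on a tiny expression
tree. The types and the function are invented for this sketch, operands are
compared by pointer identity, and only the (a + b) - b shape is handled; the
real pass works on the compiler's own IR.

```c
#include <stddef.h>

enum sketch_op { SKETCH_VAR, SKETCH_ADD, SKETCH_SUB };

struct sketch_expr {
    enum sketch_op op;
    const struct sketch_expr *lhs, *rhs;   /* NULL for SKETCH_VAR */
};

/* Rewrite (a + b) - b to a; return the expression unchanged otherwise. */
static const struct sketch_expr *
sketch_simplify(const struct sketch_expr *e)
{
    if (e->op == SKETCH_SUB &&
        e->lhs->op == SKETCH_ADD &&
        e->lhs->rhs == e->rhs)
        return e->lhs->lhs;
    return e;
}
```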
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have to break open a new vec4 for gl_DrawIDARB. We've used up all
space in the vec4 we use for SGVS and gl_DrawIDARB has to come from its
own separate vertex buffer anyway. This is because we point the vb for
base vertex and base instance into the draw parameter BO for indirect
draw calls, but the draw id is generated by mesa in a different buffer.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We already have gl_BaseVertexARB in the .x component of the SGVS vec4
and plug gl_BaseInstanceARB into the last free component (.y).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
fs_visitor::emit_vs_system_value() looks like it's trying to handle
SYSTEM_VALUE_VERTEX_ID, but we should never see that value in the
backend.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The drivers will need this for passing in gl_DrawIDARB. For indirect
multidraw calls, we get the prim array and prim[i].draw_id == i and is
redundant. But for non-indirect calls, we get one primitive at a time
and need the draw_id field.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This option allows replacing a single shader by a pre-compiled ELF object
as generated by LLVM's llc, for example. This can be useful for debugging a
deterministically occurring error in shaders (and has in fact helped find
the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264).
v2: drop the debug flag, use DEBUG_GET_ONCE_OPTION instead
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This changes the count slightly (because of si_generate_gs_copy_shader), but
this is only relevant for the driver-specific num-compilations query. It sets
the stage for the next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Setting interleave on the TCS EOT message causes Ivybridge hardware to
GPU hang like crazy. Individual tests would pass, but running even a
simple test like nop.shader_test in a loop would hang within 1-3 runs.
Adding sleep delays worked around the problem, somehow.
Interleave doesn't make much sense given that we only have one patch
URB handle, not two. Complete doesn't seem useful either.
There's no reason to actually set those bits. We were just being lazy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Pre-Broadwell hardware requires us to manually release the ICP Handles
by issuing URB read messages with the "Complete" bit set. We can do
this in pairs to use fewer URB read messages.
Based heavily on work from Chris Forbes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When Connor originally drafted NIR, he copied the same function+overload
system that GLSL IR had with a few names changed. However, this
double-indirection is not really needed and has only served to confuse
people. Instead, let's just have functions which may not have unique names
and may or may not have an implementation. If someone wants to do overload
resolving, they can have a hash table based function+overload system in the
overload resolving pass. There's no good reason to keep it in core NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
ir3 bits are
Reviewed-by: Rob Clark <robclark@gmail.com>
NIR has never been built with MSVC2008, so we shouldn't add
MSVC2008_COMPAT_CFLAGS to anything that uses it. This allows us to get
rid of the pragma in tgsi_to_nir.c.
Build tested with freedreno.
v2: Use MSVC2013_COMPAT_CFLAGS instead.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
The BDW PRM Vol2a: Command Reference: Instructions, section MEDIA_CURBE_LOAD,
says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are
64-byte aligned. This is different from previous gens, that were 32-byte
aligned.
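As a small sketch of the stricter requirement, the helper below rounds a byte
count or offset up to 64 bytes. The macro and function names are made up for
this example. Since every multiple of 64 is also a multiple of 32, using 64
everywhere would satisfy the older requirement as well.

```c
#define SKETCH_CURBE_ALIGNMENT 64u   /* BDW requirement quoted above */

/* Round a CURBE length or start offset up to the next 64-byte boundary. */
static unsigned sketch_align_curbe(unsigned value)
{
    return (value + SKETCH_CURBE_ALIGNMENT - 1) &
           ~(SKETCH_CURBE_ALIGNMENT - 1);
}
```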
v2 (Jordan):
- CURBE Data Start Address is also 64-byte aligned.
- The call to brw_state_batch should also use 64-byte alignment.
- Improve PRM reference.
v3:
* New patch from Jordan. Always align base and size to 64 bytes.
Fixes the following SSBO CTS tests on BDW:
ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs
ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs
ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs
ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs
ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs
And many other CS CTS tests as reported by Marta Lofstedt.
(Commit message is from Iago, but in v3, code is from Jordan.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Everything is in place and I'm not aware of any further issues.
Tested with:
- Piglit
- Tessmark
- Unigine Heaven
- Shadow of Mordor
- GRID Autosport
I have patches to backport this to Haswell, Ivybridge, and Baytrail as
well (the first Intel hardware to support tessellation), but there are
still a lot of GPU hangs left to debug. So that will come later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GL_ARB_separate_shader_objects allows the application to mix-and-match
TCS and TES programs separately. This means that the interface between
the two stages isn't known until the final SSO pipeline is in place.
This isn't a great match for our hardware: the TCS and TES have to agree
on the Patch URB entry layout. Since we store data as per-patch slots
followed by per-vertex slots, changing the number of per-patch slots can
significantly alter the layout. This can easily happen with SSO.
To handle this, we store the [Patch]OutputsWritten and [Patch]InputsRead
bitfields in the TCS/TES program keys, introducing program recompiles.
brw_upload_programs() decides the layout for both TCS and TES, and
passes it to brw_upload_tcs/tes(), which store it in the key.
When creating the NIR for a shader specialization, we override
nir->info.inputs_read (and friends) to the program key's values.
Since everything uses those, no further compiler changes are needed.
This also replaces the hack in brw_create_nir().
To avoid recompiles, brw_precompile_tes() looks to see if there's a
TCS in the linked shader. If so, it accounts for the TCS outputs,
just as brw_upload_programs() would. This eliminates all recompiles
in the non-SSO case. In the SSO case, there should only be recompiles
when using a TCS and TES that have different input/output interfaces.
Fixes Piglit's mix-and-match-tcs-tes test.
v2: Pull the brw_upload_programs code into a brw_upload_tess_programs()
helper function (requested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With tessellation shaders and SSO, we won't be able to always decide on
VUE map layouts at LinkProgram time. Unfortunately, we have to delay it
until shader specialization time.
However, uniform lowering cannot be deferred - brw_codegen_*_prog()
reads nir->num_uniforms. Fortunately, we don't need to defer it -
uniform, system value, atomic, and sampler lowering can safely stay
where it is. This patch moves those to brw_lower_nir()'s only caller,
renames brw_lower_nir() to brw_nir_lower_io(), and introduces calls
to that.
For non-tessellation stages, I chose to call brw_nir_lower_io() from
brw_create_nir(), so it's still done at the same time. There's no
need to defer it, and doing it at LinkProgram time is nice.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This way, I can safely use brw_tcs_prog_key::program_string_id == 0
to mean "not filled out because no program exists", which avoids the
need for adding an extra boolean to that struct.
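The sentinel use could be sketched like this. It assumes real programs are
never handed ID 0 (modelled here by a counter that starts at 1); the struct,
the counter and both functions are hypothetical and are not the actual
brw_tcs_prog_key code.

```c
#include <stdbool.h>

struct sketch_tcs_key {
    unsigned program_string_id;   /* 0 means "no program exists" */
    unsigned other_state;
};

/* Assumption: IDs start at 1, so 0 is never a valid program ID. */
static unsigned sketch_next_program_id = 1;

static unsigned sketch_allocate_program_id(void)
{
    return sketch_next_program_id++;
}

/* No separate boolean is needed: the ID doubles as the "filled" marker. */
static bool sketch_key_is_populated(const struct sketch_tcs_key *key)
{
    return key->program_string_id != 0;
}
```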
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When the application hasn't supplied a TCS, and we have to create one,
we need to know what VS outputs to copy to TES inputs.
To do this, we create a new program key field, and set it to the TES
InputsRead bitfield.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
When using tessellation on OpenGL without a TCS, default values for
gl_TessLevelOuter/gl_TessLevelInner are provided via the API.
Core Mesa will flag ctx->DriverFlags.NewDefaultTessLevels whenever those
values change. We add a corresponding BRW_NEW_DEFAULT_TESS_LEVELS flag
and hook it up to HS push constants (which will be used to upload these
default values to the autogenerated TCS).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With the automatic-TCS creation, we won't have a prog, but still need to
upload push constants.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tessellation control shaders are optional, but evaluation shaders will
always be present when using tessellation. However, we'll always enable
the TCS (HS) hardware stage when using tessellation - we'll just create
a program on the fly.
That program, however, won't have a gl_program or gl_shader_program.
So we shouldn't check brw->tess_ctrl_program or
shader_prog->_LinkedShaders[MESA_SHADER_TESS_CTRL] - if we want to know
whether tessellation is enabled, we should look for a TES.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is trying to enforce the fact that the hardware requires HS, TE,
and DS to be enabled or disabled together. But it's kind of an ad-hoc
attempt, and not too useful.
More importantly, we aren't going to have a gl_shader_program for the
TCS which is automatically generated when none is present. (We'll just
handle it in the driver backend.) So, these will trip for no reason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For several reasons, I don't think it's particularly useful to have
separate flags:
1. Most of the time, tessellation shaders are paired, so both will be
replaced at the same time.
2. The data layout is tightly coupled. Both need to agree on the number
of per-patch slots in the VUE map. Even adding extra TCS outputs
that aren't read by the TES will trigger the need for recompiles.
3. The TCS is optional from an API perspective, but required by the
hardware whenever tessellation is enabled. So, atoms that deal with
the TCS must check brw->tess_eval_program (BRW_NEW_TESS_EVAL_PROGRAM?)
rather than brw->tess_ctrl_program to tell whether tessellation is
enabled.
So, not only is it unlikely to be useful, it's a bit confusing to get
right. Simply using one flag for both simplifies this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If there's no evaluation shader, tessellation is disabled. The upload
functions would just bail. Instead, don't bother calling them.
This will simplify the optional-TCS case a bit, as brw_upload_tcs can
assume that we're doing tessellation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I need access to glsl_type::vec2_type from C. Wrapping vec() also gives
us access to vec3 if we need it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Instead of performing the read-modify-write cycle in glsl->nir, we can
simply emit a partial writemask. For locals, nir_lower_vars_to_ssa will
do the equivalent read-modify-write cycle for us, so we continue to get
the same SSA values we had before.
Because glsl_to_nir calls nir_lower_outputs_to_temporaries, all outputs
are shadowed with temporary values, and written out as whole vectors at
the end of the shader. So, most consumers will still not see partial
writemasks.
However, nir_lower_outputs_to_temporaries bails for tessellation control
shader outputs. So those remain actual variables, and stores to those
variables now get a writemask. nir_lower_io passes that through. This
means that TCS outputs should actually work now.
This is a functional change for tessellation control shaders.
v2: Relax the nir_validate assert to allow partial writemasks.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tessellation control shaders need to be careful when writing outputs.
Because multiple threads can concurrently write the same output
variables, we need to only write the exact components we were told.
Traditionally, for sub-vector writes, we've read the whole vector,
updated the temporary, and written the whole vector back. This breaks
down with concurrent access.
This patch prepares the way for a solution by adding a writemask field
to store_var intrinsics, as well as the other store intrinsics. It then
updates all producers to emit a writemask of "all channels enabled". It
updates nir_lower_io to copy the writemask to output store intrinsics.
Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks
by doing a read-modify-write cycle (which is safe, because local
variables are specific to a single thread).
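To illustrate the difference, here is a rough sketch in plain C rather
than NIR. store_masked and store_rmw are hypothetical helpers; bit i of
the writemask is assumed to select channel i.

```c
#include <assert.h>

/* Store only the channels selected by writemask; the other channels of
 * dst are never read or written. */
static void
store_masked(float dst[4], const float src[4], unsigned writemask)
{
   assert((writemask & ~0xfu) == 0);
   for (int i = 0; i < 4; i++) {
      if (writemask & (1u << i))
         dst[i] = src[i];
   }
}

/* Read-modify-write equivalent: merge into a temporary, then store the
 * whole vector. Only safe when no other thread can write dst. */
static void
store_rmw(float dst[4], const float src[4], unsigned writemask)
{
   float tmp[4];
   for (int i = 0; i < 4; i++)
      tmp[i] = (writemask & (1u << i)) ? src[i] : dst[i];
   store_masked(dst, tmp, 0xf);
}
```

Both produce the same result for thread-local data. With concurrent
writers, store_rmw can write back stale values for the channels it was
not asked to touch.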
This should have no functional change, since no one actually emits
partial writemasks yet.
v2: Make nir_validate momentarily assert that writemasks cover the
complete value - we shouldn't have partial writemasks yet
(requested by Jason Ekstrand).
v3: Fix accidental SSBO change that arose from merge conflicts.
v4: Don't try to handle writemasks in ir3_compiler_nir - my code
for indirects was likely wrong, and TTN doesn't generate partial
writemasks today anyway. Change them to asserts as requested by
Rob Clark.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
This patch makes the following changes for interface matching:
- do not try to match builtin variables
- handle swizzle in input name, for example 'a.z' should
match with 'a'
- add matching by location
- check that the number of inputs and outputs matches
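The swizzle handling can be sketched roughly as follows. This is a
simplification that treats everything after the first '.' in the input
name as a suffix to ignore; the helper names are hypothetical and the
real code may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Length of the name up to (not including) the first '.'. */
static size_t
base_name_len(const char *name)
{
   const char *dot = strchr(name, '.');
   return dot ? (size_t)(dot - name) : strlen(name);
}

/* Compare an input name such as "a.z" against an output name "a". */
static bool
names_match_ignoring_swizzle(const char *input, const char *output)
{
   assert(input && output);
   size_t in_len = base_name_len(input);
   return in_len == strlen(output) && strncmp(input, output, in_len) == 0;
}
```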
These changes make the interface matching tests work in:
ES31-CTS.sepshaderobjs.StateInteraction
The test still does not pass completely due to errors in rendering
output. IMO this is unrelated to interface matching.
Note that type matching is not done because varying packing changes
the type of a variable; this can be added later on, preferably
when we have a quicker way to iterate resources and a complete
list of all existing varyings (before packing) available.
v2: add spec reference, return true on desktop since we do not
have failing cases for it, the number of inputs and outputs does
not need to match on desktop.
v3: add some more spec reference, remove desktop specifics since
not used for now on desktop, add match by location qualifier,
rename input_stage and output_stage as producer and consumer
as suggested by Timothy.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The i965 driver uses this function to decide if it can disable the
FS unit in the absence of color/depth writes. We don't want to disable
the unit in the presence of SSBOs, since the fragment shader could
be writing to it.
We could go a step further and check not just for the presence of SSBOs
but also if the shader code writes to them. That does not look worth the
trouble, though, and we are not doing this for atomic buffers anyway.
v2: put this into a generic _mesa_active_fragment_shader_has_side_effects
function instead of having one specific for SSBOs (Jason).
Fixes the following CTS test:
ES31-CTS.shader_storage_buffer_object.advanced-usage-sync-vsfs
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On Haswell we need to set the UAV_ONLY WM state bit when there are no colour
or depth buffer writes and on all hardware we should set the early
depth/stencil control field to PSEXEC unless early fragment tests are enabled
to make sure that the fragment shader is executed regardless of whether
per-fragment tests pass or not as the spec requires.
So far we have been doing this for images only, but we should apply the same
treatment to all side effectful scenarios. Suggested by Curro.
This is not strictly required for compliance with the original
ARB_shader_atomic_counters extension, it's only necessary to get the execution
semantics specified in GL4.2+ right.
v2:
- Mark active_fs_has_side_effects as constant. (Curro)
- Mention that this is only necessary to get the execution semantics
specified in GL4.2+ right. (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Some drivers can disable the FS unit if there is nothing in the shader code
that writes to an output (i.e. color, depth, etc). Right now, mesa has
a function to check for atomic buffers and the i965 driver also checks for
images. Refactor this logic into a generic function that we can use for
any source of side effects in a fragment shader. Suggested by Jason.
v2:
- Use '_Shader', as suggested by Tapani, to fix the following CTS test:
ES31-CTS.shader_atomic_counters.advanced-usage-many-draw-calls2
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The hardware provides us no decent way of getting at the number of input
vertices in the patch topology from the tessellation control shader.
It's actually very surprising - normally this sort of information would
be available in the thread payload.
For the precompile, we guess that the number of vertices will be the
same for both the input and output patches. This usually seems to be
the case.
On Gen8+, we could pass in an extra push constant containing this value.
We may be able to do that on Haswell too. It's quite a bit trickier on
Ivybridge, however.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The TCS is the first tessellation shader stage, and the most
complicated. It has access to each of the control points in the input
patch, and computes a new output patch. There is one logical invocation
per output control point; all invocations run in parallel, and can
communicate by reading and writing output variables.
One of the main responsibilities of the TCS is to write the special
gl_TessLevelOuter[] and gl_TessLevelInner[] output variables which
control how much new geometry the hardware tessellation engine will
produce. Otherwise, it simply writes outputs that are passed along
to the TES.
We run in SIMD4x2 mode, handling two logical invocations per EU thread.
The hardware doesn't properly manage the dispatch mask for us; it always
initializes it to 0xFF. We wrap the whole program in an IF..ENDIF block
to handle an odd number of invocations, essentially falling back to
SIMD4x1 on the last thread.
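As a rough model of the thread layout (not the generated code), the
sketch below computes how many EU threads are needed and which channels
are live in each. The 0xFF/0x0F masks assume four channels per logical
invocation, and both helper names are hypothetical.

```c
#include <assert.h>

/* Threads needed when each thread handles up to two invocations. */
static unsigned
tcs_thread_count(unsigned invocations)
{
   return (invocations + 1) / 2;
}

/* Live channels for a thread: 0xFF when both invocations exist, 0x0F
 * when only the first does (odd invocation count, last thread). */
static unsigned
tcs_thread_mask(unsigned invocations, unsigned thread)
{
   assert(thread < tcs_thread_count(invocations));
   unsigned first = thread * 2;
   return (first + 1 < invocations) ? 0xFFu : 0x0Fu;
}
```

With three invocations, thread 0 runs both halves and thread 1 runs
only its first half, which is the SIMD4x1 fallback described above.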
v2: Update comments (requested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The TES is essentially a post-tessellator VS, which has access to the
entire TCS output patch, and a special gl_TessCoord input. Otherwise,
they're very straightforward.
This patch implements SIMD8 tessellation evaluation shaders for Gen8+.
The tessellator can generate a lot of geometry, so operating in SIMD8
mode (8 vertices per thread) is more efficient than SIMD4x2 mode (only
2 vertices per thread). I have another patch which implements SIMD4x2
mode for older hardware (or via an environment variable override).
We currently handle all inputs via the pull model.
v2: Improve comments (suggested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This field is used as a flag to optimise out any varyings that don't have
a matching varying on the other side of the interface.
The value should be the same for all varyings (except for SSO but we can't
optimise those) by the time they reach nir, so it is no longer needed.
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Also emits a method to properly bind the class to a subchannel, which
was missing previously. The kernel currently doesn't care, but this
will break if it ever decides to (i.e. to support multiple sw classes).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The kernel previously exposed incorrect classes for some of the chipsets
that this code supports. It no longer does, but the older object ioctls
have compatibility to avoid breaking userspace.
This needs to be fixed before switching over to the newer interfaces.
Rather than hardcoding chipset->class like the rest of the driver does,
this makes use of (new) sclass queries to determine what's available.
v2:
- update to use symbolic class identifier from <nvif/class.h>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Switching to the newer libdrm entry-points tells libdrm that it's OK to
make use of newer kernel interfaces.
We want to be able to isolate any bugs to either the interface changes,
or the use of NVIF itself. As such, this commit has a slight hack which
forces libdrm to continue using the older kernel interfaces.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The winsys layer would attempt to clean up the nouveau_device if screen
init failed, however, in most paths the pipe driver would have already
destroyed it, resulting in accesses to freed memory etc.
This commit fixes the problem by allowing the winsys to detect whether
the pipe driver's destroy function needs to be called or not.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If oViewport is written, vertex reuse needs to be turned off.
If oViewport is constant, vertex reuse is fine, and VPORT_PROVOKE_DISABLE
needs to be set. (We don't have enough info to program VPORT_PROVOKE.)
Fixes: arb_viewport_array-render-viewport-2 and some CTS tests.
v2: drop vport provoke write, drop initial state writing this
on evergreen, only program it on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If oViewport is written, vertex reuse needs to be turned off.
If oViewport is constant, vertex reuse is fine, and VPORT_PROVOKE_DISABLE
needs to be set. (We don't know if oViewport is constant so we
skip this.)
Fixes: arb_viewport_array-render-viewport-2 and some CTS tests.
v2: drop writing to provoke disable, drop write in initial
state.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function deals with vertex inputs and fragment
outputs, so we should count the attribute locations
correctly for the vertex inputs.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I was cleverly using one iteration to obtain a pointer to the last item
in ralloc's singly linked child list, while also setting parents.
Unfortunately, I forgot to set the parent on that last item.
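For illustration, a loop that both reparents every item and returns the
last one could look like the sketch below. node_sketch is a hypothetical
stand-in, not ralloc's real header. The pitfall is a loop that stops on
"n->next == NULL" before touching the final node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; not ralloc's actual header layout. */
struct node_sketch {
   struct node_sketch *parent;
   struct node_sketch *next;   /* next sibling in the child list */
};

/* Set the parent of every node, including the last one, and return the
 * last node (NULL for an empty list). */
static struct node_sketch *
reparent_all(struct node_sketch *first, struct node_sketch *new_parent)
{
   struct node_sketch *last = NULL;

   assert(new_parent);
   for (struct node_sketch *n = first; n != NULL; n = n->next) {
      n->parent = new_parent;
      last = n;
   }
   return last;
}
```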
Cc: "11.1 11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This doubles the element width for the types that are greater
than 2 elements wide.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This doesn't apply to other stages. This is only
used in the mesa/st code, which needs further fixes.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR. Texture
fetch can probably take as much as the rest of the cycles of the program,
so it's important to hide our other cycles during it (which is hard to do
after register allocation). Also, we can queue up multiple texture
requests before collecting the resulting samples, so that we keep the
texture unit busy more of the time.
High-settings openarena performance +2.35849% +/- 0.221154% (n=7). Also
about 2-3% on the multiarb demo. 8 piglit tests
(ext_framebuffer_multisample accuracy depthstencil) go from failing in
rendering to failing in register allocation, but hopefully I can fix that
up with some better register pressure handling here.
total instructions in shared programs: 87723 -> 88448 (0.83%)
instructions in affected programs: 78411 -> 79136 (0.92%)
total estimated cycles in shared programs: 276583 -> 246306 (-10.95%)
estimated cycles in affected programs: 265691 -> 235414 (-11.40%)
There's only high latency between a complete texture fetch setup and
collecting its result, not between each step of setting up the texture
fetch request.
So vertex shader input attributes are handled differently than internal
varyings between shader stages: dvec3 and dvec4 only count as
one slot for vertex attributes, but for internal varyings, they
count as 2.
This patch comments all the uses of this API to clarify what we
pass in, except one which needs further investigation.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The old function didn't work for matrices, and we need this
in other places to fix some other problems, so move to a helper
in glsl type and fix the one user so far.
A dual slot double is one that has 3 or 4 components in its
base type.
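A minimal sketch of that rule, using a hypothetical flattened type
description instead of the real glsl_type (for a matrix, the component
count is assumed to be the size of one column):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened type description; not the real glsl_type. */
struct type_sketch {
   bool is_double;             /* base type is a 64-bit float */
   unsigned vector_elements;   /* 1..4; column size for matrices */
};

/* A dual slot double has a double base type with 3 or 4 components. */
static bool
is_dual_slot_double(const struct type_sketch *t)
{
   assert(t->vector_elements >= 1 && t->vector_elements <= 4);
   return t->is_double && t->vector_elements > 2;
}
```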
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
As in the previous patches, these can be implemented as
any(v) -> any_nequal(v, false)
all(v) -> all_equal(v, true)
and their removal simplifies the code in the next patch.
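The equivalence can be checked with a small sketch on plain bool arrays.
This models the semantics only, not the IR; any_sketch and all_sketch
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COMPONENTS 4

/* True if any component of a differs from the matching component of b. */
static bool
any_nequal(const bool *a, const bool *b, unsigned n)
{
   for (unsigned i = 0; i < n; i++) {
      if (a[i] != b[i])
         return true;
   }
   return false;
}

/* True if every component of a equals the matching component of b. */
static bool
all_equal(const bool *a, const bool *b, unsigned n)
{
   return !any_nequal(a, b, n);
}

/* any(v) expressed as any_nequal(v, false). */
static bool
any_sketch(const bool *v, unsigned n)
{
   const bool all_false[MAX_COMPONENTS] = { false, false, false, false };
   assert(n <= MAX_COMPONENTS);
   return any_nequal(v, all_false, n);
}

/* all(v) expressed as all_equal(v, true). */
static bool
all_sketch(const bool *v, unsigned n)
{
   const bool all_true[MAX_COMPONENTS] = { true, true, true, true };
   assert(n <= MAX_COMPONENTS);
   return all_equal(v, all_true, n);
}
```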
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL IR to TGSI/Mesa IR paths for any_nequal have the same
optimizations the ir_unop_any paths had.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NaNs mean it should be clipped, otherwise the NaNs might get passed to the
next stages (if clipping didn't happen for another reason already), which
might cause all kinds of problems.
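The commit does not spell out how the test is phrased, but the usual
pitfall can be sketched as follows: every comparison against NaN is
false, so a check written as "is it outside?" lets NaN through, while a
check written as "is it inside?" rejects it. Both helpers are
hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Inside test: a NaN in x or w makes both comparisons false, so the
 * vertex is reported as not inside and gets clipped. */
static bool
inside_clip_range(float x, float w)
{
   return x >= -w && x <= w;
}

/* Naive outside test: with NaN both comparisons are false, so the
 * vertex is wrongly treated as not outside. */
static bool
outside_naive(float x, float w)
{
   return x < -w || x > w;
}
```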
The llvm path got this right already (possibly by luck), but this isn't used
when there's a gs active.
Found by code inspection, verified with some hacked piglit test and some more
hacked debug output.
(Note the clipper can still itself incorrectly generate NaN and INF position
values in its output prims (at least after w divide / viewport transform) even
if the inputs weren't NaNs, if the position data of the vertices is
"sufficiently bad".)
Reviewed-by: Brian Paul <brianp@vmware.com>
Those stages only really work for OGL-style texturing (so number of samplers
and views mostly the same, certainly for the max values).
These often get set up all at once, thus there might be max number of both
even if all of them are just NULL. We must not set the max number of samplers
and views to the same value since that will lead to terrible things if a driver
supports more views than samplers (and the state tracker set up all the views).
(This will not make these stages magically work if a shader uses dx10-style
texturing, they might still replace an actually used sview in that case.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch modifies the SSE4.1 test in configure.ac to use a global
variable to initialize vector variables. In addition, we now return the
value of the computation instead of 0.
This is done so gcc 4.9 (and lower) won't optimize away the SSE4.1
instructions (when using -O1 and higher), because then the configure test
might incorrectly pass even though the assembler doesn't support the
SSE4.1 instructions (the test will pass because the compiler does support
the intrinsics).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
OpenGLES 3.1 cannot be enabled for gen 7 (Ivy Bridge, Haswell) since
they are still missing ARB_stencil_texturing.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Previously we were checking the desktop OpenGL ARB_compute_shader
requirements, but for OpenGLES 3.1, the requirements are lower.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
The OpenGL ARB_compute_shader extension specification requires at least
1024 for GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS, whereas OpenGLES 3.1
only requires 128.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The maximum number of resident warps per multiprocessor is 64 on
Kepler instead of 48 on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's nonsense to drain the pipeline like this.
v2: keep the drain for DMA-buf exports.
v3: flush before the export and after compositing and add TODO comment.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Tested-by: Julien Isorce <j.isorce@samsung.com>
This reverts commit 839793680f.
The patch was breaking DRI3 because driGLFormatToImageFormat does not
handle MESA_FORMAT_B8G8R8X8_SRGB which ended up making it fail to
create the renderbuffer and it would later crash. It's not trivial to
add this format because there is no __DRI_IMAGE_FORMAT nor
__DRI_IMAGE_FOURCC define for the format either. I'm not sure how
difficult adding this would be and whether adding a new format would
require some sort of new version for DRI. Seeing as this might take a
while to fix I think it makes sense to just revert the patch in the
meantime in order to avoid regressing master.
It is also not handled in intel_gles3_srgb_workaround and there may be
other cases where it breaks.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93388
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Previously the GL spec required that whenever glBlitFramebuffer is
used with either buffer being multisampled, the internal formats must
match. However the GL 4.4 spec was later changed to remove this
restriction. In the section entitled “Changes in the released
Specification of July 22, 2013” it says:
“Relax BlitFramebuffer in section 18.3.1 so that format conversion can
take place during multisample blits, since drivers already allow this
and some apps depend on it.”
If most drivers already allowed this in earlier versions I think it's
safe to assume that this is a spec bug and it should also be allowed
in all versions.
This patch just removes the restriction on desktop GL. For GLES there
are conformance tests that assert the previous behaviour so it is
probably safer to leave it in.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92706
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
"image" is not ready yet since it will be set at
the end of the function by: *image = *img;
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We just ignored them altogether. While this feature is rather old-fashioned,
supporting it is actually rather trivial.
This fixes the associated piglit tests (2 gl-1.0-edgeflag, 2 gl-2.0-edgeflag
and all (7) of point-vertex-id).
v2: comment fixes, and make the use of the edgeflag in clipmask consistent
with when it's actually there (should be impossible to hit a case where the
difference would actually matter but still...)
Reviewed-by: Brian Paul <brianp@vmware.com>
This just adds confusion, these parameters are used when fetching vertices
by translate, but certainly not when emitting hw vertices for drivers, they
make no sense there (setting them has no consequences otherwise since there
won't be any elements with instance_divisor set). So just set them to 0 (the
draw_pipe_vbuf code for emitting vertices when the draw pipeline is run
already does exactly that).
Also while here do some whitespace cleanup.
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that we have a helper in the builder for system values and a helper in
core NIR to get the intrinsic opcode, there's really no point in having
things split out into a helper function. This commit "modernizes" this
pass to use helpers better and look more like newer passes.
Reviewed-by: Eric Anholt <eric@anholt.net>
The VC4_DEBUG=cl,qpu is nice and all, but I want to be able to get more
detailed dumps, and to replay the same exact commands in simulation. For
that I need a dump with all of the VBOs, shaders, shader recs, etc. This
dump can be parsed by vc4-gpu-tools.
For now this is only doable from simulator mode, because otherwise we
don't have access to the RCL contents generated by the kernel.
Any update here should have been the same as in
vc4_set_framebuffer_state(), except for the point where vc4_blit.c
temporarily sets different state for its different buffers.
This is apparently a weirdness of gallium -- nr_samples==1 is occasionally
used and means the same thing as nr_samples==0. Fixes a bunch of
ARB_framebuffer_srgb blit cases in piglit.
It's really harsh to abort() the X Server because of a momentary failure
(particularly -ENOMEM). I don't see a way to pass an -ENOMEM up the stack
from here, but we can at least log to stderr before proceeding on.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Fixes the following Lintian (Debian package checker) error:
privacy-breach-logo
usr/share/doc/mesa-common-dev/contents.html
(http://sourceforge.net/sflogo.php?group_id=3&type=1)
usr/share/doc/mesa-common-dev/thanks.html
(http://sourceforge.net/sflogo.php?group_id=3&type=1)
The extended description of this tag is:
This package creates a potential privacy breach by fetching a logo
at runtime.
Before using a local copy you should check that the logo is suitable
for main. You can get help with determining this by posting a link to
the logo and a copy of, or a link to, the logo copyright and license
information to the debian-legal mailing list.
Please replace any scripts, images, or other remote resources with
non-remote resources. It is preferable to replace them with text and
links but local copies of the remote resources are also acceptable as
long as they don't also make calls to remote services. Please ensure
that the remote resources are suitable for Debian main before making
local copies of them.
Severity: serious, Certainty: possible
Check: files, Type: binary, udeb
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This test is a left-over of the initial development. It is unneeded and
misleading, so let's get rid of it.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
It seems like disabling earlyz on a4xx also, by default, disables
fragcoord.z to the FS. For frag shaders that both read fragcoord(.z)
and write fragdepth, we need to set some extra bits to prevent a
lockup.
This lets us get rid of the hack of disabling fragcoord.z (which
prevented 0ad from lockups, but resulted in rendering corruption). Also
fixes fbo-depth-sample-compare.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The picture id in this case is a VA-API surface handle, checking
for a certain value can't be correct.
Signed-off-by: Christian König <christian.koenig@amd.com>
v2: (by Ken, incorporating feedback from Matt Turner):
- Rewrite the push constant allocation code to be clearer.
- Only apply the minimum VS entries workaround on Gen 8.
v3: (by Ken)
- Fix a bug in v2 where we failed to allocate the full push constant
space when the number of enabled stages didn't divide the available
push constant space evenly. (Any left over space is now allocated
to the PS, as it was in v1.)
- Fix an off-by-one error in v2's number of enabled stages calculation.
- Use DIV_ROUND_UP for nicer formatting.
- Line wrapping fixes.
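The v3 allocation rule (equal shares for enabled stages, remainder to
the PS) can be sketched like this. The stage enum, the units, and the
function name are hypothetical; the PS is assumed to always be enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage list for the sketch. */
enum stage_sketch { SK_VS, SK_HS, SK_DS, SK_GS, SK_PS, SK_STAGES };

/* Give every enabled stage an equal share of total; whatever does not
 * divide evenly goes to the PS. */
static void
split_push_constants(unsigned total, const bool enabled[SK_STAGES],
                     unsigned size[SK_STAGES])
{
   unsigned count = 0;
   for (int s = 0; s < SK_STAGES; s++)
      count += enabled[s] ? 1 : 0;
   assert(count > 0 && enabled[SK_PS]);

   unsigned share = total / count;
   for (int s = 0; s < SK_STAGES; s++)
      size[s] = enabled[s] ? share : 0;

   size[SK_PS] += total - share * count;
}
```

With 32 units and VS, GS and PS enabled, each stage gets 10 and the PS
picks up the remaining 2, so the full space is always allocated.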
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This change also adds explicit location support for structs and interfaces which
is currently missing in Mesa but is allowed with SSO and GLSL 1.50+.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
This makes the code easier to follow, should be more efficient,
and will make it easier to add matching via explicit locations
in the following patch.
This patch also replaces the hash table with the newer
resizable hash table. This should be more suitable as the table
is likely to only contain a small number of entries.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
vertex header had both clip_pos and clip_vertex.
We only really need one (clip_pos) because the draw llvm shader would
overwrite the position output from the vs with the viewport transformed.
However, we don't really need the second one, which was only really used
for gl_ClipVertex - if the shader didn't have that the values were just
duplicated to both clip_pos and clip_vertex. So, just use this from the vs
output instead when we actually need it.
Also change clip debug to output both the data from clip_pos and the
clipVertex output (if available).
Makes some things more complex, some things less complex, but it seems
easier to understand what clipping actually does (and what values it uses
to do its magic).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Seems obvious now this should use the data from position and not clip_vertex
(albeit might not really make a difference).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
clip -> clip_vertex and pre_clip_pos -> clip_pos.
Looks more obvious to me what these values actually represent (so use
something resembling the vs output names).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is just for code cleanup, conceptually the have_clipdist really
isn't per-vertex state, so don't put it there (just dependent on the
shader). Even though there wasn't really any overhead associated with
this, we shouldn't store random shader information in the vertex header.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
I'm pretty sure this should use position (i.e. pre_clip_pos) and not
the output from clipVertex. Albeit piglit doesn't care. It is what we
use in the clip test, and it is what every other driver does (as they
don't even have clipVertex output and lower the additional planes to
clip distances).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a newer convention, which we prefer over ALIGN(x, n) / n.
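For reference, the two forms compute the same value. The macros below
are illustrative re-definitions for this sketch (hence the _SKETCH
suffix), not copies of the driver's own ALIGN and DIV_ROUND_UP.

```c
#include <assert.h>

/* Illustrative definitions; the driver has its own macros. */
#define ALIGN_SKETCH(x, n)        ((((x) + (n) - 1) / (n)) * (n))
#define DIV_ROUND_UP_SKETCH(x, n) (((x) + (n) - 1) / (n))

/* Number of n-sized units needed to hold x items. */
static unsigned
units_needed(unsigned x, unsigned n)
{
   assert(n > 0);
   assert(DIV_ROUND_UP_SKETCH(x, n) == ALIGN_SKETCH(x, n) / n);
   return DIV_ROUND_UP_SKETCH(x, n);
}
```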
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The compact VUE map only works when varying packing is in use.
Unfortunately, varying packing is disabled for TCS inputs.
This is needed to fix Piglit's tcs-input-read-array-interface test.
v2: Make lines fit in 80 columns (caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
TCS outputs and TES inputs both refer to a common "patch URB entry"
shared across all invocations. First, there are some number of
per-patch entries. Then, there are per-vertex entries accessed via
an offset for the variable and a stride times the vertex index.
Because these calculations need to be done in both the vec4 and scalar
backends, it's simpler to just compute the offset calculations in NIR.
It doesn't necessarily make much sense to use per-vertex intrinsics
afterwards, but that at least means we don't lose the per-patch vs.
per-vertex information.
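The offset calculation can be sketched as plain arithmetic. This assumes
offsets are counted in slots and that the per-vertex block starts right
after the per-patch entries; the struct and helper names are
hypothetical.

```c
#include <assert.h>

/* Hypothetical description of a patch URB entry, in slots. */
struct patch_layout_sketch {
   unsigned num_per_patch_slots;    /* per-patch entries come first */
   unsigned num_per_vertex_slots;   /* stride between vertices */
};

/* Offset of a per-patch variable. */
static unsigned
per_patch_offset(const struct patch_layout_sketch *l, unsigned slot)
{
   assert(slot < l->num_per_patch_slots);
   return slot;
}

/* Offset of a per-vertex variable: variable offset plus stride times
 * the vertex index, after the per-patch block. */
static unsigned
per_vertex_offset(const struct patch_layout_sketch *l,
                  unsigned vertex, unsigned slot)
{
   assert(slot < l->num_per_vertex_slots);
   return l->num_per_patch_slots + slot + vertex * l->num_per_vertex_slots;
}
```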
v2: Use is_input/is_output helpers (suggested by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
TES outputs work exactly like VS outputs, so we can simply add a case
statement for those.
TCS inputs are very similar to geometry shaders - they're arrays of
per-vertex data. We use the same method I used for the scalar GS
backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Without varying packing, if a VS writes a compound variable, and the GS
only reads part of it, the base location of the variable may not
actually be in the VUE map.
To cope with this, we do lowering in terms of varying slots, add any
constant offsets to the base, and then do the VUE map remapping. This
ensures we only look up VUE map entries for slots which actually exist.
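A small Python sketch of the ordering, with a made-up VUE map in which the
base slot of a compound variable is missing (the slot numbers and the
remap name are hypothetical):

```python
# Hypothetical VUE map: varying slot -> VUE index. Slot 32, the base of a
# compound variable, is absent because nothing reads it.
vue_map = {0: 0, 33: 1, 34: 2}


def remap(base_slot, const_offset):
    """Resolve the constant offset in slot space, then consult the VUE map."""
    slot = base_slot + const_offset
    return vue_map[slot]
```

Looking up vue_map[32] first and adding the offset afterwards would fail,
while remap(32, 1) only touches slot 33, which exists.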
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
My tessellation branch has two additional remap functions. I don't want
to replicate this logic there.
v2: Handle inputs/outputs separately (suggested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Shared variables and input reworks landed around the same time.
Presumably, this was some sort of mistake in rebase conflict resolution.
This really only affects the num_indices field in nir_intrinsic_infos,
which is rarely used. However, it's used by the printer.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
GL_DRAW_FRAMEBUFFER does not exist in OpenGL ES 1.x, and since
_mesa_meta_begin hasn't been called yet, we have to work around API
difficulties. The whole reason that GL_DRAW_FRAMEBUFFER is used instead
of GL_FRAMEBUFFER is that the read framebuffer may be different. This
is moot in OpenGL ES 1.x.
I have another patch series that would also fix this (by removing the
calls to _mesa_BindFramebuffer and friends), but it's not quite ready
yet... and I think it may be a bit heavy for some stable branches.
Consider this a stop-gap fix.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93215
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Remove unused variables from clear_state and use a hardcoded location
for color uniform to get rid of 2 more variables. Modify shaders to use
explicit location for vertex attribute too as extension is enabled.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
GRID Autosport uses SSO shaders. When a tessellation evaluation shader
is passed through this, it triggers assertion failures down the line
with unassigned varying locations. Make sure to do this when the first
shader in the pipeline is not a vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Previously if the visual didn't have an alpha channel then it would
pick a format that is not sRGB-capable. I don't think there's any
reason not to always have an sRGB-capable visual. Since 28090b30 there
are now visuals advertised without an alpha channel which means that
games that don't request alpha bits in the config would end up without
an sRGB-capable visual. This was breaking supertuxkart which assumes
the winsys buffer is always sRGB-capable.
The previous code always used an RGBA format if the visual config
itself was marked as sRGB-capable regardless of whether the visual has
alpha bits. I think we don't actually advertise any sRGB-capable
visuals (but we just use sRGB formats anyway) so it shouldn't make any
difference. However this patch also changes it to use RGBX if an
sRGB-capable visual is requested without alpha bits for consistency.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92759
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_init_surface_formats overrides the render format for RGBX formats
which aren't supported for rendering so that they internally use RGBA
instead. However, B8G8R8X8_SRGB was missing so it wasn't marked as a
renderable format. This patch just adds it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the current algorithm, we only look at tex uses. However there's a
write-after-write hazard where we might decide to, on some path, not use
a texture's output at all, but instead to write a different value to
that register. However without the barrier, the texture might complete
later and overwrite that value.
This fixes Unreal Elemental demo on GK110/GK208, flightgear on GK10x,
and likely other random-looking failures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
In some cases shaders want non-default rounding when converting float to
integer. This can be done in one go, so merge the two ops. This comes up
in the packUnorm4x8 & co functions, as well as a few random shaders.
Overall shader-db impact is minimal, helping a handful of witcher2 and
other misc shaders.
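As an illustration of where the round-then-convert pair shows up, here is
a Python model of packUnorm4x8 (a sketch of the GLSL semantics, not the
generated code; note that Python's round() resolves exact ties to even,
which GLSL leaves to the implementation):

```python
def pack_unorm_4x8(v):
    """Pack four floats into one 32-bit word, first component in the low byte."""
    word = 0
    for i, c in enumerate(v):
        scaled = min(max(c, 0.0), 1.0) * 255.0
        # Round, then convert to integer: the two steps that can be merged
        # into a single conversion with a rounding mode.
        byte = int(round(scaled))
        word |= byte << (8 * i)
    return word
```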
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The conversion of 32-bit integer multiplies into 16-bit ones happens
after the regular optimization loop. However it's fairly common to
multiply by a small integer, rendering some of the expansion pointless.
Firstly, propagate immediates when possible into mul ops, secondly just
remove the ops when they are unnecessary.
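To see why a small multiplier makes part of the expansion redundant, here
is a Python model of the arithmetic only (not the actual instruction
sequence the backend emits; the function names are illustrative):

```python
MASK32 = 0xFFFFFFFF


def mul32_via_16(a, b):
    """Low 32 bits of a * b, built from 16-bit halves."""
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16
    # Only the low 16 bits of the cross terms survive the shift by 16.
    cross = (a_hi * b_lo + a_lo * b_hi) & 0xFFFF
    return (a_lo * b_lo + (cross << 16)) & MASK32


def mul32_by_small(a, imm):
    """Same product when imm fits in 16 bits: its high half is zero,
    so the a_lo * b_hi term drops out entirely."""
    assert 0 <= imm <= 0xFFFF
    a_lo, a_hi = a & 0xFFFF, a >> 16
    return (a_lo * imm + (((a_hi * imm) & 0xFFFF) << 16)) & MASK32
```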
Including the change to generate imad immediates, the effect is:
total instructions in shared programs : 6365463 -> 6351898 (-0.21%)
total gprs used in shared programs : 728684 -> 728684 (0.00%)
total local used in shared programs : 9904 -> 9904 (0.00%)
total bytes used in shared programs : 44001576 -> 44036120 (0.08%)
          local   gpr   inst   bytes
helped        0     0   3288       4
hurt          0     0      0     842
It's easy for this to hurt bytes since we end up always generating the
8-byte form, while we can't always get rid of the immediate in question.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There will usually be a split before the mad op, peer through that and
pick out the right word of the immediate.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Support emission of the short imad, but also include it in the various
logic that tries to make it possible to emit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
On NV50, we use 16-bit reg units (to make it all work with half-regs). A
few places assumed that it was always in 32-bit units.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This helps in the use of GALLIUM_DDEBUG_SKIP: first run a target application
with skip set to a very large number and note how many draw calls happen
before the bug. Then re-run, skipping the corresponding number of calls.
Despite the additional run, this can still be much faster than not skipping
anything.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When we know that hangs occur only very late in a reproducible run (e.g.
apitrace), we can save a lot of debugging time by skipping the flush and hang
detection for earlier draw calls.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ARB_fragment_layer_viewport requires that if a fs reads layer or viewport
index but it wasn't output by gs (or vs with other extensions), then it reads
0. This never worked for llvmpipe, and is surprisingly non-trivial to fix.
The problem is the mechanism to handle non-existing outputs in draw is rather
crude, it will simply redirect them to whatever is at output 0, thus later
stages will just get garbage. So, rather than trying to fix this up (which
looks non-trivial), fix this up in llvmpipe setup by detecting this case there
and output a fixed zero directly.
While here, also optimize the hw vertex layout a bit - previously if the gs
outputted layer (or vp) and the fs read those inputs, we'd add them twice
to the vertex layout, which is unnecessary.
And do some minor cleanup, slots don't require that many bits, there was some
bogus (but harmless) float/int mixup for psize slot too, make the slots all
unsigned (we always put pos at pos zero thus everything else has to be positive
if it exists), and make sure they are properly initialized (layer and vp index
slot were not which looked fishy as they might not have got set back to zero
when changing from a gs which outputs them to one which does not).
This fixes the failures in piglit's arb_fragment_layer_viewport group
(3 each for layer and vp).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This greatly reduces the number of SetSamplers() commands for some
applications.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The offset for loads is in src[0]. This was a copy+paste error in the
nir_intrinsic_load/store refactoring. This commit fixes a segfault in
ES31-CTS.compute_shader.work-group-size. I have no idea how piglit failed
to catch this...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93348
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This is brw_gs_surface_state.c copy and pasted twice with search and
replace.
brw_binding_table.c code is similarly copy and pasted.
v2: Drop dword_pitch related fields.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tessellation evaluation shaders work almost identically to vertex
shaders - we have a set of URB writes at the end of the program, and the
last one should terminate it.
Geometry shaders really are the special case, where multiple
EmitVertex() calls trigger URB writes in the middle of the program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tessellation evaluation shaders will use g4 instead. For now, make an
fs_reg called urb_handle and use that in place of hardcoding g1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This is a helper function for setting up the local invocation ID
payload according to the cs_prog_data generated by the compiler. It's
intended to be available to users of libi965_compiler so move it there.
GL likes to saturate your incoming color, but if that color's coming from
unpacking from unorms, there's no point. Ideally we'd have a range
propagation pass that cleans these up in NIR, but that doesn't seem to be
going to land soon. It seems like we could do a one-off optimization in
nir_opt_algebraic, except that doesn't want to operate on expressions
involving unpack_unorm_4x8, since it's sized.
total instructions in shared programs: 87879 -> 87761 (-0.13%)
instructions in affected programs: 6044 -> 5926 (-1.95%)
total estimated cycles in shared programs: 349457 -> 349252 (-0.06%)
estimated cycles in affected programs: 6172 -> 5967 (-3.32%)
No SSPD on openarena (which had the biggest gains, in its VS/CSes), n=15.
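The reason the saturate is redundant can be checked with a short Python
model (a sketch of the value ranges only, not the optimization pass; the
helper names are illustrative):

```python
def unpack_unorm_4x8(word):
    """Split a 32-bit word into four floats in [0, 1], low byte first."""
    return [((word >> (8 * i)) & 0xFF) / 255.0 for i in range(4)]


def saturate(x):
    """Clamp to [0, 1]."""
    return min(max(x, 0.0), 1.0)


def saturate_is_noop(word):
    """True if clamping changes none of the unpacked components."""
    return all(saturate(c) == c for c in unpack_unorm_4x8(word))
```

Every byte value maps to a float already inside [0, 1], so the clamp never
changes anything.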
The caller isn't going to expect it from a return, so it would probably
get misinterpreted. If the caller had an unpack in its reg, that's fine,
but don't lose track of it.
I apparently broke this in a late refactor, in such a way that I decided
its tests were some of those interminable ones that I should just
blacklist from my testing. As a result, the refactors related to it were
totally wrong.
Mostly related to making sure the rasterizer can correctly
pick out the correct scissor box for the current viewport.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When GL_FRAMEBUFFER_SRGB is enabled any single-sampled renderbuffers
are resolved in intel_update_state because the hardware can't cope
with fast clears on SRGB buffers. In that case it's pointless to do a
fast clear because it will just be immediately resolved.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
SRGB buffers are not marked as losslessly compressible so previously
they would not be used for fast clears. However in practice the
hardware will never actually see that we are using SRGB buffers for
fast clears if we use the linear equivalent format when clearing and
make sure to resolve the buffer as a linear format before sampling
from it.
This is an important use case because by default the window system
framebuffers are created as SRGB so without this fast clears won't be
used there.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
SKL can't cope with the CCS buffer for SRGB buffers. Normally the
hardware won't see the SRGB formats because when GL_FRAMEBUFFER_SRGB
is disabled these get mapped to their linear equivalents. In order to
avoid relying on the CCS buffer when it is enabled this patch now
makes it flush the renderbuffers.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
For single-sampled textures the MCS buffer is only used to implement
fast clears. However the surface always needs to be resolved before
being used as a texture anyway so the MCS buffer doesn't actually
achieve anything. This is important for Gen9 because in that case SRGB
surfaces are not supported for fast clears and we don't want the
hardware to see the MCS buffer in that case.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Adds MESA_META_FRAMEBUFFER_SRGB to the meta save state so that
GL_FRAMEBUFFER_SRGB will be disabled when performing the fast clear.
That way the render surface state will be programmed with the linear
equivalent format during the clear. This is important for Gen9 because
the SRGB formats are not marked as losslessly compressible so in
theory they aren't supported for fast clears. It shouldn't make any
difference whether GL_FRAMEBUFFER_SRGB is enabled for the fast clear
operation because the color is not actually written to the framebuffer
so there is no chance for the hardware to apply the SRGB conversion on
it anyway.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This simplified (basically duplicated) version of pb_cache_manager will
allow removing some ugly hacks from radeon and amdgpu winsyses and
flatten and simplify their design.
The difference is that winsyses must manually add buffers to the cache
in "destroy" functions and the cache doesn't know about the buffers before
that. The integration is therefore trivial and the impact on the winsys
design is negligible.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
This is the recommended setting according to hw people and it makes Hyper-Z
stable. Just the two magic states.
This fixes Evergreen, Cayman, SI, CI, VI (using the Cayman code).
Cc: 11.0 11.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit 32f05fadbb.
It turned out the problem with Stoney was caused by incorrect handling of
a non-power-two VRAM size in the kernel driver.
This is an optional BIOS setting and can be worked around by choosing
a different VRAM size in the BIOS.
Cc: 11.1 <mesa-stable@lists.freedesktop.org>
If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
dmat2[2] at 21 etc. The old code was returning 17,18,19.
I think this code is also wrong for float matrices.
There is now a piglit for the float case.
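The numbers above follow from each array element taking as many locations
as the matrix has columns. A Python sketch, assuming one location per
dmat2 column (which matches the values quoted; the function name is
illustrative):

```python
def matrix_array_locations(base, columns, length):
    """Location of each array element when every element spans `columns` slots."""
    return [base + i * columns for i in range(length)]


fixed = matrix_array_locations(17, 2, 4)   # dmat2[4] starting at 17
old = matrix_array_locations(17, 1, 4)     # the old behaviour: stride of 1
```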
This partly fixes:
GL41-CTS.vertex_attrib_64bit.limits_test
[airlied: update with Tapani suggestion to clean it up].
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Discovered this when working on other clip code, apparently didn't work
correctly - the combination of linear interpolated values and using
gl_ClipVertex produced wrong values (failing all such combinations
in piglits glsl-1.30 interpolation tests, named
interpolation-noperspective-XXX-vertex).
Use the pre-clip-pos values when determining the interpolation factor to
fix this.
No one really understands this code well, but everybody agrees this looks
sane... This fixes all those failing tests (10 in total) both with
the llvm and non-llvm draw paths, with no piglit regressions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
There is some special-casing needed in a competent back-end. However, they
can do their special-casing easily enough based on whether or not the
offset is a constant. In the meantime, having the *_indirect variants
adds special cases in a number of places where they don't need to be and,
in general, only complicates things. To complicate matters, NIR had no way
to convert an indirect load/store to a direct one in the case that the
indirect was a constant so we would still not really get what the back-ends
wanted. The best solution seems to be to get rid of the *_indirect
variants entirely.
This commit is a bunch of different changes squashed together:
- nir: Get rid of *_indirect variants of input/output load/store intrinsics
- nir/glsl: Stop handling UBO/SSBO load/stores differently depending on indirect
- nir/lower_io: Get rid of load/store_foo_indirect
- i965/fs: Get rid of load/store_foo_indirect
- i965/vec4: Get rid of load/store_foo_indirect
- tgsi_to_nir: Get rid of load/store_foo_indirect
- ir3/nir: Use the new unified io intrinsics
- vc4: Do all uniform loads with byte offsets
- vc4/nir: Use the new unified io intrinsics
- vc4: Fix load_user_clip_plane crash
- vc4: add missing src for store outputs
- vc4: Fix state uniforms
- nir/lower_clip: Update to the new load/store intrinsics
- nir/lower_two_sided_color: Update to the new load intrinsic
NIR and i965 changes are
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR indirect declarations and vc4 changes are
Reviewed-by: Eric Anholt <eric@anholt.net>
ir3 changes are
Reviewed-by: Rob Clark <robdclark@gmail.com>
NIR changes are
Acked-by: Rob Clark <robdclark@gmail.com>
There was way too much incrementing of things going on. Instead, let's
just start everything off at the right base location, and then increment in
the loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In case a state tracker unbinds every slot by a separate
pipe->set_vertex_buffers() call, starting from slot zero, the number
of bound buffers would not reach zero at all.
The current algorithm does not account for pre-existing holes in the
buffer list.
Unbinding all buffers at once or starting at the top-most slot results
in correct behaviour.
Calculating the correct number of bound buffers fixes a NULL pointer
dereference in nvc0_validate_vertex_buffers_shared().
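A Python sketch of the failure case and of one way to count correctly,
by rescanning for the highest bound slot after every update (this is an
illustration, not the helper in the tree; all names are hypothetical):

```python
def bound_count(slots):
    """Number of slots up to and including the highest bound one."""
    for i in range(len(slots) - 1, -1, -1):
        if slots[i] is not None:
            return i + 1
    return 0


def unbind(slots, start, count):
    """Unbind a range, then recount so earlier holes are accounted for."""
    for i in range(start, start + count):
        slots[i] = None
    return bound_count(slots)


# Unbind one slot at a time, starting from slot zero.
slots = ["vb0", "vb1", "vb2"]
counts = [unbind(slots, i, 1) for i in range(3)]
```

The count stays at 3 while slot 2 is still bound and drops to 0 once the
last buffer is gone, even though slots 0 and 1 were already holes.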
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
According to the GLES3 spec, blitting between multisample FBOs with
different internal formats should not be allowed. The
compatible_resolve_formats function implements this check. Previously
it had a shortcut where if the Mesa formats of the two renderbuffers
were the same then it would assume the blit is ok. However some
drivers map different internal formats to the same Mesa format, for
example it might implement both GL_RGB and GL_RGBA textures with
MESA_FORMAT_R8G8B8A8_UNORM. The function is used to generate a GL error
according to what the GL spec requires so the blit should not be
allowed in that case. This patch just removes the shortcut so that it
only ever looks at the internal format.
Note that I posted a related patch to disable this check altogether
for desktop GL. However this function is still used on GLES3 because
there are conformance tests that require this behaviour so this patch
is still useful.
Cc: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The tiled memcpy doesn't work for copying from RGBX to RGBA because it
doesn't override the alpha component to 1.0. Commit 2cebaac479 added
a check to disable it for RGBX formats by looking at the TexFormat.
However a lot of the rest of the code base is written with the
assumption that an RGBA texture can be used internally to implement a
GL_RGB texture. If that is done then this check breaks. This patch
makes it instead check the base format of the texture which I think
more directly matches the intention.
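To illustrate why a plain copy is not enough for RGBX sources, here is a
Python sketch of a copy that also forces alpha (byte layout and function
name are assumptions for the example, not the tiled memcpy code):

```python
def copy_rgbx_to_rgba(src):
    """Copy 4-byte RGBX pixels and set every alpha byte to 0xff.

    The X byte of the source is undefined padding, so copying it verbatim
    would leave garbage where alpha 1.0 is expected.
    """
    dst = bytearray(src)
    dst[3::4] = b"\xff" * (len(src) // 4)
    return bytes(dst)
```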
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Since Gen8 this is allowed as a rendering target so we don't need to
override it to B8G8R8A8. This is helpful on Gen9+ where using this
override causes fast clears not to work.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Previously fast clear was disallowed on Gen9 for MSRTs with the claim
that some formats don't work but we didn't understand why. On further
investigation it seems the formats that don't work are the ones where
the render surface format is being overridden to a different format
than the one used for texturing. The one used for texturing is not
actually a renderable format. It arguably makes sense that the sampler
hardware doesn't handle the fast color correctly in these cases
because it shouldn't be possible to end up with a fast cleared surface
that is non-renderable.
This patch changes the limitation to prevent fast clear for surfaces
where the format for rendering is overridden.
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
If GL_FRAMEBUFFER_SRGB is enabled when writing to an SRGB-capable
framebuffer then the color will be converted from linear to SRGB
before being written. There is no chance for the hardware to do this
itself because it can't modify the clear color that is programmed in
the surface state so it seems pretty clear that the driver should be
handling this itself.
Note that this wasn't a problem before Gen9 because previously we were
only able to do fast clears to 0 or 1 and those values are the same in
linear and SRGB space.
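For reference, the standard linear-to-sRGB transfer function in Python
(a sketch of the math, not the driver code), which also shows why clear
values of 0 and 1 were unaffected:

```python
def linear_to_srgb(c):
    """Encode one linear color component in [0, 1] as sRGB."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055
```

0 maps to 0 and 1 maps to 1 (up to floating-point rounding), while
intermediate values such as 0.5 change noticeably.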
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Enable ARB_compute_shader on gen7+, on hardware that supports the
OpenGL 4.3 requirements of a local group size of 1024.
With SIMD16 support, this is limited to Ivy Bridge and Haswell.
Broadwell will work with a local group size up to 896 on SIMD16
meaning programs that use this size or lower should run when setting
MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Shared variables can be accessed by other threads within the same
local workgroup. This prevents us from performing certain
optimizations with shared variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
When an intrinsic atomic operation is used on a shared variable, we
translate it to a new 'shared variable' specific intrinsic function
call.
For example, a call to __intrinsic_atomic_add when used on a shared
variable will be translated to a call to
__intrinsic_atomic_add_shared.
v3:
* Fix stale comments copied from SSBOs (Iago)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The compiler probably already blocks this earlier on, but we should be
checking for an SSBO here.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
When an atomic function is called, we need to check to see if it is
for an SSBO variable before lowering it to the SSBO specific intrinsic
function.
v2:
* is_in_buffer_block => is_in_shader_storage_block (Iago)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The atomic functions can also be used with shared variables in compute
shaders.
When lowering the intrinsic in lower_ubo_reference, we still create an
SSBO specific intrinsic since SSBO accesses can be indirectly
addressed, whereas all compute shader shared variables live in a single
shared variable area.
v2:
* Also remove the _internal suffix from ssbo atomic intrinsic names (Iago)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In this lowering pass, shared variables are decomposed into intrinsic
calls.
v2:
* Send mem_ctx as a parameter (Iago)
v3:
* Shared variables don't have an associated interface block (Iago)
* Always use 430 packing (Iago)
* Comment / whitespace cleanup (Iago)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This code will also be usable by the pass to lower shared variables.
Note, that *const_offset is adjusted by setup_buffer_access so it must
be initialized before calling setup_buffer_access.
v2:
* Add comment for lower_buffer_access::setup_buffer_access
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This class has code that will be shared by lower_ubo_reference and
lower_shared_reference. (lower_shared_reference will be used to
support compute shader shared variables.)
v2:
* Add lower_buffer_access.h to makefile (Emil)
* Remove static is_dereferenced_thing_row_major from
lower_buffer_access.cpp. This will become a lower_buffer_access
method in the next commit.
* Pass mem_ctx as parameter rather than using a member variable (Iago)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This allows the code in emit_access to be generic enough to also be used
for lowering shared variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
* Rename ssbo_get_array_length to ssbo_unsized_array_length_access (Iago)
* Always use this-> when referencing buffer_access_type (Iago)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Otherwise packed and inactive varyings get optimized away. This needs
to be prevented when using separate shader objects where the interface
needs to be preserved.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This will cause validation to run during the next draw. This is done
because possible changes in used stages and programs can cause
invalid pipeline state.
This fixes a subtest in the following CTS test:
ES31-CTS.sepshaderobjs.StateInteraction
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
I don't think this has any effect in practice, but
CTS reads back using GetVertexAttrib.
This fixes: GL41-CTS.vertex_attrib_64bit.get_vertex_attrib
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
src/gallium/tests/trivial/compute.c expects samplers to be cleaned
when the samplers list is NULL.
As in radeon, the function behaves as if the number of samplers
parameter were set to 0.
[small s/hwsco/hwcso/ typo fix]
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Let us avoid trapping in hardware from a SIGFPE and instead
assert on a zero divisor.
Hint: This can occur if a PIPE_PRIM_? is not handled in
u_prim_vertex_count() that results in ' info ' not
being initialized in the expected manner.
Further, we also fix a possible NULL pointer dereference
from ' info ' being NULL from a u_prim_vertex_count() call.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
When a buffer is created with GL_STATIC_DRAW, its contents should not
be changed frequently. But that's exactly what one application I'm
debugging does. This patch adds code to try to detect inefficient
buffer use in a couple places. The GL_ARB_debug_output mechanism is
used to report the issue.
NVIDIA's driver detects this sort of thing too.
Other types of inefficient buffer use could also be detected in the
future.
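One possible shape for such a check, sketched in Python. The update
counter, the threshold of 4, and the message text are assumptions made for
illustration; the real patch reports through GL_ARB_debug_output rather
than a list:

```python
STATIC_DRAW = "GL_STATIC_DRAW"


class Buffer:
    def __init__(self, usage):
        self.usage = usage
        self.num_updates = 0


def buffer_sub_data(buf, messages, threshold=4):
    """Count updates and report when a static buffer is updated too often."""
    buf.num_updates += 1
    if buf.usage == STATIC_DRAW and buf.num_updates > threshold:
        messages.append("static buffer updated %d times" % buf.num_updates)
```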
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Every other gen the representation of the URB size was changed and
previous ones weren't updated. I'd be willing to write a series
normalizing this to be KB on all generations if anybody else cares.
This is going to require some rather intrusive kernel changes to fix
properly, in the meantime (and forever on at least pre-v4.1 kernels)
we'll have to restore the hardware defaults at the end of every batch
in which the L3 configuration was changed to avoid interfering with
the DDX and GL clients that use an older non-L3-aware version of Mesa.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2: Optimize look-up of the default configuration by assuming it's the
first entry of the L3 config array in order to avoid an FPS
regression in GpuTest Triangle and SynMark OglBatch2-7 on most
affected platforms.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The L3 state atom calculates the target L3 partition weights when the
program bound to some shader stage is modified, and in case they are
far enough from the current partitioning it makes sure that the L3
state is re-emitted.
v2: Fix for inconsistent units the context URB size is expressed in.
Clamp URB size to 1008 KB on SKL due to FF hardware limitation.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This calculates a rather conservative partitioning of the L3 cache
based on the shaders currently bound to the pipeline and whether they
use SLM, atomics, images or scratch space. The result is intended to
be fine-tuned later on based on other pipeline state.
Note that the L3 partitioning calculated for VLV in the non-SLM non-DC
case differs from the hardware defaults in that it doesn't include a
DC partition and has twice as much RO cache space -- This is an
intentional functional change that improves performance in several
bandwidth-bound benchmarks on VLV (5% significance): SynMark
OglTexFilterAniso by 14.18%, SynMark OglTexFilterTri by 7.15%, Unigine
Heaven by 4.91%, SynMark OglShMapPcf by 2.15%, GpuTest Fur by 1.83%,
SynMark OglDrvRes by 1.80%, SynMark OglVsTangent by 1.71%, and a few
other benchmarks from the Finnish system by less than 1%.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The input of the L3 set-up code is a vector giving the approximate
desired relative size of each partition. This implements logic to
compare the input vector against the table of validated configurations
for the device and pick the closest compatible one.
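The selection can be sketched in Python as below. The table contents, the
(SLM, URB, DC, RO) column order, the L1 distance metric, and the
compatibility rule are all assumptions made for the example, not the
validated tables or the exact metric used by the driver:

```python
def normalize(weights):
    """Scale a weight vector so its components sum to 1."""
    total = float(sum(weights))
    return [w / total for w in weights]


def compatible(target, cfg):
    """Assumed rule: every partition that is requested must be non-empty."""
    return all(c > 0 for t, c in zip(target, cfg) if t > 0)


def closest_config(target, table):
    """Index of the compatible entry nearest to target (L1 distance)."""
    t = normalize(target)

    def distance(cfg):
        return sum(abs(a - b) for a, b in zip(t, normalize(cfg)))

    candidates = [i for i, cfg in enumerate(table) if compatible(target, cfg)]
    # min() raises ValueError if no entry is compatible.
    return min(candidates, key=lambda i: distance(table[i]))


# Hypothetical table; columns are (SLM, URB, DC, RO) in ways.
TABLE = [
    (0, 32, 0, 32),
    (0, 32, 8, 24),
    (16, 16, 8, 24),
]
```

Normalizing both sides means only the relative sizes matter, which fits an
input that is described as approximate relative weights.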
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Improves performance of the arb_shader_image_load_store-atomicity
piglit test by over 25x (which isn't a real benchmark, it's just heavy
on atomics -- the improvement in a microbenchmark I wrote a while ago
seemed to be even greater). The drawback is one needs to be
extra-careful not to hang the GPU (in fact the whole system). A DC
partition must have been allocated on L3, the "convert L3 cycle for DC
to UC" bit may not be set, the MOCS L3 cacheability bit must be set
for all surfaces accessed using DC atomics, and the SCRATCH1 and
ROW_CHICKEN3 bits must be kept in sync.
A fairly recent kernel is required for the command parser to allow
writes to these registers.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
It should be possible to use additional L3 configurations other than
the ones listed in the tables of validated allocations ("BSpec »
3D-Media-GPGPU Engine » L3 Cache and URB [IVB+] » L3 Cache and URB [*]
» L3 Allocation and Programming"), but it seems sensible for now to
hard-code the tables in order to stick to the hardware docs. Instead
of setting up the arbitrary L3 partitioning given as input, the
closest validated L3 configuration will be looked up in these tables
and used to program the hardware.
The included tables should work for Gen7-9. Note that the quantities
are specified in ways rather than in KB, this is because the L3
control registers expect the value in ways, and because by doing that
we can re-use a single table for all GT variants of the same
generation (and in the case of IVB/HSW and CHV/SKL across different
generations) which generally have different L3 way sizes but allow the
same combinations of way allocations.
v2: Use slice count from the devinfo structure instead of the gt
number to implement get_l3_way_size().
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to the hardware docs a DC flush is sufficient to make
CS_STALL happy, there's no need to add STALL_AT_SCOREBOARD whenever
it's present.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This will make sure that we recalculate the URB layout anytime the URB
size is modified by the L3 partitioning code.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This stores the result of can_do_pipelined_register_writes() in the
context struct so we can find out later whether LRI can be used to
program the L3 configuration.
v2:
* Split change of gen check in can_do_pipelined_register_writes (jljusten)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Allow for pipelined register writes for gen < 7.
v2:
* Split from another patch and adjust comment (jljusten)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This fixes:
glsl-1.50/execution/geometry/dynamic_input_array_index.shader_test
my profanity.
We need to load the AR register with the value from the index reg.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes:
arb_transform_feedback3-ext_interleaved_two_bufs_gs
arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
transform-feedback-builtins
If we are only emitting one ring, then emit all output
buffers on it.
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
See: `commit e82c527f1fc2f8ddc64954ecd06b0de3cea92e93`
which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
DXT5 <-> R32G32B32A32_UINT
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The only effect here is a space savings - 822 programs in shader-db
affected with the following overall change:
total bytes used in shared programs : 44154976 -> 44139880 (-0.03%)
Fixes: 641eda0c (nv50/ir: r63 is only 0 if we are using less than 63 registers)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
This allows us to use the short encoding, and potentially fold
immediates in later on.
total instructions in shared programs : 6379731 -> 6367861 (-0.19%)
total gprs used in shared programs : 728502 -> 728683 (0.02%)
total local used in shared programs : 9904 -> 9904 (0.00%)
total bytes used in shared programs : 44661008 -> 44154976 (-1.13%)
local gpr inst bytes
helped 0 51 7267 20306
hurt 0 232 125 274
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Operations that take immediates can only encode registers up to 64. This
fixes a shader in a "Powered by Unity" intro.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We already semi-did this but the list of uses was unsorted, so it was
unreliable. Sort the uses by bb and serial, and don't unspill for each
instruction in a sequence. (And also don't unspill multiple times for a
single instruction that uses the value in question multiple times.)
This causes a minor reduction in generated instructions for shader-db
(as few programs spill) but more importantly it brings determinism to
each run's output.
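A rough sketch of the idea, not the actual codegen: uses are modelled as
(bb, serial) pairs, and "a sequence" is assumed to mean uses with
consecutive serials in the same block. Both the representation and that
interpretation are assumptions made for illustration.

```python
def unspill_points(uses):
    """Return the (bb, serial) positions where a reload would be inserted.

    Duplicate uses (one instruction reading the value several times) are
    collapsed, and a run of consecutive instructions in the same block
    shares the reload placed before its first instruction.
    """
    points = []
    prev = None
    for bb, serial in sorted(set(uses)):
        starts_new_run = prev is None or prev[0] != bb or serial != prev[1] + 1
        if starts_new_run:
            points.append((bb, serial))
        prev = (bb, serial)
    return points
```

Because the uses are sorted first, the result does not depend on the order
in which they were recorded, which is where the determinism comes from.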
On SM10:
total instructions in shared programs : 6387945 -> 6379359 (-0.13%)
total gprs used in shared programs : 728544 -> 728544 (0.00%)
total local used in shared programs : 9904 -> 9904 (0.00%)
local gpr inst bytes
helped 0 0 322 322
hurt 0 0 0 0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes the fetching of fp64 inputs to the geometry shader,
which fixes the recently posted piglit tests:
arb_gpu_shader_fp64/execution/gs-fs-vs-double-array.shader_test
arb_vertex_attrib_64bit/execution/gs-fs-vs-attrib-double-array.shader_test
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
v2: Move new rule to Boolean simplification section
Add a a@bool != true simplification
Suggested-by: Neil Roberts <neil@linux.intel.com>
... and allow the "binding" qualifier in ES 3.1 as well.
GLSL ES 3.1 incorporates only a few features from the extension
ARB_shading_language_420pack: the relaxed qualifier ordering
requirements and the binding qualifier.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unfortunately the Appveyor -> SourceForge connection seems a bit
unreliable, causing frequent build failures while downloading
winflexbison (approx once every 2 days).
Fetching winflexbison archive into Appveyor's cache should eliminate
these.
Fetching Python modules from PyPI doesn't seem to be a problem, so they
are left alone for now, though they could eventually get the same
treatment.
We still have several failures in the newly enabled tests in simulation:
sRGB downsampling is done as if it was just linear, stencil blits are not
supported on MSAA either, and derivatives are still not supported
(breaking some MSAA simulation shaders). So, other than sRGB downsampling
quality, things seem to be in good shape.
This is the core of ARB_texture_multisample. Most of the piglit tests for
GL_ARB_texture_multisample require GL 3.0, but exposing support for this
lets us use the gallium blitter for multisample resolves. We can
sometimes multisample resolve using just the RCL, but that requires that
the blit is 1:1, unflipped, and aligned to tile boundaries.
This includes GL_SAMPLE_COVERAGE, GL_SAMPLE_ALPHA_TO_ONE, and
GL_SAMPLE_ALPHA_TO_COVERAGE.
I haven't implemented a dithering function yet, and gallium doesn't give
me a good chance to do so for GL_SAMPLE_COVERAGE.
I only stumbled on this while experimenting due to reading about HW-2905.
I don't know if the EZ disable in the Z-clear is actually necessary, but
go with it for now.
The recent unaligned fix successfully prevented RCL blits that weren't
aligned inside of the surface, but we also want to be able to do RCL blits
for the whole surface when the width or height of the surface aren't
aligned (we don't care what renders inside of the padding).
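The rule above can be sketched as a small predicate. This is only an
illustration under assumptions: the function name, the rectangle layout
and the tile size passed in are hypothetical, and the 1:1/unflipped checks
are left out.

```python
def rcl_blit_rect_ok(x, y, w, h, surf_w, surf_h, tile_w, tile_h):
    """True if the rectangle starts on a tile boundary and each far edge
    either lands on a tile boundary or reaches the edge of the surface."""
    if x % tile_w or y % tile_h:
        return False
    right_ok = (x + w) % tile_w == 0 or x + w == surf_w
    bottom_ok = (y + h) % tile_h == 0 or y + h == surf_h
    return right_ok and bottom_ok
```

With a made-up 64x64 tile and a 100x100 surface, a whole-surface blit
passes even though 100 is not a multiple of 64, while a half-width blit to
0,0 does not.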
This commit makes uniform offsets be in terms of bytes starting with
nir_lower_io. They get converted to be in terms of vec4s or floats when we
cram them in the UNIFORM register file but reladdr remains in terms of
bytes all the way down to the point where we lower it to a pull constant
load.
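For reference, the unit conversion involved is just a division, assuming
32-bit components (so a float is 4 bytes and a vec4 is 16). The helper
name below is hypothetical and only meant to show the arithmetic.

```python
VEC4_BYTES = 16   # four 32-bit components
FLOAT_BYTES = 4   # one 32-bit component


def byte_offset_to_units(byte_offset, unit_bytes):
    """Convert a byte offset into whole units of `unit_bytes` bytes."""
    if byte_offset % unit_bytes:
        raise ValueError("offset is not aligned to the requested unit")
    return byte_offset // unit_bytes
```

A byte offset of 48 is vec4 slot 3 or float slot 12; an unaligned offset
is rejected rather than silently rounded.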
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The one and only place where the FS backend allows reladdr is on uniforms.
For locals, inputs, and outputs, we lower it away before the backend ever
sees it. This commit gets rid of the dead indirect handling code.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, the VS_OPCODE_PULL_CONSTANT_LOAD opcode operated on
vec4-aligned byte offsets on Iron Lake and below and worked in terms of
vec4 offsets on Sandy Bridge. On Ivy Bridge, we add a new *LOAD_GEN7
variant which works in terms of vec4s. We're about to change the GEN7
version to work in terms of bytes, so this is a nice unification.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is legal to have a texture view of a single layer from a 2D array texture;
you can sample from it, or render to it. Intel hardware needs to be made aware
when it is using a 2d array surface in the surface state. The texture view is
just a 2d surface with the backing miptree actually being a 2d array surface.
As a result, the previous code would not set the right bit in the surface
state, since the view wasn't considered an array texture.
I spotted this early on in debug but brushed it off because it is clearly not
needed on other platforms (since they all pass). I have no idea how this works
properly on other platforms (I think gen7 introduced the bit in the state, but I
am too lazy to check). As such, I have opted not to modify gen7, though I
believe the current code is wrong there as well.
Thanks to Chris for helping me debug this.
v2: Just use the underlying mt's target type to make the array determination.
This replaces a bug in the first patch which was incorrectly relying only
on non-zero depth (not sure how that had no failures). (Ilia)
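The v2 approach boils down to asking the backing miptree, not the view,
whether the surface is an array. A sketch with hypothetical names (the
real driver works on GLenum values and miptree structs, not strings):

```python
# GL array texture targets, written as strings for illustration only.
ARRAY_TARGETS = {
    "GL_TEXTURE_1D_ARRAY",
    "GL_TEXTURE_2D_ARRAY",
    "GL_TEXTURE_CUBE_MAP_ARRAY",
    "GL_TEXTURE_2D_MULTISAMPLE_ARRAY",
}


def surface_is_array(view_target, miptree_target):
    """Decide the "is array" bit from the backing miptree's target.

    `view_target` is accepted only to show that it is deliberately ignored:
    a GL_TEXTURE_2D view of one layer still sits on an array miptree.
    """
    return miptree_target in ARRAY_TARGETS
```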
Cc: Chris Forbes <chrisf@ijw.co.nz>
Reported-by: Mark Janes <mark.a.janes@intel.com> (Jenkins)
References: https://www.opengl.org/registry/specs/ARB/texture_view.txt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92609
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
For some reason this has been disabled for integers ever since codegen
was merged, despite there being emission code for IMAD. Seems to work.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to nvdisasm both the immediate and non-imm cases use the same
bits. Both of these flags are quite rarely set though.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
From the 3DSTATE_URB_DS documentation:
"Project: IVB, HSW
If Domain Shader Thread Dispatch is Enabled then the minimum number of
handles that must be allocated is 10 URB entries."
"Project: BDW+
If Domain Shader Thread Dispatch is Enabled then the minimum number of
handles that must be allocated is 34 URB entries."
When the HS is run in SINGLE_PATCH mode (the only mode we support
today), there is no minimum for HS - it's just zero.
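The quoted limits can be written down as a tiny helper. This is a sketch:
the function name and the numeric `gen` argument are hypothetical, and
returning 0 when the DS is disabled is an assumption, since the quotes
only cover the enabled case.

```python
def min_ds_urb_entries(gen, ds_enabled):
    """Minimum DS URB handles per the 3DSTATE_URB_DS quotes above.

    IVB/HSW are treated as gen 7 / 7.5 and BDW+ as gen >= 8.
    """
    if not ds_enabled:
        return 0  # assumption: no minimum applies when the DS is off
    return 34 if gen >= 8 else 10


# In SINGLE_PATCH mode there is no minimum for the HS.
MIN_HS_URB_ENTRIES_SINGLE_PATCH = 0
```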
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For now, this just splits the existing code to disable these stages into
separate atoms/files. We can then replace it with real code.
v2: Bump the render atoms in this patch so it compiles (in my branch,
I'd bumped it in an earlier patch). 61 seems to be the minimum
that works, which doesn't match the old value + the number of atoms
I added in this patch, so apparently we had some slop before.
v3: Actually disable the DS unit on Gen8+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
We actually leave the sampler unset for OP_TXF, which caused the GK104+
logic to treat some texel fetches as indirect. While this works, it's
incredibly wasteful. This only happened when the texture was > 0 (since
sampler remained == 0).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
_mesa_is_array_texture provides the same functionality and:
1. it returns bool instead of GLboolean
2. it's not related to the texture format (texformat.c)
3. the name's a little shorter
v2: remove _mesa_tex_target_is_array instead (Brian Paul)
Reviewed-by: Brian Paul <brianp@vmware.com>
Both methods provide the same functionality, so one would be
removed.
v2: use _mesa_is_array_texture and not the other way (Brian Paul)
Reviewed-by: Brian Paul <brianp@vmware.com>
Use the new debug callback hook to report conformance, performance
and fallbacks to the state tracker. The state tracker, in turn can
report these issues to the user via the GL_ARB_debug_output extension.
More issues can be reported in the future; this is just a start.
v2: remove conditionals around pipe_debug_message() calls since the
check is now done in the macro itself.
v3: remove unneeded dummy %s substitutions
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
So the callers don't have to do it.
v2: also check cb!=NULL in the macro
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This function is unfinished; there are a bunch more validation rules
that need to be applied here. We will still want to call it for desktop
GL, we just don't want to validate precision, so move the ES check to
reflect this.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The validation api doesn't trigger this error so just move it to the
code called during rendering.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
At least on ARUBA this is required to stop tessellation hanging
in heaven.
This removes one of the SIMDs from use by the HS/LS.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Tested-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This enables tessellation for evergreen/cayman.
This will need changes before committing depending
on what hw works etc.
Working are CAYMAN/REDWOOD/BARTS/TURKS/SUMO/CAICOS.
v2: only enable on evergreen and above.
Reads from the queue shouldn't be merged for now read operations.
Reads from the queue shouldn't be merged for now, or put in
T slots.
Signed-off-by: Dave Airlie <airlied@redhat.com>
At least one SIMD must be kept away from the HS/LS
stages in order to avoid a hw issue on evergreen/cayman.
This patch implements this workaround.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds handling for TESSINNER/TESSOUTER in the TES
where they need to be fetched from LDS,
and TESSCOORD which comes in via r0.
It also handles primitive ID and invocation ID.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When tessellation is enabled the TES shader is responsible
for handling streamout and exports.
This adds the streamout and export workarounds to TES,
and also makes sure TES sets up spi_sid.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When we have finished the shader, we read back all the tess factors
from LDS and write them to special global memory storage using
GDS instructions.
This also handles adding NOP when GDS or ENDLOOP end the TCS.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Whenever TCS outputs are written in the shader, they
need to be written to LDS, not temporaries; this handles
this case. It also fixes up the case where the output
is a relative addressed output, so we don't try to apply
the relative address at the wrong time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles the logic for doing fetches from LDS for
TCS and TES. For TCS we need to fetch both inputs and outputs,
for TES only inputs need to be fetched.
v2: use 24-bit ops.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This retrieves the offset into the LDS for a patch or
non-patch variable. It takes the RelPatch channel
and a temporary register.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function retrieves the tess input/output info
from the tess constant buffer that is bound to the shader.
This uses a vfetch to get the values into the shader.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These utilities are to be used to do things like integer adds and
multiplies to be used in calculating the LDS offsets etc.
It handles CAYMAN MULLO differences as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When using tessellation on eg/ni chipsets, we must disable
dynamic GPRs to workaround a hw bug where the GPU hangs
when too many things get queued.
This implements something like the r600 code to emit
the transition between static and dynamic GPRs, and to
statically allocate GPRs when tessellation is enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we have no tess control shader, then we have to use a fallback
one that just writes the tessellation factors.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This creates a constant buffer with the information about
the layout of the LDS memory that is given to the vertex, tess
control and tess evaluation shaders.
This also programs the LDS size and the LS_HS_CONFIG registers,
on evergreen only.
v2: calculate lds hs num waves properly (Marek)
Emit the state only when something has changed (airlied).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This updates the setting of the shader stages register
when tess is enabled and adds the setting of the VGT_TF_PARAM
register from the tess shader properties.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This initialises the tess min/max using fglrx values,
and also initialises a number of other registers related
to tessellation.
v1.1: caicos doesn't have some registers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds printing for the hw shader types, and hooks it up.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the LDS ops to the SB bytecode reader/writers.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are used in tessellation shaders to read/write values
between VS/TCS/TES.
This splits the eg alu assembler out to handle these
instructions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support to the decoder, not actual SB support.
v1.1: fixup GDS relative mode. (Glenn).
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function is going to get a lot messier with tessellation
so I'm going to use some macros to try and clean some bits
of common code up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves to using an array of hw stages for the atoms.
Note this drops the 23 from the vertex shader, this value
is calculated internally when shaders are bound, so not
required here.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This changes the r600 specific GPR adjustment code
to use the stage defines, and arrays.
This is prep work for the tess changes later.
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add a list of defines for the HW stages.
We will use this for GPR calculations amongst other things.
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
On SM20 this gives:
total instructions in shared programs : 6299222 -> 6294240 (-0.08%)
total gprs used in shared programs : 944139 -> 944068 (-0.01%)
total local used in shared programs : 54116 -> 54116 (0.00%)
local gpr inst bytes
helped 0 126 2781 2781
hurt 0 55 11 11
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4].
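As a toy model of that fold, assume a map from each register defined by
"src + immediate" to its (src, immediate) pair. The data structure and the
function name are hypothetical; the real pass works on codegen IR.

```python
def fold_address(add_defs, reg):
    """Return (base_reg, offset) for an indirect address through `reg`.

    `add_defs` maps a register to (src_reg, immediate) when that register
    was defined as src_reg + immediate. Registers with no such definition
    are returned unchanged with a zero offset.
    """
    if reg in add_defs:
        src, imm = add_defs[reg]
        return src, imm
    return reg, 0
```

With `{"$r1": ("$r0", 4)}`, an access through `$r1` is rewritten to use
`$r0` with an offset of 4, matching the c1[$r0+4] form above.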
On SM35:
total instructions in shared programs : 6206257 -> 6185058 (-0.34%)
total gprs used in shared programs : 911045 -> 910722 (-0.04%)
total local used in shared programs : 39072 -> 39072 (0.00%)
local gpr inst bytes
helped 0 417 4195 4195
hurt 0 280 0 0
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This works when the add also has an immediate. This often happens in
address calculations. These addresses can then be inlined as well.
On code targeted to SM35:
total instructions in shared programs : 6223346 -> 6206257 (-0.27%)
total gprs used in shared programs : 911075 -> 911045 (-0.00%)
total local used in shared programs : 39072 -> 39072 (0.00%)
local gpr inst bytes
helped 0 119 3664 3664
hurt 0 74 15 15
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Even if the rasterizer has scissor disabled, we'll have whatever
vc4->scissor bounds were last set when someone set up a scissor, so we
shouldn't clip to them in that case.
Fixes piglit fbo-blit-rect, and a lot of MSAA tests once they're enabled.
We could potentially handle scissored blits when they're tile aligned, but
it doesn't seem worth it. If you're doing a scissored blit, you're
probably a testcase.
Fixes piglit's fbo-scissor-blit fbo
This implements more performance metrics than the previous support,
but some other metrics still need to be figured out.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These performance metrics will be re-introduced in an upcoming
patch that will follow the same design as Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
inst_issued is a performance metric, not a hardware event, on Kepler (SM30).
It will be re-introduced in an upcoming patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
SM30 is the compute capability version for GK104/GK106/GK107.
This also introduces a new signal group selection called UNK0F.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Move these to 'disasm' instead of the more verbose 'optmsgs' since, like
the tgsi dumps, it is useful without the more verbose compiler logging
enabled.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For MSAA, we store full resolution tile buffer contents, which have their
own tiling format. Since they're full resolution buffers, we have to
align their size to full tiles.
We were checking that the blit started at 0 and was 1:1, but not that it
went to the full width of the surface, or that the width was aligned to a
tile. We then told it to blit to the full width/height of the surface,
causing contents to be stomped in a bunch of MSAA tests that happen to
include half-screen-width blits to 0,0.
I've played with a few different approaches to tweak instruction
priority according to how much they increase/decrease register pressure,
etc. But nothing seems to change the fact that compared to original
(pre-multiple-block-support) scheduler, in some edge cases we are
generating shaders w/ 5-6x higher register usage.
The problem is that the priority queue approach completely loses the
dependency between instructions, and ends up scheduling all paths at the
same time.
Original reason for switching was that recursive approach relied on
starting from the shader outputs array. But we can achieve more or less
the same thing by starting from the depth-sorted list.
shader-db results:
total instructions in shared programs: 113350 -> 105183 (-7.21%)
total dwords in shared programs: 219328 -> 211168 (-3.72%)
total full registers used in shared programs: 7911 -> 7383 (-6.67%)
total half registers used in shader programs: 109 -> 109 (0.00%)
total const registers used in shared programs: 21294 -> 21294 (0.00%)
half full const instr dwords
helped 0 322 0 711 215
hurt 0 163 0 38 4
The shaders hurt tend to gain a register or two. While there are also a
lot of helped shaders that only lose a register or two, the more
complex ones tend to lose significantly more registers used. In some
more extreme cases, like glsl-fs-convolution-1.shader_test it is more
like 7 vs 34 registers!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It causes confusion in sched if we need to split_addr() since otherwise
we wouldn't easily know which block the new addr instr will be scheduled
in. So just side-step the whole situation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We'll need to add similar for ir3_instruction, but following the pattern
to use 'id' seems confusing. Let's just go w/ generic 'data' as the
name.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Undefining NDEBUG is relevant for release builds, as they are the
ones that set it.
[Emil Velikov: split from previous patch]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This follows the src/util/u_atomic_test.c model of undefining NDEBUG
unconditionally throughout the XvMC tests, to force asserts regardless
of debug mode.
The comment on u_atomic_test.c is also fixed (read 'debug' where it
should have been 'release').
v2: s/debug/release/ in relevant comments
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
[Emil Velikov: keep the src/util/ hunk as separate patch]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Since we're using nir_lower_outputs_to_temporaries to shadow all our
outputs, it's impossible to actually get an indirect store. The code we
had to "handle" this was pretty bogus as it created a register with a
reladdr and then stuffed it in a fixed varying slot without so much as a
MOV. Not only does this not do the MOV, it also puts the indirect on the
wrong side of the transaction. Let's just delete the broken dead code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not really buying us anything at this point. It's just a way of
remapping one offset namespace onto another. We can just use the location
namespace the whole way through.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The original change to put zeroes directly into instructions created
conditional mov's with the zero immediate. However that can't be
emitted, so make sure to replace the zero with r63.
Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A situation where there's a 128-bit load where the last component gets
DCE'd causes a 96-bit load to be generated, which no GPU can actually
emit. Avoid generating such instructions by scaling back to 64-bit on
the first load when splitting.
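A sketch of the splitting rule, assuming the only emittable load sizes are
32, 64 and 128 bits (4, 8 and 16 bytes). The function name and the greedy
strategy are illustrative assumptions, not the codegen implementation.

```python
LEGAL_LOAD_BYTES = (16, 8, 4)  # assumed emittable sizes, largest first


def split_load(size_bytes):
    """Split a load into legal pieces, largest piece first."""
    if size_bytes <= 0 or size_bytes % 4:
        raise ValueError("size must be a positive multiple of 4 bytes")
    parts = []
    remaining = size_bytes
    while remaining:
        # Pick the largest legal size that still fits.
        chunk = next(s for s in LEGAL_LOAD_BYTES if s <= remaining)
        parts.append(chunk)
        remaining -= chunk
    return parts
```

A 96-bit (12-byte) request therefore becomes a 64-bit load followed by a
32-bit one instead of a single unencodable instruction.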
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
This was just plain broken. It always used the value from v0 (for vp_index)
but would pass the value from the provoking vertex to later stages - but only
if there was a corresponding fs input, otherwise the layer/vp index would get
lost completely (as it would try to interpolate the (unsigned) values as
floats).
So, make it obey provoking vertex rules (drivers relying on draw will need to
do the same). And make sure that the default interpolation mode (when no
corresponding fs input is found) for them is constant.
Also, change the code a bit so constant inputs aren't interpolated then
copied over later.
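In other words, a constant (flat) attribute such as the layer or viewport
index is taken from a single vertex of the primitive. A minimal sketch,
with a hypothetical helper name and a boolean standing in for the
first/last provoking vertex convention:

```python
def flat_value(per_vertex_values, provoking_first=True):
    """Pick the value of a constant attribute from the provoking vertex.

    `per_vertex_values` holds the attribute for each vertex of one
    primitive, in submission order.
    """
    return per_vertex_values[0 if provoking_first else -1]
```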
Fixes the new piglit test gl-layer-render-clipped.
v2: more consistent whitespaces fixes for function defs, and more tab killing
(overall still not quite right however).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Same as for llvmpipe, albeit softpipe only really handles multiple layers,
not multiple viewports/scissors.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
d3d10 actually requires using provoking (first) vertex. GL is happy with
any vertex (as long as we say it's undefined in the corresponding queries).
Up to now we actually used vertex 0 for viewport index, and vertex 1 for
layer (for tris), which really didn't make sense (probably a typo). Also,
since we reorder vertices of clockwise triangle, that actually meant we used
a different vertex depending if the triangle was cw or ccw (still ok by gl).
However, it should be consistent with what draw (clip) does, and using
provoking vertex seems like the sensible choice (draw clip will be fixed
next as it is totally broken there).
While here, also use the correct viewport always even when not needed
in setup (we pass it down to the jit fragment shader, where it might be
needed for getting correct near/far depth values).
No piglit changes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We can't dump it in the real driver, since the kernel doesn't give us a
handle to it (except after a GPU hang, using a root ioctl). In the
simulator we can.
It should be MSVC2008_COMPAT_CFLAGS and not MSVC2008_COMPAT_CXXFLAGS.
This is why the recent util_blitter breakage went unnoticed on autotools
builds.
Trivial.
nir is the exception among gallium/auxiliary -- we don't need to compile
it with MSVC2008 yet. And this enables us to use
-Werror=declaration-after-statement in the next commit as we should,
without complicated fixes to tgsi_to_nir module.
Trivial. Tested with GCC and Clang.
Currently it stores strlen(buf) whenever the user originally provided a
negative value for length.
Although I've not seen any explicit text in the spec, CTS requires that
the very same length (be that negative value or not) is returned back on
Pop.
So let's push down the length < 0 checks, tweak the meaning of
gl_debug_message::length and fix GetDebugMessageLog to add and count the
null terminators, as required by the spec.
v2: return correct total length in GetDebugMessageLog
v3: rebase (drop _mesa_shader_debug hunk).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We're about to rework the meaning of gl_debug_message::length to only
store the user provided data. Thus we should add an explicit validation
for null terminated strings.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
These new (relative to ARB_debug_output) tokens have been explicitly
separated from the existing ones in the spec text, and the reference
to glDebugMessageInsert was dropped.
At the same time, further down the spec says:
"The value of <type> must be one of the values from Table 5.4"
... and these two are listed in Table 5.4.
The GL 4.3 and GLES 3.2 specs do not give any hints on the former
'definition', plus CTS requires that the tokens are valid values for
glDebugMessageInsert.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
As per the spec quote:
"All messages are initially enabled unless their assigned severity
is DEBUG_SEVERITY_LOW"
We already had MEDIUM and HIGH set, let's toggle NOTIFICATION as well.
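The quoted rule amounts to the following default, shown here with the
severity tokens as plain strings and a hypothetical helper name:

```python
SEVERITIES = (
    "DEBUG_SEVERITY_HIGH",
    "DEBUG_SEVERITY_MEDIUM",
    "DEBUG_SEVERITY_LOW",
    "DEBUG_SEVERITY_NOTIFICATION",
)


def enabled_by_default(severity):
    """All messages start enabled unless their severity is LOW."""
    if severity not in SEVERITIES:
        raise ValueError("unknown severity: %s" % severity)
    return severity != "DEBUG_SEVERITY_LOW"
```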
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
We already have one group (the default) as specified in the spec. So
let's return its size, rather than the index of the current group.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
The extension requires (cough implements) GetPointervKHR (alias of
GetPointerv) which in itself is available for ES 1.1 enabled mesa.
Anyone willing to fish around and implement it for ES 1.0 is more than
welcome to revert this commit. Until then let's restrict things.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93048
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
There are a few legacy OpenGL apps on Windows which need this extension.
We basically use glCopyTex[Sub]Image to implement wglBindTexImageARB (see
the implementation notes for details).
v2: refactor code to use st_copy_framebuffer_to_texture() helper function.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This helper is used by the WGL state tracker to implement the
wglBindTexImageARB() function.
This is basically a new "meta" function. However, we're not putting
it in the src/mesa/drivers/common/ directory because that code is not
linked with gallium-based drivers.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Atomic counters and images were using ctx::Shader, which does not take
program pipeline changes into account; ctx::_Shader must be used for SSO to
work. Commit c0347705 already changed UBOs to use this.
Fixes failures seen with the following Piglit test:
arb_separate_shader_object-atomic-counter
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
Noticed this when looking at a trace that caused flags to spill to/from
registers. The flags source/destination wasn't encoded correctly
according to both envydis and nvdisasm.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The algorithm expects the entire CFG to be reachable, so make sure that
we hit every node. Otherwise we will end up with uninitialized data,
memory corruption, etc.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
For example if there are only returns, the break bb will not end up part
of the CFG. However there will have been a prebreak already emitted for
it, and when hitting the RET that comes after, we will try to insert the
current (i.e. break) BB into the graph even though it will be
unreachable. This makes the SSA code sad.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
There's a post-RA fixup to replace 0's with $r63 (or $r127 if too many
regs are used), so just as nvc0, let an immediate 0 be loaded anywhere.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This appeared in brw_vs.c and brw_wm.c, should have appeared in
brw_gs.c, and was soon going to have to be in brw_tcs.c and brw_tes.c as
well.
So, instead, move it to a central location (which has to know about both
struct brw_context and perf_debug()).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
That texture mask thing doesn't seem to be needed for surface ops, so
just as on nve4+, let's do that only for texture ops.
This fixes a segfault with 'test_surface_st' from
gallium/tests/trivial/compute.c on Fermi because this test uses sustp.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
AppVeyor doesn't require an appveyor.yml in the repos (in fact it has
some limitations as noted in comments below), but doing so has two great
advantages over the web UI:
- appveyor.yml can be revisioned together with the code, so instructions
should always be in sync with the code
- appveyor.yml can be reused for people's private repositories (be it on
fdo or GitHub, etc.)
Acked-by: Roland Scheidegger <sroland@vmware.com>
Previously util_blitter_clear_depth_stencil() could not clear more
than the first layer. We need to generalise this as we did for
util_blitter_clear_render_target().
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Previously util_blitter_clear_render_target() could not clear more
than the first layer. We need to generalise this so that
ARB_clear_texture can pass the 3d piglit test.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
These are implementation-dependent queries, but so far we just returned the
value of whatever the current provoking vertex convention was set to, which
was clearly wrong.
Just make this a variable in the context constants like for other things
which are implementation dependent (I assume all drivers will want to set
this to the same value for both queries), and set it to GL_UNDEFINED_VERTEX
which is correct for everybody (and drivers can override it).
Reviewed-by: Brian Paul <brianp@vmware.com>
CC: <mesa-stable@lists.freedesktop.org>
Since I just broke the scons build, I figured I'd make Travis test that I
don't break it again in the future. The script runs the builds in
parallel across VMs, so it still takes just 5 minutes to turn around
results.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The NVIDIA binary driver and Intel's closed source driver both expose
14 here, rather than the GL minimum of 12. Let's follow suit.
Without this, Shadow of Mordor fails to render correctly and triggers
OpenGL errors:
Mesa: User error: GL_INVALID_VALUE in glBindBufferBase(index=68)
Mesa: User error: GL_INVALID_VALUE in glUniformBlockBinding(block binding 68 >= 60)
There are 5 stages (VS, TCS, TES, GS, FS), and 12 * 5 = 60 is too small.
14 * 5 = 70 will work just fine.
Tapani believes this will also help Alien Isolation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The first pass marked dead instructions as opcode = NOP, and a second
pass deleted those instructions so that the live ranges used in the
first pass wouldn't change.
But since we're walking the instructions in reverse order, we can just
do everything in one pass. The only thing we have to do is walk the
blocks in reverse as well.
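The single-pass idea can be sketched as follows. This is a simplified Python
model, assuming straight-line blocks with no branches or loops and a
hypothetical (dest, sources) instruction tuple; the real pass works on the
backend IR and its liveness data:

```python
def eliminate_dead_code(blocks, live_out):
    """Drop instructions whose result is never used.

    blocks:   list of basic blocks, each a list of (dest, sources) tuples.
    live_out: names that are live at the end of the program.
    """
    live = set(live_out)
    result = []
    # Walk blocks and instructions in reverse, so every use of a value is
    # seen before its definition and liveness is known when we get there.
    for block in reversed(blocks):
        kept = []
        for dest, sources in reversed(block):
            if dest not in live:
                continue  # dead: delete right away, no NOP marking needed
            live.discard(dest)
            live.update(sources)
            kept.append((dest, sources))
        kept.reverse()
        result.append(kept)
    result.reverse()
    return result
```

Because a deleted instruction never adds its sources to the live set, values
that were only feeding dead instructions are removed in the same walk.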
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Removes dead code from glsl-mat-from-int-ctor-03.shader_test.
Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The docs say we should send the emit after the ring writes,
so let's do that and not have an ALU in between.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For the compute support, we might stick buffers as surfaces. This fixes
an assertion when executing src/gallium/tests/trivial/compute.
To avoid using these "restricted" surfaces as render targets, these
assertions have been moved. Note that it's already handled for the
framebuffer thing on nvc0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
String literals cannot exceed 65535 characters for MSVC. Instead of
emitting a string, emit an array of characters.
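A minimal sketch of that kind of generator output, in Python. The function
name emit_char_array and the table name are illustrative only, and the sketch
assumes the strings contain only characters that need no escaping (as enum
names do):

```python
def emit_char_array(name, strings):
    """Return C source for a char array holding each string plus its NUL."""
    rows = []
    for s in strings:
        # One character literal per character, then the terminator.
        literals = ["'%s'" % c for c in s] + ["'\\0'"]
        rows.append("   " + ", ".join(literals) + ",")
    return "static const char %s[] = {\n%s\n};" % (name, "\n".join(rows))
```

An array initializer is not a string literal, so the generated table no longer
depends on the compiler's string literal length limit.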
v2: fix indentation and add comment in the gl_enums.py file about this
ugliness.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This just builds/installs our dependencies, and runs "make check". I'm
interested in integrating more tests into it, but this seems like a pretty
easy first start.
If your personal branches of Mesa are on github, you can enable it on your
account and the repository (see
https://docs.travis-ci.com/user/for-beginners), then any pushes you do
will get their HEAD commit tested, and any pull requests to your tree will
get their merge commits tested.
Now when people need new extensions, they can skip the entire
enum-definition process, and we can stop reviewing new extension XML for
its enum content.
This also brings in a new enum that I wanted to use in enum_strings.cpp
for testing the code generator.
v2: Drop comment about disabled GL_1PASS_EXT test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With GLES 3.1, GL 4.5, and many new vendor extensions about to get their
enums added, we jump up to 85k of table.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sometimes GL likes to rename an old enum when it grows a more general
purpose, and we should prefer the new name. Changes from this:
GL_POINT/LINE_SIZE_* (1.1) -> GL_SMOOTH_POINT/LINE_SIZE_* (1.2)
GL_FOG_COORDINATE_* (1.4) -> GL_FOG_COORD_* (1.5)
GL_SOURCE[012]_RGB/ALPHA (1.3) -> GL_SRC0_RGB (1.5)
GL_COPY_READ/WRITE_BUFFER (3.1) -> GL_COPY_READ_BUFFER_BINDING (4.2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Asking the table for bitfield names doesn't make any sense. For 0x10, do
you want GL_GLYPH_HORIZONTAL_BEARING_ADVANCE_BIT_NV or
GL_COLOR_BUFFER_BIT4_QCOM or GL_POLYGON_STIPPLE_BIT or
GL_SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV? Giving a useful answer would
depend on a whole lot of context.
This also fixes a bad enum table entry, where we chose GL_HINT_BIT instead
of GL_ABGR_EXT for 0x8000, so we can now fix its entry in the enum_strings
test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I've used a bunch of python code to cut out new enums so that the two
generated files can be diffed. I'll remove all that hardcoding in the
following commits. All remaining differences between the generated code:
- GL_TEXTURE_BUFFER_FORMAT didn't appear in GL3 when TBOs got merged to
core, so it now gets an _ARB suffix instead.
- Blacklisting can't keep EXT_sso's GL_ACTIVE_PROGRAM_EXT from becoming
GL_ACTIVE_PROGRAM -- in our hash table, GL_ACTIVE_PROGRAM_EXT points at
the GLES2 enum's value (aka GL_CURRENT_PROGRAM). By not blacklisting
the core name, we get both enums translated.
- GL_DRAW_FRAMEBUFFER_BINDING and GL_FRAMEBUFFER_BINDING both appeared in
GL3 as synonyms, and the new code happens to choose
GL_FRAMEBUFFER_BINDING instead.
- GL_TEXTURE_COMPONENTS and GL_TEXTURE_INTERNAL_FORMAT both appear in 1.1,
and the new code chooses GL_TEXTURE_INTERNAL_FORMAT instead (which seems
better, to me)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
emacs whines at me every time I open the file about these unsafe
variables, and the file was reformatted from 8 space to 4 space long ago.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_ALL_ATTRIB_BITS is a thing, and GL_CLIENT_ALL_ATTRIB_BITS, but I don't
see GL_ALL_CLIENT_ATTRIB_BITS in my grepping of khronos XML, GL extension
specs, GL 1.1, GL 2.2, and GL 4.4.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa hasn't been using these enums and the finalized specs don't reference
them, so losing them from our generated enum-to-string code should be
fine. Reduces diffs to generating from Khronos XML, which has these enums
defined but commented out from any consumers.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In converting to using the Khronos XML, I found that our XML had these two
swapped, and the text spec agreed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The intention here is to keep a pristine copy of the upstream gl.xml that
can be updated at any time with a new version, and use that to generate
Mesa code from instead of our private XML.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous contents appeared to be the output of some form of code
generation for all enums, with a few entries hand-edited to deal with
oddness. The downside to this was that when an enum gets promoted from
vendor to _EXT or _EXT to _ARB or _ARB to core, make check starts failing
even when the committer has done nothing wrong. Instead of black-box
testing the code generation, pick a few enums that intentionally poke the
interesting cases of code generation.
People editing the code generator should be diffing the generated code
anyway. This should catch when they fail to do so, without throwing false
negatives when people update the GL XML.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When probing for devices, clover will call pipe_loader_probe() twice.
The first time to retrieve the number of devices, and the second time
to retrieve the device structures.
We currently assume that the return value of both calls will be the
same, but this will not be the case if a device happens to disappear
between the two calls.
When a device disappears, the pipe_loader_probe() will add a NULL
device to the device list, so we need to handle this.
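The pattern looks roughly like this. It is a Python sketch, not clover's C++;
the probe callable and its (count, devices) return shape are assumptions made
for illustration:

```python
def collect_devices(probe):
    """Probe twice and skip devices that vanished in between.

    probe(capacity) is a hypothetical stand-in returning (count, devices),
    where devices holds up to `capacity` entries and may contain None.
    """
    count, _ = probe(0)          # first call: how many devices are there?
    _, devices = probe(count)    # second call: fetch the device structures
    # A device that disappeared between the calls shows up as None.
    return [dev for dev in devices if dev is not None]
```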
v2:
- Keep range for loop
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Improves register pressure, since otherwise we end up emitting
loads for all the elements in the RHS and then emitting
stores for all elements in the LHS.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Improves register pressure, since otherwise we end up emitting
loads for all the elements in the RHS and then emitting
stores for all elements in the LHS.
v2:
- Mark progress properly. This also fixes some instances where the added
nodes with individual element copies were not being lowered, which is
expected behavior as explained in the documentation for
visit_list_elements.
- Only need to do this if the RHS is a buffer-backed variable.
- We can also have arrays inside structs. A later patch will make it so
we also split struct copies and end up with multiple
ir_dereference_record assignments, so make sure that if any of these
is an array copy, we also split it.
Fixes the following piglit tests:
tests/spec/arb_shader_storage_buffer_object/execution/large-field-copy.shader_test
tests/spec/arb_shader_storage_buffer_object/linker/copy-large-array.shader_test
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Hardware other than AMD's requires parsing:
VAPictureParameterBufferH264.ReferenceFrames[16]
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
In general max_references cannot be based on num_render_targets.
This patch allows buffers to be allocated with an accurate size,
i.e. no more than necessary. For other codecs it is a fixed
value of 2.
This is similar to the behaviour of vaapi/vdpau-driver.
For now the HEVC case defaults to num_render_targets as before,
but it could also benefit from this change by setting a more
accurate max_references number in handlePictureParameterBuffer.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The current implementation looks for array dereferences on gl_FragData and
immediately proceeds to lower them, however this is not enough because we
can have array access on vector variables too, like in this code:
   out vec4 color;
   void main()
   {
      int i;
      for (i = 0; i < 4; i++)
         color[i] = 1.0;
   }
Fix it by making sure that the actual variable being dereferenced is an array.
Fixes a crash in:
spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-ldexp-dvec4.shader_test
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We need to emit at least one cut/emit in every
geometry shader; the easiest workaround is to
stick a single CUT at the top of each geom shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes an issue where the addition of the FLAT qualifier in
varying_matches::record() can break the expected varying order.
It also avoids a future issue with the relaxing of interpolation
qualifier matching constraints in GLSL 4.50.
V2: (by Timothy Arceri)
* reworked comment slightly
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
GL_ARB_separate_shader_objects allows matching by variable name or block
interface. Input varyings can't be removed because that would impact the
location assignment.
This fixes bug 79783 and likely any application that uses the
GL_ARB_separate_shader_objects extension.
V2 (by Timothy Arceri):
* simplify now that builtins are not set as always active
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
https://bugs.freedesktop.org/show_bug.cgi?id=79783
The value will be set in separate-shader programs when an input/output
must remain active, e.g. when dead code removal isn't allowed because
it would create an interface location/name-matching mismatch.
v3:
* Rename the attribute
* Use ir_variable directly instead of ir_variable_refcount_visitor
* Move the foreach IR code in the linker file
v4:
* Fix variable name in assert
v5 (by Timothy Arceri):
* Rename functions and reword comments
* Don't set always active on builtins
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This change allows user defined inputs/outputs with explicit locations
to be removed if they are detected to not be used between shaders
at link time.
To enable this we change the is_unmatched_generic_inout field to be
flagged when we have a user defined varying. Previously
explicit_location was assumed to be set only in builtins however SSO
allows the user to set an explicit location.
We then add a function to match explicit locations between shaders.
V2: call match_explicit_outputs_to_inputs() after
is_unmatched_generic_inout has been initialised.
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:
mov(8) g7<1>UD 0x00000000UD { align1 WE_all 1Q compacted };
mov(1) g7.5<1>UD 0x00000100UD { align1 WE_all };
mov(1) g7<1>UD g0<0,1,0>UD { align1 WE_all compacted };
mov(1) g7.3<1>UD g8<0,1,0>UD { align1 WE_all };
This is done in the generator since the vec4 backend can't handle align1
regioning. From the visitor's point of view, this is a single opcode:
hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.xxxx:UD
Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination. However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.
In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.
It occurred to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride. The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.
Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that. It might also be the end of control flow. This might
keep values alive too long. What we really want is to say "my source
and destinations interfere".
This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard. For
my vec4 case, we can determine this by switching on opcodes. For the
SIMD16 case, we just move the existing code there.
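Conceptually, the allocator change amounts to something like the following
Python sketch. The dictionary-based instruction format and the has_hazard
predicate are hypothetical stand-ins for the backend IR and the
opcode/region checks:

```python
def add_hazard_interference(instructions, has_hazard):
    """Collect register pairs that must not share a hardware register.

    instructions: list of dicts with "op", "dst" and "srcs" (virtual
                  register numbers).
    has_hazard:   predicate telling whether an instruction may write its
                  destination before it has finished reading its sources.
    """
    interference = set()
    for inst in instructions:
        if not has_hazard(inst):
            continue
        for src in inst["srcs"]:
            if src != inst["dst"]:
                # Say "my source and destination interfere" instead of
                # stretching the live range by one instruction.
                interference.add(frozenset((inst["dst"], src)))
    return interference
```

Only the hazardous instruction's own operands are affected, so unrelated
values are not kept alive any longer than needed.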
I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.
v2: Rebased by mattst88.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've apparently always been botching JIP for sequences such as:
   do
      cmp.f0.0 ...
      (+f0.0) break
      ...
      if
         ...
      else
         ...
      endif
      ...
   while
Normally, UIP is supposed to point to the final destination of the jump,
while in nested control flow, JIP is supposed to point to the end of the
current nesting level. It essentially bounces out of the current nested
control flow, to an instruction that has a JIP which bounces out another
level, and so on.
In the above example, when setting JIP for the BREAK, we call
brw_find_next_block_end(), which begins a search after the BREAK for the
next ENDIF, ELSE, WHILE, or HALT. It ignores the IF and finds the ELSE,
setting JIP there.
This makes no sense at all. The break is supposed to skip over the
whole if/else/endif block entirely. They have a sibling relationship,
not a nesting relationship.
This patch fixes brw_find_next_block_end() to track depth as it does
its search, and ignore anything not at depth 0. So when it sees the
IF, it ignores everything until after the ENDIF. That way, it finds
the end of the right block.
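The depth-tracking search can be sketched like this in Python. It models the
program as a plain list of opcode names, which is an assumption for
illustration; the real function walks encoded instructions:

```python
def find_next_block_end(program, start):
    """Return the index of the next block end at the same nesting level.

    program: list of opcode names such as "if", "else", "endif", "do",
             "while", "halt", or anything else for ordinary instructions.
    start:   index of the instruction (e.g. a break) to search after.
    """
    depth = 0
    for ip in range(start + 1, len(program)):
        op = program[ip]
        if op in ("if", "do"):
            depth += 1              # entering a sibling nested block
        elif op in ("endif", "while"):
            if depth == 0:
                return ip           # end of the block we started in
            depth -= 1              # leaving the nested block again
        elif op in ("else", "halt"):
            if depth == 0:
                return ip
    return None
```

For the sequence above, a search starting at the break skips the whole
if/else/endif and lands on the while, instead of stopping at the else.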
I noticed this while reading some assembly code. We believe jumping
earlier is harmless, but makes the EU walk through a bunch of disabled
instructions for no reason. I noticed that GLBenchmark Manhattan had
a shader that contained a BREAK with a bogus JIP, but didn't measure
any performance improvement (it's likely minuscule, if there is any).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This just splits out a common pattern into an inline function
to make things cleaner to read.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We really should initialise HS/LS_2, and SQ_LDS_ALLOC exists
on all evergreen, not just cayman, so we should initialise
it as well.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the defines for a bunch of registers and shader
values that are required to implement tessellation.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Move some common code into one place, tess will also need
to use this function.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There was only a single user which was using strlen(buf).
As this function is not user facing (i.e. we don't need to feed back
original length via a callback), we can simplify things.
Suggested-by: Timothy Arceri <timothy.arceri@collabora.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Add some checks if the original/dup'd fd is valid and ensure that we
don't leak it on error. The former is implicitly handled within the
pipe_loader, although let's make things explicit and check beforehand.
Spotted by Coverity (CID 1339865)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In theory this wouldn't be an issue, as we'll find the correct name and
break out of the loop before we hit the sentinel.
Let's fix this and avoid issues in the future.
Spotted by Coverity (CID 1339869, 1339870, 1339871)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Earlier commit factored out the mpeg4 IQ matrix handling into separate
function, although it forgot to add a break in its case statement.
Thus the data ended up partially overwritten as the mpeg4 and h265
structs are members of the desc union.
Spotted by Coverity (CID 1341052)
Fixes: 64761a841d "st/va: move MPEG4 functions into separate file"
Cc: Julien Isorce <j.isorce@samsung.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Non-timer queries are suspended during blits. When the blits end, the queries
are resumed, but this resume operation itself might run out of CS space and
trigger a flush. When this happens, we must prevent a duplicate suspend during
preflush suspend, and we must also prevent a duplicate resume when the CS flush
returns back to the original resume operation.
This fixes a regression that was introduced by:
commit 8a125afa6e
Author: Nicolai Hähnle <nhaehnle@gmail.com>
Date: Wed Nov 18 18:40:22 2015 +0100
radeon: ensure that timing/profiling queries are suspended on flush
The queries_suspended_for_flush flag is redundant because suspended queries
are not removed from their respective linked list.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reported-by: Axel Davy <axel.davy@ens.fr>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Tested-by: Axel Davy <axel.davy@ens.fr>
Zero length arrays are non standard:
warning C4200: nonstandard extension used : zero-sized array in struct/union
Cannot generate copy-ctor or copy-assignment operator when UDT contains a zero-sized array
And all code does `N * sizeof query_result->batch[0]`, so it should work
exactly the same.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Added to OpenGL 4.3 section, tagged as 'in progress (elima)'. See
https://bugs.freedesktop.org/show_bug.cgi?id=92687.
Thanks to Thomas H.P. Andersen for reminding me about this.
v1: - Update the already existing entry in section 4.3
instead (Ilia Mirkin).
- Added my BZ nickname as contact person (Felix Schwarz).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
From Section 11.1.3.11 (Validation) of the GLES 3.1 spec:
"An INVALID_OPERATION error is generated by any command that transfers
vertices to the GL or launches compute work if the current set
of active program objects cannot be executed, for reasons including:"
It then goes on to list the rules we validate in the
_mesa_validate_program_pipeline() function.
For ValidateProgramPipeline the only mention of generating an error is:
"An INVALID_OPERATION error is generated if pipeline is not a name
returned from a previous call to GenProgramPipelines or if such a name has
since been deleted by DeleteProgramPipelines,"
Which we handle separately.
This fixes:
ES31-CTS.sepshaderobjs.PipelineApi
No regressions on the dEQP 3.1 tests.
Cc: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Rather than assigning inloc up front, when we don't yet know if it will
be unused, assign it last thing before the legalize pass.
Also, realize when inputs are unused (since for frag shaders we can't
rely on them being removed from ir->inputs[]). This doesn't make sense
if we don't also dynamically assign the inloc's, since we could end up
telling the hw the wrong # of varyings (since we currently assume that
the # of varyings and max-inloc are related..)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Make the interpolation / point-sprite replacement mode setup deal with
varying packing.
In a later commit, we switch to packing just the varying components that
are actually used by the frag shader, so we won't be able to assume
everything is vec4's aligned to vec4. Which would highly confuse the
previous vinterp/vpsrepl logic.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The thread scratch space is thread-local so using the full IA-coherent
stateless surface index (255 since Gen8) is unnecessary and
potentially expensive. On Gen8 and early steppings of Gen9 this is
not a functional change because the kernel already sets bit 4 of
HDC_CHICKEN0 which overrides all HDC memory access to be non-coherent
in order to work around a hardware bug.
This happens to fix a full system hang when running any spilling code
on a pre-production SKL GT4e machine I have on my desk (forcing all
HDC access to non-coherent from the kernel up to stepping F0 might be
a good idea though regardless of this patch), and improves performance
of the OglPSBump2 SynMark benchmark run with INTEL_DEBUG=spill_fs by
33% (11 runs, 5% significance) on a production SKL GT2 (on which HDC
IA-coherency is apparently functional so it wouldn't make sense to
disable globally).
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unfortunately Gen7 scratch block reads and writes seem to be hardwired
to BTI 255 even on Gen9+ where that index causes the dataport to do an
IA-coherent read or write. This change is required for the next patch
to be correct, since otherwise we would be writing to the scratch
space using non-coherent access and then reading it back using
IA-coherent reads, which wouldn't be guaranteed to return the value
previously written to the same location without introducing an
additional HDC flush in between.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Since the query names are not very enlightening, and there are thousands
of them, GALLIUM_HUD=help should only show the first and last query name
for each hardware block.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
No drivers currently implement ARB_geometry_shader4, nor are there
any plans to implement it. We only support the version of geometry
shaders that was incorporated into OpenGL 3.2 / GLSL 1.50.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds an additional mask for tracking which vertex arrays have
an associated vertex buffer binding set. This mask can be directly
compared to the mask of enabled vertex arrays, and the two should match
when drawing.
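In bitmask terms the check is a single comparison. The following Python
sketch uses a hypothetical VertexArrayState class and takes "should match"
literally; the field and method names are not Mesa's:

```python
class VertexArrayState:
    """Tracks enabled arrays and arrays with a buffer bound, one bit each."""

    def __init__(self):
        self.enabled = 0   # bit i set: vertex array i is enabled
        self.bound = 0     # bit i set: vertex array i has a buffer bound

    def enable(self, index):
        self.enabled |= 1 << index

    def bind_buffer(self, index, buffer):
        # Buffer name 0 means "no buffer", so the bit is cleared.
        if buffer:
            self.bound |= 1 << index
        else:
            self.bound &= ~(1 << index)

    def can_draw_indirect(self):
        # Every enabled array needs a buffer binding; compare the masks.
        return self.enabled == self.bound
```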
Fixes the following CTS tests:
ES31-CTS.draw_indirect.negative-noVBO-arrays
ES31-CTS.draw_indirect.negative-noVBO-elements
v2: update mask in vertex_array_attrib_binding
v3: rename mask and make it track _BoundArrays which matches what
was actually originally wanted (Fredrik Höglund)
v4: code cleanup, check for GLES 3.1 (Fredrik Höglund)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
This was missed in commit 59cfb21d ("targets: use the non-inline sw
helpers").
Fixes build failure:
CXXLD libXvMCgallium.la
../../../../src/gallium/auxiliary/pipe-loader/.libs/libpipe_loader_static.a(libpipe_loader_static_la-pipe_loader_sw.o):(.data.rel.ro+0x0): undefined reference to `sw_screen_create'
collect2: error: ld returned 1 exit status
Makefile:756: recipe for target 'libXvMCgallium.la' failed
make[3]: *** [libXvMCgallium.la] Error 1
Trivial.
Analogous to previous commit. As we no longer have anyone who uses NIR
we can drop the link.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Previously (with the inline ones) things were embedded into the
pipe-loader, which means that we cannot control/select what we want in
each target.
That also meant that at runtime we ended up with the empty
sw_screen_create() as the GALLIUM_SOFTPIPE/LLVMPIPE were not set.
v2: Cover all the targets, not just dri.
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Edward O'Callaghan <edward.ocallaghan@koparo.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
While we correctly set output[] for composite varyings, we set completely
bogus values for output_components[], making emit_urb_writes() output
zeros instead of the actual values.
Unfortunately, our simple approach goes out the window, and we need to
recurse into structs to get the proper value of vector_elements for each
field.
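The recursion can be pictured with a small Python sketch. The tuple-based
type description is a made-up stand-in for glsl_type, used only to show how
the per-slot component counts fall out of walking nested structs and arrays:

```python
def components_per_slot(ty):
    """Return the vector_elements value for each slot a varying occupies.

    ty is one of:
      ("vec", n)                 a vector with n components
      ("array", length, elem)    an array of `length` elements of type elem
      ("struct", [field, ...])   a struct with the given field types
    """
    kind = ty[0]
    if kind == "vec":
        return [ty[1]]
    if kind == "array":
        _, length, elem = ty
        return components_per_slot(elem) * length
    if kind == "struct":
        slots = []
        for field in ty[1]:
            # Recurse so nested structs/arrays get their own counts.
            slots += components_per_slot(field)
        return slots
    raise ValueError("unknown type kind: %r" % (kind,))
```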
Together with the previous patch, this fixes rendering in an upcoming
game from Feral Interactive.
v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).
Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Apparently we have literally no support for FS varying struct inputs.
This is somewhat surprising, given that we've had tests for that very
feature that have been passing for a long time.
Normally, varying packing splits up structures for us, so we don't see
them in the backend. However, with SSO, varying packing isn't around
to save us, and we get actual structs that we have to handle.
This patch changes fs_visitor::emit_general_interpolation() to work
recursively, properly handling nested structs, arrays, and so on.
(It's easier to read with diff -b, as indentation changes.)
When using the vec4 VS backend, this fixes rendering in an upcoming
game from Feral Interactive. (The scalar VS backend requires additional
bug fixes in the next patch.)
v2: Use pointers instead of pass-by-mutable-reference (Jason, Matt).
Cc: "11.1 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Expose most of the performance counter groups that are exposed by Catalyst.
Ideally, the driver will work with GPUPerfStudio at some point, but we are not
quite there yet. In any case, this is the reason for grouping multiple
instances of hardware blocks in the way it is implemented.
The counters can also be shown using the Gallium HUD. To see how work is
distributed across multiple shader engines, set the environment variable
RADEON_PC_SEPARATE_SE=1 to obtain finer-grained performance counter groups.
Part of the implementation is in radeon because an implementation for
older hardware would largely follow along the same lines, but exposing
a different set of blocks which are programmed slightly differently.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Performance monitor queries can become very big, especially considering that
instances of a block in different shader engines are queried separately.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Whether AMD_performance_monitor is enabled is no longer related to whether
queries are run by the GPU since the commit mentioned below.
Suggested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
commit ddf27a3dd0
Author: Nicolai Hähnle <nhaehnle@gmail.com>
Date: Tue Nov 10 13:35:01 2015 +0100
gallium: remove pipe_driver_query_group_info field type
Most applications never use performance counters, so allow drivers to
skip potentially expensive initialization steps.
A driver that wants to use this must enable the appropriate extension(s)
at context initialization and set the InitPerfMonitorGroups driver function
which will be called the first time information about the performance monitor
groups is actually used.
The init_groups helper is called for API functions that can be called before
a monitor object exists. Functions that require an existing monitor object
can rely on init_groups having been called before.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
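The lazy initialization described above could look roughly like the following. This is a simplified sketch under assumptions: `Context`, its fields, `example_driver_init_groups` and `get_num_groups` are hypothetical and do not mirror the real Mesa structures; only the idea of a driver hook that runs on first use is taken from the text.

```cpp
#include <cassert>

// Hypothetical, heavily simplified context.
struct Context {
    bool groups_initialized = false;
    unsigned num_groups = 0;
    unsigned init_calls = 0;                            // illustration only
    void (*InitPerfMonitorGroups)(Context *) = nullptr; // driver hook
};

// Example driver hook standing in for a potentially expensive enumeration.
static void example_driver_init_groups(Context *ctx)
{
    ctx->num_groups = 3;
    ctx->init_calls++;
}

// Called by entry points that can run before any monitor object exists.
static void init_groups(Context *ctx)
{
    assert(ctx != nullptr);
    if (ctx->groups_initialized)
        return;
    if (ctx->InitPerfMonitorGroups)
        ctx->InitPerfMonitorGroups(ctx);
    ctx->groups_initialized = true;
}

// Example query that needs the group information.
static unsigned get_num_groups(Context *ctx)
{
    init_groups(ctx);
    return ctx->num_groups;
}
```

An application that never asks about performance monitor groups never triggers the hook, and repeated queries run it only once.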
Previously the pass did not traverse into array dereferences that were
used as indices to arrays. This fixes Synmark2 Gl42CSCloth application
issues.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
templat->interlaced is 0 if not NV12 which is the case currently
when using VPP.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This setting is only used by glTexCoordPointer and related glEnable
calls. Since the preceding commits removed all of those, it is not
necessary to save, reset to default, or restore this state.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing left in meta does anything with the VBO binding, so we don't
need to save or restore it. The VAO binding is still modified.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
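The problem scenario can be reproduced with a toy name table. This is an illustrative model only: `NameTable`, `gen`, `bind` and `run_scenario` are hypothetical, and the table just mimics the rule that a name may be used without having been generated.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of one object namespace shared by the application and meta.
struct NameTable {
    std::map<unsigned, std::string> objects;

    // Like a Gen function: hand out the lowest unused name.
    unsigned gen()
    {
        unsigned name = 1;
        while (objects.count(name))
            ++name;
        objects[name] = "";
        return name;
    }

    // Binding an unknown name creates the object on first use.
    std::string &bind(unsigned name) { return objects[name]; }
};

// Replays the three steps from the commit message and returns what the
// application finds in "its" object afterwards.
static std::string run_scenario()
{
    NameTable buffers;
    unsigned meta_name = buffers.gen();       // first Gen returns 1
    buffers.bind(meta_name) = "meta data";
    buffers.bind(1) = "app data";             // app picks name 1 without Gen
    buffers.bind(meta_name) = "meta data";    // meta runs again
    return buffers.bind(1);
}
```

In this model the application reads back the meta data instead of its own, which is the data loss described above.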
The fixed-function attribute paths don't get the DSA treatment because
there are no DSA entry-points for fixed-function attributes. These
could have been added, but this is a temporary patch intended to make
later patches easier to review.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Meta currently does this, but future changes will make this impossible.
Explicitly do it as a step in the patch series now to catch any possible
kinks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Instead of going through the GL API implementation functions, use the
lower-level functions. This means that we have to keep track of a
pointer to the gl_buffer_object and the gl_vertex_array_object.
This has two advantages. First, it avoids a bunch of CPU overhead in
looking up objects and validating API parameters. Second, and much more
importantly, it will allow us to stop calling _mesa_GenBuffers /
_mesa_CreateBuffers and polluting the buffer namespace (next patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Future patches will use the brw_context instead. Keeping this
non-functional change separate should make the function changes easier
to review.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Pulls the parts of enable_vertex_array_attrib that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).
_mesa_enable_vertex_array_attrib can also be used to enable
fixed-function arrays.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Pulls the parts of update_array_format that aren't just parameter
validation out into a function that can be called from other parts of
Mesa (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reverts commit a280e83d71.
It breaks INTEL_DEBUG=fs output. For example,
glsl-fs-discard-01.shader_test has 11 instructions but only prints 5.
Acked-by: Matt Turner <mattst88@gmail.com>
Coverity noticed that we were passing this by value, and it's 152 bytes.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It's only called from C, it compiles as C, so just compile it as C.
Notice the missing extern "C" on the definition of the function, which
would screw things up if the prototype wasn't parsed before the
definition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were including it in headers, which then caused it to be included in
tons of places it wasn't needed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These functions' prototypes are marked with extern "C", which apparently
overrides a lack of extern "C" at the definition site if the prototype
has been seen first.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
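The linkage behaviour mentioned above can be shown in a few lines. This is a generic C++ sketch, with `example_helper` as a hypothetical function name; it is not taken from the Mesa sources.

```cpp
#include <cassert>

// What a shared header would contain: the prototype carries C linkage.
extern "C" int example_helper(int x);

// Definition without extern "C". Because the prototype above has already
// been seen, this redeclaration keeps the C linkage of the first one.
int example_helper(int x)
{
    return x + 1;
}
```

If the definition were parsed before any extern "C" prototype, it would get C++ linkage instead and C callers would fail to link, which is the hazard the commit message points out.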
Now that backend_reg inherits from brw_reg, we have to be careful to
avoid the object slicing problem.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
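For readers unfamiliar with the term, object slicing can be sketched as follows. The types and functions are hypothetical stand-ins, not the real brw_reg or backend_reg.

```cpp
#include <cassert>

struct example_base_reg { unsigned nr; };
struct example_derived_reg : example_base_reg { unsigned reg_offset; };

// By-value base parameters: each argument is copied as example_base_reg,
// so the derived reg_offset field is silently dropped (sliced).
static bool equal_sliced(example_base_reg a, example_base_reg b)
{
    return a.nr == b.nr;
}

// Taking the derived type by reference keeps all fields in view.
static bool equal_full(const example_derived_reg &a, const example_derived_reg &b)
{
    return a.nr == b.nr && a.reg_offset == b.reg_offset;
}
```

Two registers with the same `nr` but different `reg_offset` compare equal through `equal_sliced` and different through `equal_full`.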
In the next patch, I make backend_reg's inheritance from brw_reg
private, which confuses clang when it sees the type "struct brw_reg" in
the derived class constructors, thinking it is referring to the
privately inherited brw_reg:
brw_fs.cpp:366:23: error: 'brw_reg' is a private member of 'brw_reg'
fs_reg::fs_reg(struct brw_reg reg) :
^
brw_shader.h:39:22: note: constrained by private inheritance here
struct backend_reg : private brw_reg
^~~~~~~~~~~~~~~
brw_reg.h:232:8: note: member is declared here
struct brw_reg {
^
Avoid this by marking brw_reg with the scope resolution operator.
In order to do this, we have to change the signature of the
backend_reg(brw_reg) constructor to take a reference to a brw_reg in
order to avoid unresolvable ambiguity about which constructor is
actually being called in the other modifications in this patch.
As far as I understand it, the rule in C++ is that if multiple
constructors are available for parent classes, the one closest to you in
the class hierarchy is chosen, but if one of them didn't take a
reference, that screws things up.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
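A reduced version of the situation, assuming hypothetical type names rather than the real brw_reg, backend_reg and fs_reg, might look like this. Inside the most derived class an unqualified `example_reg` would name the inaccessible, privately inherited base, so the global type is spelled `::example_reg`.

```cpp
#include <cassert>

struct example_reg { int nr; };

// Private inheritance: example_reg is an implementation detail of this class.
struct example_backend_reg : private example_reg {
    explicit example_backend_reg(const ::example_reg &r) : example_reg(r) {}
    int number() const { return nr; }
};

struct example_fs_reg : example_backend_reg {
    // Writing "struct example_reg" here would find the name injected by the
    // private base class. The leading :: selects the global type instead.
    explicit example_fs_reg(const ::example_reg &r) : example_backend_reg(r) {}
};
```

Both constructors take the base type by const reference here, matching the signature change the commit message describes.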
3333977556 added support for ASTC textures to
gallium. They don't have any helpers hooked up for software decoding, however,
so we cannot support them in drivers relying on util code for decoding.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Removing the fake format helpers (1c7d0a6aa4)
caused this to fail. These formats were never supported, but previously
they would have asserted in the generated jit functions (which, due to lack
of test cases for these formats, were never called) whereas we now assert when
trying to build the jit function. So, skip them completely.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=93092
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Kind of a handy function. And I'll want it available outside of i965
for common nir-pass helpers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Nicolai Hähnle <nhaehnle@gmail.com>
The dirty area in this call isn't related to the screen at all.
v2: set clear dirty area to false as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Enables 200+ dEQP SSO tests to proceed past validation,
and fixes an ES31-CTS.sepshaderobjs.PipelineApi subtest.
V2: split out change that reverts a previous patch into its own commit,
move variable declaration to top of function, and fix some formatting,
all suggested by Ian.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
This reverts commit ba02f7a3b6.
The commit checked whether the pipeline was currently bound instead
of checking whether it had ever been bound. The previous setting
of Validated during object creation makes this unnecessary. The
real problem was that Validated was not properly set to false
elsewhere in the code. This is fixed by a later patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.1" <mesa-stable@lists.freedesktop.org>
This should fix the getteximage-depth test that currently asserts.
I was hitting problems with virgl as well in this area.
This moves the 1D array handling code to a single place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.6 11.0 11.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that nir_lower_tex can do texture swizzle lowering, we can use that
instead of repeating more-or-less the same code in both backends. This
both allows us to share code and means that things like the tg4
work-arounds are somewhat simpler because they don't have to take the
swizzle into account.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
nir_ssa_def_rewrite_uses is one of the older helpers in NIR and predated
both of those. Now it can be substantially simplified.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, if someone accidentally made an instruction that refers to its
own SSA destination, the validator wouldn't catch it. The reason for this
is that it validated the destination too early and, by the time it got to
the source, the destination SSA value was already added to the set of seen
SSA values so it would assume that it came from some previous instruction.
By moving destination validation to be after source validation, the SSA
value is not in the list of seen values and the validator will catch
self-referential instructions.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
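The ordering issue can be demonstrated with a toy validator. This is not the NIR validator: `Instr` and `validate` are hypothetical, and the model only covers straight-line code where every source must be defined by an earlier instruction.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: one SSA destination and a list of SSA sources.
struct Instr {
    int dest;
    std::vector<int> srcs;
};

// Returns true if every source was defined by an earlier instruction.
// With dest_first == true the destination is recorded before the sources
// are checked, which reproduces the old, too-early ordering.
static bool validate(const std::vector<Instr> &prog, bool dest_first)
{
    std::set<int> seen;
    for (const Instr &instr : prog) {
        if (dest_first)
            seen.insert(instr.dest);
        for (int src : instr.srcs) {
            if (!seen.count(src))
                return false;
        }
        if (!dest_first)
            seen.insert(instr.dest);
    }
    return true;
}
```

An instruction that defines value 1 and also reads value 1 passes with `dest_first == true` and is rejected with `dest_first == false`.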
Previously, we had a rescale_texcoords helper in the FS backend for
handling rescaling of texture coordinates. Now that we can do variants in
NIR, we can use nir_lower_tex to do the rescaling for us. This allows us
to delete the i965-specific code and gives us proper TEXTURE_RECTANGLE and
GL_CLAMP handling in vertex and geometry shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to insert NIR passes between initial NIR compilation and
optimization (link time) and actual backend code-gen. In particular, it
will allow us to do shader variants in NIR and share some of that shader
variant code between backends.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
At the moment, brw_create_nir just calls the three stages in sequence so
there's not much difference. Soon, however, we will want to start doing
variants in NIR at which point the postprocessing step will have to move
from shader create time to codegen time.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This fixes a regression introduced in b1a83b5d1 that caused basically all
shaders to fail to compile on 32-bit platforms.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It appears that the hardware wants the integer to be scaled the same way
that the hardware representation is. snorm16 uses one of the float
factors, so this is only relevant for snorm8.
This fixes a number of subcases of
bin/fbo-blending-formats GL_EXT_texture_snorm
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
This doesn't account for the ldr/hdr distinction... that will probably
have to be exposed via a separate cap. When relevant hardware appears,
this can be worked out.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This was a silly hack that kept growing and growing. Instead, just write
NULLs for those functions. No need to have helpers that just assert(0)
when you call them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Not too long ago, the dri3 code was living in src/glx, which in itself
was guarded by HAVE_DRI_GLX. As the name suggests we didn't dive into
the folder when dri was disabled, thus we missed that dri3 does not
consider/honour --enable-dri.
Cc: mesa-stable@lists.freedesktop.org
Fixes: 6bd9ba7d07 "loader: Add dri3 helper"
Cc: Pali Rohár <pali.rohar@gmail.com>
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
It looks like the sampler hardware doesn't take into account the
surface format when sampling a cleared color after a fast clear has
been done. So for example if you clear a GL_RED surface to 1,1,1,1
then the sampling instructions will return 1,1,1,1 instead of 1,0,0,1.
This patch makes it override the color that is programmed in the
surface state in order to swizzle for luminance and intensity as well
as overriding the missing components.
Fixes the ext_framebuffer_multisample-fast-clear Piglit test.
v2: Handle luminance and intensity formats
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
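What a sampler is expected to return for a cleared surface can be sketched as below. The sketch assumes the usual GL expansion rules (missing color channels read as 0, missing alpha as 1, luminance replicated to RGB, intensity replicated to all four channels). `Layout`, `Color` and `sampled_clear_color` are hypothetical names, and the actual patch programs the surface state rather than calling a helper like this.

```cpp
#include <array>
#include <cassert>

using Color = std::array<float, 4>;

// Hypothetical description of which channels a surface format stores.
enum class Layout { R, RG, RGB, RGBA, Luminance, Intensity };

// Color that sampling a surface cleared to `clear` should produce.
static Color sampled_clear_color(Layout layout, const Color &clear)
{
    switch (layout) {
    case Layout::R:         return {clear[0], 0.0f, 0.0f, 1.0f};
    case Layout::RG:        return {clear[0], clear[1], 0.0f, 1.0f};
    case Layout::RGB:       return {clear[0], clear[1], clear[2], 1.0f};
    case Layout::RGBA:      return clear;
    case Layout::Luminance: return {clear[0], clear[0], clear[0], 1.0f};
    case Layout::Intensity: return {clear[0], clear[0], clear[0], clear[0]};
    }
    return clear;
}
```

With this model, a red-only surface cleared to 1,1,1,1 samples as 1,0,0,1, matching the example in the commit message.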
v2: do the same in tgsi_to_nir (Samuel)
v3: added missing cases after rebase (Iago)
v4: Add a blank space after '#' in one of the comments (Matt)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This way the caller doesn't have to initialize all 4 channels when they
aren't using them.
v2: Fix signed/unsigned comparison warning (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are various restrictions on what the hstride can be that depend on
the Gen, and now that we're using hstride == 2 for packing/unpacking
doubles, we're going to run into these restrictions a lot more often.
Pull them out into a separate function, and move the one restriction we
checked previously into it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This can happen when the source of the compare was split by the SIMD
lowering pass. Potentially, we could allow the case where the exec size
of scan_inst is larger, and scan_inst has the right quarter selected,
but doing that seems a little more risky.
v2: Merge the bail condition into the previous if/break block (Matt)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we tried to get/set something that was exactly 64 bits, we would
try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
1's like we want.
v2 (Iago)
- Replace ~0 by ~0ull
- Removed unnecessary parentheses
v3 (Kristian)
- Avoid the conditional
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
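One way to build the mask without a conditional is to shift an all-ones value right instead of shifting 1 left. The helpers below are a sketch with hypothetical names (`low_mask`, `get_bits`, `set_bits`), not the actual instruction accessors. Shifting a 64-bit value by 64 is undefined in C and C++, which is why `(1 << 64) - 1` cannot be relied on.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `bits` bits set, for 1 <= bits <= 64.
// The shift count is 64 - bits, which stays in the range 0..63.
static inline uint64_t low_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return ~0ull >> (64 - bits);
}

// Extract bits high..low (inclusive) from word.
static inline uint64_t get_bits(uint64_t word, unsigned high, unsigned low)
{
    assert(high < 64 && low <= high);
    return (word >> low) & low_mask(high - low + 1);
}

// Return word with bits high..low (inclusive) replaced by value.
static inline uint64_t set_bits(uint64_t word, unsigned high, unsigned low,
                                uint64_t value)
{
    assert(high < 64 && low <= high);
    const uint64_t mask = low_mask(high - low + 1) << low;
    return (word & ~mask) | ((value << low) & mask);
}
```

With `high == 63` and `low == 0` the field is exactly 64 bits wide and the mask is all ones, which is the case the original expression got wrong.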
Regression as of 64710db664
We can't use the type returned by get_interface_type() as
the interface type has arrays removed.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If the buffer has no bind flags at all, assume it can be a regular
buffer. This can happen on buffers created through the ARB_dsa
interfaces.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
With ARB_direct_state_access, buffers can be created without any binding
hints at all. We still need to allocate these buffers to VRAM or GART,
as we don't have logic down the line to place them into GPU-mappable
space. Ideally we'd be able to shift these things around based on usage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92438
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0 11.1" <mesa-stable@lists.freedesktop.org>
They're exclusive at build time, but the ilo entry is always present, so
we'd try to use it and fail out.
v2: Add comment in the code, from Emil.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
A prior, literal reading of the ASTC spec led to the prohibition
of some compressed formats being used against the targets:
TEXTURE_CUBE_MAP_ARRAY and TEXTURE_3D. Since the spec does not specify
interactions with other extensions for specific compressed textures,
remove such interactions.
Fixes the following Piglit tests on Gen9:
piglit.spec.arb_direct_state_access.getcompressedtextureimage
piglit.spec.arb_get_texture_sub_image.arb_get_texture_sub_image-getcompressed
piglit.spec.arb_texture_cube_map_array.fbo-generatemipmap-cubemap array s3tc_dxt1
piglit.spec.ext_texture_compression_s3tc.getteximage-targets cube_array s3tc
v2: Don't interact with other specific compressed formats (Ian).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91927
Suggested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Provide the ability to prevent any permanently enabled extension
from appearing in the string returned by glGetString[i]().
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
I noticed that brw_vs.c does this.
I believe the point is that nir->num_uniforms is either counted in
scalar components (in scalar mode), or vec4 slots (in vector mode).
But we want param_count to be in scalar components regardless, so
we have to scale up in vector mode.
We don't have to scale up in scalar mode, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
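The unit conversion described above amounts to the following. This is a sketch of the stated belief, not the driver code: `param_count_for` is a hypothetical helper, and it assumes that num_uniforms is counted in scalar components in scalar mode and in vec4 slots in vector mode.

```cpp
#include <cassert>

// param_count is always expressed in scalar components.
static unsigned param_count_for(unsigned num_uniforms, bool is_scalar)
{
    // In vector mode each counted slot is a vec4, i.e. 4 scalar components.
    return is_scalar ? num_uniforms : num_uniforms * 4;
}
```

For example, 10 uniforms counted in vec4 slots would need room for 40 scalar parameters, while 10 counted in scalar mode need only 10.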
I think I may have regressed this in the NIR conversion. TGSI-to-NIR is
putting the PSIZ in the .x channel, not .w, so we were grabbing some
garbage for point size, which ended up meaning just not drawing points.
Fixes glean pointAtten and pointsprite.
Same fix as on a3xx - set the second (tiny) layer size bitfield to the
smallest level's size so that the hw knows not to minify beyond that.
This fixes texelFetch sampler3D piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
When layer is the container, slices are tightly packed inside of each
layer. We don't need any additional alignment. On a3xx, each slice
contains all the layers, so having alignment makes sense.
This fixes a whole slew of array-related piglits, including texelFetch
and tex-miplevel-selection varieties.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_geometry_shader started (Marta)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding_box not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
@@ -256,7 +256,7 @@ GLES3.2, GLSL ES 3.2
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 not started (based on GL_ARB_texture_stencil8, which is done for some drivers)
GL_OES_texture_stencil8 DONE (all drivers that support GL_ARB_texture_stencil8)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
More info about these features and the work involved can be found at
<a href="relnotes/11.1.2.html">Mesa 11.1.2</a> is released.
This is a bug-fix release.
</p>
<h2>January 22, 2016</h2>
<p>
<a href="relnotes/11.0.9.html">Mesa 11.0.9</a> is released.
This is a bug-fix release.
<br>
NOTE: It is anticipated that 11.0.9 will be the final release in the 11.0
series. Users of 11.0 are encouraged to migrate to the 11.1 series in order
to obtain future fixes.
</p>
<h2>January 13, 2016</h2>
<p>
<a href="relnotes/11.1.1.html">Mesa 11.1.1</a> is released.
This is a bug-fix release.
</p>
<h2>December 21, 2015</h2>
<p>
<a href="relnotes/11.0.8.html">Mesa 11.0.8</a> is released.
This is a bug-fix release.
</p>
<h2>December 15, 2015</h2>
<p>
<a href="relnotes/11.1.0.html">Mesa 11.1.0</a> is released. This is a new
development release. See the release notes for more information about
the release.
</p>
<h2>December 9, 2015</h2>
<p>
<a href="relnotes/11.0.7.html">Mesa 11.0.7</a> is released.
This is a bug-fix release.
</p>
<p>
Mesa demos 8.3.0 is also released.
See the <a href="http://lists.freedesktop.org/archives/mesa-announce/2015-December/000191.html">announcement</a> for more information about the release.
You can download it from <a href="ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/">ftp.freedesktop.org/pub/mesa/demos/8.3.0/</a>.
</p>
<h2>November 21, 2015</h2>
<p>
<a href="relnotes/11.0.6.html">Mesa 11.0.6</a> is released.
@@ -45,8 +45,6 @@ because compatibility contexts are not supported.
<ul>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93110">Bug 93110</a> - [NVE4] textureSize() and textureQueryLevels() uses a texture bound during the previous draw call</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93004">Bug 93004</a> - Guild Wars 2 crash on nouveau DX11 cards</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93215">Bug 93215</a> - [Regression bisected] Ogles1conform Automatic mipmap generation test is fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>radeon/uvd: uv pitch separation for stoney</li>
</ul>
<p>Dave Airlie (9):</p>
<ul>
<li>r600: do SQ flush ES ring rolling workaround</li>
<li>r600: SMX returns CONTEXT_DONE early workaround</li>
<li>r600/shader: split address get out to a function.</li>
<li>r600/shader: add utility functions to do single slot arithmatic</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=49779">Bug 49779</a> - Extra line segments in GL_LINE_LOOP</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79783">Bug 79783</a> - Distorted output in obs-studio where other vendors "work"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=80821">Bug 80821</a> - When LIBGL_ALWAYS_SOFTWARE is set, KHR_create_context is not supported</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=81174">Bug 81174</a> - Gallium: GL_LINE_LOOP broken with more than 512 points</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=83508">Bug 83508</a> - [UBO] Assertion for array of blocks</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89014">Bug 89014</a> - PIPE_QUERY_GPU_FINISHED is not acting as expected on SI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90175">Bug 90175</a> - [hsw bisected][PATCH] atomic counters doesn't work for a binding point different to zero</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90348">Bug 90348</a> - Spilling failure of b96 merged value</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90631">Bug 90631</a> - Compilation failure for fragment shader with many branches on Sandy Bridge</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90734">Bug 90734</a> - glBufferSubData is corrupting data when buffer is > 32k</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91780">Bug 91780</a> - Rendering issues with geometry shader</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91785">Bug 91785</a> - make check DispatchSanity_test.GLES31 regression</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91898">Bug 91898</a> - src/util/mesa-sha1.c:250:25: fatal error: openssl/sha.h: No such file or directory</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92052">Bug 92052</a> - nir/nir_builder.h:79: error: expected primary-expression before ‘.’ token</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=92054">Bug 92054</a> - make check gbm-symbols-check regression</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92066">Bug 92066</a> - [ILK,G45,regression] New assertion on BRW_MAX_MRF breaks ilk and g45</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92072">Bug 92072</a> - Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92122">Bug 92122</a> - [bisected, cts] Regression with Assault Android Cactus</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92124">Bug 92124</a> - shader_query.cpp:841:34: error: ‘strndup’ was not declared in this scope</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92183">Bug 92183</a> - linker.cpp:3187:46: error: ‘strtok_r’ was not declared in this scope</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92221">Bug 92221</a> - Unintended code changes in _mesa_base_tex_format commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92265">Bug 92265</a> - Black windows in weston after update mesa to 11.0.2-1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92438">Bug 92438</a> - Segfault in pushbuf_kref when running the android emulator (qemu) on nv50</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92621">Bug 92621</a> - [G965 ILK G45] Regression: 24 piglit regressions in glsl-1.10</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92634">Bug 92634</a> - gallium's vl_mpeg12_decoder does not work with st/va</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92705">Bug 92705</a> - [clover] fail to build with llvm-svn/clang-svn 3.8</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92709">Bug 92709</a> - "LLVM triggered Diagnostic Handler: unsupported call to function ldexpf in main" when starting race in stuntrally</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92738">Bug 92738</a> - Randon R7 240 doesn't work on 16KiB page size platform</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92744">Bug 92744</a> - [g965 Regression bisected] Performance regression and piglit assertions due to liveness analysis</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92909">Bug 92909</a> - Offset/alignment issue with layout std140 and vec3</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92985">Bug 92985</a> - Mac OS X build error "ar: no archive members specified"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93015">Bug 93015</a> - Tonga Elemental segfault + VM faults since radeon: implement r600_query_hw_get_result via function pointers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93235">Bug 93235</a> - [regression] dispatch sanity broken by GetPointerv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=93266">Bug 93266</a> - gl_arb_shading_language_420pack does not allow binding of image variables</li>
</ul>
<h2>Changes</h2>
TBD.
<li>MPEG4 decoding has been disabled by default in the VAAPI driver</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93628">Bug 93628</a> - Exception: attempt to use unavailable module DRM when building MesaGL 11.1.0 on windows</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93648">Bug 93648</a> - Random lines being rendered when playing Dolphin (geometry shaders related, w/ apitrace)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93650">Bug 93650</a> - GL_ARB_separate_shader_objects is buggy (PCSX2)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93717">Bug 93717</a> - Meta mipmap generation can corrupt texture state</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93722">Bug 93722</a> - Segfault when compiling shader with a subroutine that takes a parameter</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93731">Bug 93731</a> - glUniformSubroutinesuiv segfaults when subroutine uniform is bound to a specific location</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=93761">Bug 93761</a> - A conditional discard in a fragment shader causes no depth writing at all</li>
print 'scons: warning: Visual Studio versions prior to 2012 are known to produce incorrect code when optimizations are enabled ( https://bugs.freedesktop.org/show_bug.cgi?id=58718 )'